Bug #3095

segfault in t_vcm initialization

Added by Szilárd Páll 30 days ago. Updated 2 days ago.

Status: Rejected
Priority: High
Assignee: -
Category: mdrun
Target version:
Affected version - extra info: 8d8cd4851d
Affected version:
Difficulty: uncategorized

Description

Thread 1 "gmx" received signal SIGSEGV, Segmentation fault.
0x00007ffff6fc4a20 in t_vcm::t_vcm (this=0x7fffffff8a68, groups=..., ir=...)
    at /home/pszilard/projects/gromacs/gromacs-master/src/gromacs/mdlib/vcm.cpp:96
96                group_ndf[g]  = ir.opts.nrdf[g];
(gdb) bt 
#0  0x00007ffff6fc4a20 in t_vcm::t_vcm (this=0x7fffffff8a68, groups=..., ir=...)
    at /home/pszilard/projects/gromacs/gromacs-master/src/gromacs/mdlib/vcm.cpp:96
#1  0x00007ffff70cd40f in gmx::LegacySimulator::do_md (this=0x2002030)
    at /home/pszilard/projects/gromacs/gromacs-master/src/gromacs/mdrun/md.cpp:485
#2  0x00007ffff70cab70 in gmx::LegacySimulator::run (this=0x2002030)
    at /home/pszilard/projects/gromacs/gromacs-master/src/gromacs/mdrun/legacysimulator.cpp:72
#3  0x00007ffff70fde62 in gmx::Mdrunner::mdrunner (this=0x7fffffffc170)
    at /home/pszilard/projects/gromacs/gromacs-master/src/gromacs/mdrun/runner.cpp:1580
#4  0x00000000004134d5 in gmx::gmx_mdrun (argc=3, argv=0x7fffffffd370)
    at /home/pszilard/projects/gromacs/gromacs-master/src/programs/mdrun/mdrun.cpp:269
#5  0x00007ffff6a1e5c4 in gmx::(anonymous namespace)::CMainCommandLineModule::run (this=0x652460, argc=3, 
    argv=0x7fffffffd370)
    at /home/pszilard/projects/gromacs/gromacs-master/src/gromacs/commandline/cmdlinemodulemanager.cpp:133
#6  0x00007ffff6a1ddd2 in gmx::CommandLineModuleManager::run (this=0x7fffffffd260, argc=3, argv=0x7fffffffd370)
    at /home/pszilard/projects/gromacs/gromacs-master/src/gromacs/commandline/cmdlinemodulemanager.cpp:589
#7  0x0000000000410433 in main (argc=4, argv=0x7fffffffd368)
    at /home/pszilard/projects/gromacs/gromacs-master/src/programs/gmx.cpp:60
topol.tpr (1.57 MB), Szilárd Páll, 09/18/2019 01:53 PM
md.log (13.2 KB), Szilárd Páll, 09/18/2019 03:10 PM

History

#1 Updated by Berk Hess 30 days ago

I can't reproduce this. Can you attach a log file?

#2 Updated by Szilárd Páll 30 days ago

#3 Updated by Berk Hess 30 days ago

I ran valgrind and it doesn't complain.
But after reinstalling my operating system I lost OpenCL. How do I get libOpenCL.so again?

PS: Why does the log file say that it's using 4x4 kernels and lists?

#4 Updated by Szilárd Páll 30 days ago

Berk Hess wrote:

I ran valgrind and it doesn't complain.
But after reinstalling my operating system I lost OpenCL. How do I get libOpenCL.so again?

On Ubuntu it is in the ocl-icd-opencl package. However, neither OpenCL nor a GPU is necessary; it reproduces in a simple CPU run too.

PS: Why does the log file say that it's using 4x4 kernels and lists?

Because it's Intel OpenCL with cluster size of 4.

#5 Updated by Berk Hess 30 days ago

But I can't reproduce it and valgrind doesn't complain.

#6 Updated by Szilárd Páll 29 days ago

Not sure what the issue is, I reproduced it on multiple machines now.
I'll try to look into it, but realistically only when the release rush is over.

#7 Updated by Paul Bauer 24 days ago

  • Target version changed from 2020-beta1 to 2020-beta2

bumped

#8 Updated by Paul Bauer 21 days ago

Just reproduced it myself and caught it in valgrind; will check what happens.

#9 Updated by Paul Bauer 21 days ago

  • Status changed from New to Accepted

#10 Updated by Paul Bauer 21 days ago

So, something really weird is going on here. The issue shows up because ir->opts.ngtc is zero in the TPR, with the result that the evaluation

group_ndf[g]  = ir.opts.nrdf[g]; (gromacs/mdlib/vcm.cpp:587)

segfaults, as ir.opts.nrdf has the size of ir->opts.ngtc.

Checking the TPR, I saw that no temperature coupling groups are defined, and as such there are also no degrees of freedom.
I'll bisect back to the version where you generated the TPR, to see if this got introduced recently.
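
For illustration, here is a minimal sketch of the failure mode described above and one way it could be guarded against. It is not the GROMACS code: `OptsSketch` and `copyGroupNdf` are hypothetical stand-ins, and the assumption is only that the degrees-of-freedom array has one entry per temperature-coupling group, so that it is empty when ngtc is zero.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the relevant part of the input record (not the GROMACS type).
struct OptsSketch
{
    int                 ngtc = 0; // number of temperature-coupling groups
    std::vector<double> nrdf;     // degrees of freedom, one entry per T-coupling group
};

// Copies the per-group degrees of freedom, refusing to read past the end of nrdf.
// With ngtc == 0 the array is empty, so an unchecked nrdf[g] would be an
// out-of-bounds read, which is what the backtrace in this report shows.
std::vector<double> copyGroupNdf(const OptsSketch& opts, std::size_t numGroups)
{
    if (opts.nrdf.size() < numGroups)
    {
        throw std::out_of_range("nrdf has " + std::to_string(opts.nrdf.size())
                                + " entries, but " + std::to_string(numGroups)
                                + " groups were requested");
    }
    std::vector<double> groupNdf(numGroups);
    for (std::size_t g = 0; g < numGroups; g++)
    {
        groupNdf[g] = opts.nrdf[g];
    }
    return groupNdf;
}
```

A check of this kind would turn the segfault into a clear error message for an input without temperature-coupling groups; whether that is the right behaviour for mdrun is a separate question.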

#11 Updated by Paul Bauer 21 days ago

  • Status changed from Accepted to Blocked, need info

OK, this input even segfaults when run with the same version that created it.

#12 Updated by Paul Bauer 2 days ago

  • Status changed from Blocked, need info to Rejected

doesn't seem to be an actual issue
