Bug #2197

Regression (mdrun segfault) after 1ecd43b0034

Added by Vedran Miletic over 2 years ago. Updated over 2 years ago.

Status: Closed
Priority: High
Assignee:
Category: mdrun
Target version:
Affected version - extra info: 1ecd43b0034 onwards
Affected version:
Difficulty: uncategorized

Description

Since:

commit 1ecd43b00346806a549ba4c0258206fb5fb29aeb
Author: Berk Hess <hess@kth.se>
Date:   Tue Oct 18 15:09:34 2016 +0200

    Introduce ObservablesHistory container

    Introduces a ObservablesHistory class and moved energyhistory_t,
    edsamstate_t and swapstate_t into this.

    TODO: Move more observables history from t_state into this container.

    Also added documentation in state.h.

    Part of #2059.

    Change-Id: Ic1efd95c5be2dede137763bfd24e3fb7d676eadd

gmx mdrun, run on the attached files as follows:

gmx grompp -f acacaca_3-5_md2.mdp -c acacaca_3-5_md2.gro -p acacaca_3-5.top -po acacaca_3-5_md2out.mdp -o acacaca_3-5_md2.tpr -renum
gmx mdrun -s acacaca_3-5_md2.tpr -o acacaca_3-5_md2.trr -cpo acacaca_3-5_md2.cpt -c acacaca_3-5_md2out.gro -e acacaca_3-5_md2.edr -g acacaca_3-5_md2.log -nsteps -2 -cpi acacaca_3-5_md1.cpt -noappend -x acacaca_3-5_md2.xtc -v -tunepme

will segfault with:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff627fc2d in do_cpt_enerhist (xd=0x680a00, bRead=1, fflags=0, enerhist=0x0, list=0x0) at /home/miletivn/workspace/gromacs-reactivemd/src/gromacs/fileio/checkpoint.cpp:1259
1259            enerhist->nsteps     = 0;
Missing separate debuginfos, use: dnf debuginfo-install fftw-libs-single-3.3.5-3.fc25.x86_64 libgcc-6.3.1-1.fc25.x86_64 libgfortran-6.3.1-1.fc25.x86_64 libgomp-6.3.1-1.fc25.x86_64 libquadmath-6.3.1-1.fc25.x86_64 libstdc++-6.3.1-1.fc25.x86_64 openblas-0.2.19-4.fc25.x86_64
(gdb) backtrace
#0  0x00007ffff627fc2d in do_cpt_enerhist (xd=0x680a00, bRead=1, fflags=0, enerhist=0x0, list=0x0) at /home/miletivn/workspace/gromacs-reactivemd/src/gromacs/fileio/checkpoint.cpp:1259
#1  0x00007ffff6282a02 in read_checkpoint (fn=0x68cb30 "acacaca_3-5_md1.cpt", pfplog=0x7fffffffbfe0, cr=0x6812a0, dd_nc=0x7fffffffc800, npme=0x7fffffffc710, eIntegrator=0, init_fep_state=0x681410, step=0x7fffffffbf58, t=0x7fffffffbf50, state=0x67fa70, bReadEkin=0x7fffffffc00c, 
    observablesHistory=0x7fffffffc020, simulation_part=0x7fffffffc080, bAppendOutputFiles=0, bForceAppend=2097152, reproducibilityRequested=0) at /home/miletivn/workspace/gromacs-reactivemd/src/gromacs/fileio/checkpoint.cpp:2156
#2  0x00007ffff628354f in load_checkpoint (fn=0x68cb30 "acacaca_3-5_md1.cpt", fplog=0x7fffffffbfe0, cr=0x6812a0, dd_nc=0x7fffffffc800, npme=0x7fffffffc710, ir=0x7fffffffc070, state=0x67fa70, bReadEkin=0x7fffffffc00c, observablesHistory=0x7fffffffc020, bAppend=0, bForceAppend=2097152, 
    reproducibilityRequested=0) at /home/miletivn/workspace/gromacs-reactivemd/src/gromacs/fileio/checkpoint.cpp:2386
#3  0x00000000004390ca in gmx::mdrunner (hw_opt=0x7fffffffcdb0, fplog=0x67eda0, cr=0x6812a0, nfile=33, fnm=0x7fffffffcf70, oenv=0x688880, bVerbose=1, nstglobalcomm=-1, ddxyz=0x7fffffffc800, dd_rank_order=1, npme=0, rdd=0, rconstr=0, dddlb_opt=0x43d4cf "auto", dlb_scale=0.800000012, 
    ddcsx=0x0, ddcsy=0x0, ddcsz=0x0, nbpu_opt=0x43d4cf "auto", nstlist_cmdline=0, nsteps_cmdline=-2, nstepout=100, resetstep=-1, nmultisim=0, repl_ex_nst=0, repl_ex_nex=0, repl_ex_seed=-1, pforce=-1, cpt_period=15, max_hours=-1, imdport=8888, Flags=3415040)
    at /home/miletivn/workspace/gromacs-reactivemd/src/programs/mdrun/runner.cpp:1024
#4  0x0000000000427cf2 in gmx_mdrun (argc=22, argv=0x7fffffffdcb0) at /home/miletivn/workspace/gromacs-reactivemd/src/programs/mdrun/mdrun.cpp:551
#5  0x00007ffff616d137 in gmx::(anonymous namespace)::CMainCommandLineModule::run (this=0x677a80, argc=22, argv=0x7fffffffdcb0) at /home/miletivn/workspace/gromacs-reactivemd/src/gromacs/commandline/cmdlinemodulemanager.cpp:133
#6  0x00007ffff616ec47 in gmx::CommandLineModuleManager::run (this=0x7fffffffdb80, argc=22, argv=0x7fffffffdcb0) at /home/miletivn/workspace/gromacs-reactivemd/src/gromacs/commandline/cmdlinemodulemanager.cpp:583
#7  0x000000000041bb7c in main (argc=23, argv=0x7fffffffdca8) at /home/miletivn/workspace/gromacs-reactivemd/src/programs/gmx.cpp:60
(gdb)
acacaca_3-5_md2.tar.gz (353 KB) Vedran Miletic, 06/01/2017 04:23 PM

Associated revisions

Revision cd6b63db (diff)
Added by Berk Hess over 2 years ago

Fix checkpoint reading without energy history

When reading a checkpoint file without energy history a null pointer
could be dereferenced.

Fixes #2197.

Change-Id: Icf0e3022a62bdbcd890d1c79c4a0f6c748df4402
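
For illustration only, a minimal sketch of the failure mode described in this revision and one way to guard against it. This is not the actual patch: EnergyHistory, History, and readEnergyHistory are hypothetical names. The reading of flags == 0 as "no energy history in the checkpoint" is an assumption based on the backtrace above, which shows fflags=0 together with enerhist=0x0.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for the energy history stored in a checkpoint.
struct EnergyHistory
{
    std::int64_t nsteps = 0;
    std::int64_t nsum   = 0;
};

// Hypothetical stand-in for a container of optional history blocks.
struct History
{
    std::unique_ptr<EnergyHistory> energyHistory; // null until allocated
};

// Resets the energy history before reading it from a checkpoint.
// Assumption: flags == 0 means the checkpoint holds no energy history.
// Returns true if the history object was touched.
bool readEnergyHistory(int flags, History* history)
{
    if (flags == 0)
    {
        // Nothing to read: return before any dereference of a null pointer.
        return false;
    }
    if (history->energyHistory == nullptr)
    {
        // Allocate on demand so the writes below are always valid.
        history->energyHistory.reset(new EnergyHistory());
    }
    history->energyHistory->nsteps = 0;
    history->energyHistory->nsum   = 0;
    return true;
}
```

Without the two guards, the write to nsteps would go through a null pointer whenever the checkpoint carries no energy history, which matches the crash site at checkpoint.cpp:1259 in the backtrace.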

History

#1 Updated by Berk Hess over 2 years ago

I don't get a segfault with 9f989798b3d44e6bdada80323447b24fcfa5ff46
Or do I need to put in an (old?) .cpi file that you did not attach?

#2 Updated by Roland Schulz over 2 years ago

You might also want to attach the log file so that we see compiler version, SIMD instructions, and number of threads.

#3 Updated by Gerrit Code Review Bot over 2 years ago

Gerrit received a related patchset '1' for Issue #2197.
Uploader: Berk Hess
Change-Id: gromacs~master~Icf0e3022a62bdbcd890d1c79c4a0f6c748df4402
Gerrit URL: https://gerrit.gromacs.org/6685

#4 Updated by Berk Hess over 2 years ago

  • Status changed from New to Feedback wanted
  • Target version set to 2018

I can't reproduce this, but did notice an issue in the code.
Vedran, please check if my fix fixes your issue.

#5 Updated by Vedran Miletic over 2 years ago

Berk Hess wrote:

I don't get a segfault with 9f989798b3d44e6bdada80323447b24fcfa5ff46
Or do I need to put in an (old?) .cpi file that you did not attach?

Indeed, I forgot it, sorry for that. Anyhow, your patch on Gerrit fixes the problem!

#6 Updated by Berk Hess over 2 years ago

  • Status changed from Feedback wanted to Fix uploaded

#7 Updated by Berk Hess over 2 years ago

  • Status changed from Fix uploaded to Resolved

#8 Updated by Berk Hess over 2 years ago

#9 Updated by Berk Hess over 2 years ago

  • Status changed from Resolved to Closed
