Bug #1955

Segmentation fault when minimizing box of water

Added by James Barnett over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

GROMACS:      gmx, version 2016-dev-20160510-e7e35d3-unknown
Executable:   /usr/sbin/gmx
Data prefix:  /usr
Command line:
  gmx --version

GROMACS version:    2016-dev-20160510-e7e35d3-unknown
GIT SHA1 hash:      e7e35d318984eb34901f5215482eba8bd71841e7
Branched from:      unknown
Precision:          single
Memory model:       64 bit
MPI library:        MPI
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support:        disabled
SIMD instructions:  AVX2_256
FFT library:        fftw-3.3.4-sse2-avx
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.0
Tracing support:    disabled
Built on:           Wed May 11 16:48:38 UTC 2016
Built by:           wes@cfe808fa184c [CMAKE]
Build OS/arch:      Linux 4.5.2-1-ARCH x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Build CPU family:   6   Model: 60   Stepping: 3
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/sbin/cc GNU 6.1.1
C compiler flags:    -march=core-avx2   -march=native -mtune=generic -O2 -pipe -fstack-protector-strong  -Wundef -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds 
C++ compiler:       /usr/sbin/c++ GNU 6.1.1
C++ compiler flags:  -march=core-avx2   -march=native -mtune=generic -O2 -pipe -fstack-protector-strong  -std=c++0x  -Wundef -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds 
GROMACS:      gmx grompp, version 2016-dev-20160510-e7e35d3-unknown
Executable:   /usr/sbin/gmx
Data prefix:  /usr
Command line:
  gmx grompp -f mdp/min.mdp -v -o min

checking input for internal consistency...
Setting the LD random seed to -432366476
processing topology...
Generated 2211 of the 2211 non-bonded parameter combinations
Generating 1-4 interactions: fudge = 0.5
Generated 2211 of the 2211 1-4 parameter combinations
Excluding 2 bonded neighbours molecule type 'SOL'
turning H bonds into constraints...
processing coordinates...
double-checking input for internal consistency...
Cleaning up constraints and constant bonded interactions with virtual sites
Removing all charge groups because cutoff-scheme=Verlet
renumbering atomtypes...
converting bonded parameters...
initialising group options...
processing index file...
Analysing residue names:
There are:   909      Water residues
Making dummy/rest group for T-Coupling containing 3636 elements
Making dummy/rest group for Acceleration containing 3636 elements
Making dummy/rest group for Freeze containing 3636 elements
Making dummy/rest group for Energy Mon. containing 3636 elements
Making dummy/rest group for VCM containing 3636 elements
Number of degrees of freedom in T-Coupling group rest is 5451.00
Making dummy/rest group for User1 containing 3636 elements
Making dummy/rest group for User2 containing 3636 elements
Making dummy/rest group for Compressed X containing 3636 elements
Making dummy/rest group for Or. Res. Fit containing 3636 elements
Making dummy/rest group for QMMM containing 3636 elements
T-Coupling       has 1 element(s): rest
Energy Mon.      has 1 element(s): rest
Acceleration     has 1 element(s): rest
Freeze           has 1 element(s): rest
User1            has 1 element(s): rest
User2            has 1 element(s): rest
VCM              has 1 element(s): rest
Compressed X     has 1 element(s): rest
Or. Res. Fit     has 1 element(s): rest
QMMM             has 1 element(s): rest
Checking consistency between energy and charge groups...
Calculating fourier grid dimensions for X Y Z
Using a fourier grid of 25x25x25, spacing 0.120 0.120 0.120
Estimate for the relative computational load of the PME mesh part: 0.20
This run will generate roughly 2 Mb of data
writing run input file...
GROMACS:      gmx mdrun, version 2016-dev-20160510-e7e35d3-unknown
Executable:   /usr/sbin/gmx
Data prefix:  /usr
Command line:
  gmx mdrun -deffnm min -v

Running on 1 node with total 4 cores, 8 logical cores
Hardware detected on host cfe808fa184c (the node of MPI rank 0):
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256

  Hardware topology: Full, with devices

Reading file min.tpr, VERSION 2016-dev-20160510-e7e35d3-unknown (single precision)
Using 1 MPI process
Using 8 OpenMP threads 

Steepest Descents:
   Tolerance (Fmax)   =  1.00000e+01
   Number of steps    =        50000
Step=    0, Dmax= 1.0e-02 nm, Epot=  6.65807e+02 Fmax= 9.95624e+04, atom= 1629
Step=    1, Dmax= 1.0e-02 nm, Epot= -1.22684e+04 Fmax= 4.18033e+04, atom= 1629
Step=    2, Dmax= 1.2e-02 nm, Epot= -2.15497e+04 Fmax= 1.91245e+04, atom= 1629
Step=    3, Dmax= 1.4e-02 nm, Epot= -2.70253e+04 Fmax= 7.58158e+03, atom= 853
Step=    4, Dmax= 1.7e-02 nm, Epot= -3.11160e+04 Fmax= 2.97013e+03, atom= 853
Step=    5, Dmax= 2.1e-02 nm, Epot= -3.47536e+04 Fmax= 1.27441e+03, atom= 2386
Step=    6, Dmax= 2.5e-02 nm, Epot= -3.82821e+04 Fmax= 7.38159e+02, atom= 2234
Step=    7, Dmax= 3.0e-02 nm, Epot= -4.05334e+04 Fmax= 3.61420e+03, atom= 525
Step=    8, Dmax= 3.6e-02 nm, Epot= -4.12646e+04 Fmax= 1.10815e+03, atom= 525
Step=    9, Dmax= 4.3e-02 nm, Epot= -4.21067e+04 Fmax= 6.32448e+03, atom= 525
Step=   10, Dmax= 5.2e-02 nm, Epot= -4.24086e+04 Fmax= 1.38419e+03, atom= 525

step 11: One or more water molecules can not be settled.
Check for bad contacts and/or reduce the timestep if appropriate.

Wrote pdb files with previous and current coordinates
[cfe808fa184c:19480] *** Process received signal ***
[cfe808fa184c:19480] Signal: Segmentation fault (11)
[cfe808fa184c:19480] Signal code: Address not mapped (1)
[cfe808fa184c:19480] Failing at address: 0xfffffffe02618420
[cfe808fa184c:19480] [ 0] /usr/bin/../lib/libc.so.6(+0x33310)[0x7f7e3ba78310]
[cfe808fa184c:19480] [ 1] /usr/bin/../lib/libgromacs.so.2(+0xe48af8)[0x7f7e3d7b3af8]
[cfe808fa184c:19480] [ 2] /usr/bin/../lib/libgromacs.so.2(+0xe4917a)[0x7f7e3d7b417a]
[cfe808fa184c:19480] [ 3] /usr/bin/../lib/../lib/libgomp.so.1(GOMP_parallel+0x3f)[0x7f7e39f1115f]
[cfe808fa184c:19480] [ 4] /usr/bin/../lib/libgromacs.so.2(_Z17nbnxn_put_on_gridP12nbnxn_searchiPA3_fiPfS3_iifPKiS2_iPiiP16nbnxn_atomdata_t+0x1269)[0x7f7e3d7b72a9]
[cfe808fa184c:19480] [ 5] /usr/bin/../lib/libgromacs.so.2(_Z19do_force_cutsVERLETP8_IO_FILEP9t_commrecP10t_inputreclP6t_nrnbP13gmx_wallcycleP14gmx_localtop_tP12gmx_groups_tPA3_fSE_P9history_tSE_SE_P9t_mdatomsP14gmx_enerdata_tP8t_fcdataPfP7t_graphP10t_forcerecP19interaction_const_tP11gmx_vsite_tSN_dS0_P9gmx_edsamii+0x1015)[0x7f7e3d7fa625]
[cfe808fa184c:19480] [ 6] /usr/bin/../lib/libgromacs.so.2(_Z8do_forceP8_IO_FILEP9t_commrecP10t_inputreclP6t_nrnbP13gmx_wallcycleP14gmx_localtop_tP12gmx_groups_tPA3_fSE_P9history_tSE_SE_P9t_mdatomsP14gmx_enerdata_tP8t_fcdataPfP7t_graphP10t_forcerecP11gmx_vsite_tSN_dS0_P9gmx_edsamii+0x272)[0x7f7e3d7fb6a2]
[cfe808fa184c:19480] [ 7] /usr/bin/../lib/libgromacs.so.2(_ZN3gmx8do_steepEP8_IO_FILEP9t_commreciPK8t_filenmPK16gmx_output_env_tiiP11gmx_vsite_tP10gmx_constriP10t_inputrecP10gmx_mtop_tP8t_fcdataP7t_stateP9t_mdatomsP6t_nrnbP13gmx_wallcycleP9gmx_edsamP10t_forcereciiiffimP23gmx_walltime_accounting+0x5f9)[0x7f7e3d7a8049]
[cfe808fa184c:19480] [ 8] gmx(_ZN3gmx8mdrunnerEP12gmx_hw_opt_tP8_IO_FILEP9t_commreciPK8t_filenmPK16gmx_output_env_tiiPiiiffPKcfSE_SE_SE_SE_iliiiiiifffim+0x1870)[0x430930]
[cfe808fa184c:19480] [ 9] gmx(_Z9gmx_mdruniPPc+0x1650)[0x417bf0]
[cfe808fa184c:19480] [10] /usr/bin/../lib/libgromacs.so.2(_ZN3gmx24CommandLineModuleManager3runEiPPc+0x324)[0x7f7e3caff3f4]
[cfe808fa184c:19480] [11] gmx(main+0x8c)[0x40c67c]
[cfe808fa184c:19480] [12] /usr/bin/../lib/libc.so.6(__libc_start_main+0xf1)[0x7f7e3ba65741]
[cfe808fa184c:19480] [13] gmx(_start+0x29)[0x40c759]
[cfe808fa184c:19480] *** End of error message ***
Segmentation fault (core dumped)

This is in an Arch Linux docker container, but I have the same problem in a normal installation.

min.tpr (208 KB), James Barnett, 05/14/2016 06:04 PM
min.mdp (368 Bytes), James Barnett, 05/27/2016 10:39 PM
topol.top (139 Bytes), James Barnett, 05/27/2016 10:39 PM
conf.gro (384 KB), James Barnett, 05/27/2016 10:39 PM

Associated revisions

Revision 41ce7792 (diff)
Added by Berk Hess over 3 years ago

Handle constraint errors with EM

All energy minimizers could fail with random errors when constraining
produced NaN coordinates.
Steepest descents now rejects steps with a constraint error.
All other minimizers produce a fatal error with the suggestion to use
steepest descents first.

Fixes #1955.

Change-Id: Ie2f7ad4039634d3c5f2597171ec47d6a145c5fcb

History

#1 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #1955.
Uploader: James "Wes" Barnett
Change-Id: Idad6bfb340160c96aa979943aee32c3b909940a5
Gerrit URL: https://gerrit.gromacs.org/5860

#2 Updated by James Barnett over 3 years ago

After more testing, the patch above did not actually fix the problem. The segfault occurs whenever the SETTLE error is triggered. With -DFLEXIBLE the minimization runs fine; I can then read in the final configuration and remove -DFLEXIBLE to bypass the problem.

#3 Updated by Mark Abraham over 3 years ago

Can you share your input files please? It would be good to find where the segfault happens.

#4 Updated by James Barnett over 3 years ago

My input file is attached. Also, if I unset "NDEBUG" for src/gromacs/mdlib/nbnxn_grid.cpp I get the following error:

-------------------------------------------------------
Program: gmx mdrun, version 2016-beta1-dev-20160513-1c83c6c-dirty
Source file: src/gromacs/mdlib/nbnxn_grid.cpp (line 1185)
MPI rank: 1 (out of 8)

Fatal error:
grid cell cx -2147483648 cy -2147483648 out of range (max 2 5)
atom -nan -nan -nan, grid->c0 0.000000 2.000000

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
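
The message above shows NaN coordinates reaching the gridding code. Converting a NaN float to int is undefined behaviour in C++, and on x86 it typically yields -2147483648, which matches the reported cx and cy values. A minimal sketch of a guarded cell-index computation follows. The function gridCellIndex and its parameters are hypothetical and are not the GROMACS implementation; the sketch only illustrates rejecting non-finite coordinates before the integer conversion.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <optional>

// Hypothetical helper (not taken from nbnxn_grid.cpp): map a coordinate to a
// grid cell index along one dimension, or return std::nullopt when the
// coordinate is NaN/inf or falls outside the grid.
std::optional<int> gridCellIndex(float coord, float origin, float cellSize, int numCells)
{
    // Reject NaN/inf before any float-to-int conversion takes place.
    if (!std::isfinite(coord) || !(cellSize > 0.0f))
    {
        return std::nullopt;
    }
    const float scaled = std::floor((coord - origin) / cellSize);
    // Written so that a NaN in 'scaled' (e.g. from a bad origin) also fails.
    if (!(scaled >= 0.0f && scaled < static_cast<float>(numCells)))
    {
        return std::nullopt;
    }
    return static_cast<int>(scaled);
}
```

A caller can then turn a std::nullopt result into a clear error about bad coordinates instead of indexing out of range.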

#5 Updated by Erik Lindahl over 3 years ago

Please upload separate gro/mdp/top files; debugging is much easier when we can alter the system and settings!

#6 Updated by Erik Lindahl over 3 years ago

  • Status changed from New to Feedback wanted

#7 Updated by Erik Lindahl over 3 years ago

  • Target version set to 2016

#8 Updated by James Barnett over 3 years ago

Attached.

#9 Updated by Berk Hess over 3 years ago

  • Category set to mdrun
  • Status changed from Feedback wanted to In Progress
  • Assignee set to Berk Hess

The issue here is that the steepest descent update produces a configuration that cannot be constrained, but the constraint error is ignored. Instead, we should check, for all minimizers, whether constraining succeeded. If it did not, we should not calculate energies but mark the step as rejected right away (at least for steep; for other minimizers we might need to generate a fatal error).
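
The idea in this comment can be sketched with a toy steepest-descent loop that treats a failed constraint step like a rejected step and never evaluates energies on the unconstrained coordinates. This is not the GROMACS code. The energy function, the tryConstrain stand-in (which simply fails when an atom moves more than maxMove), and all names are assumptions made for illustration. The step-size factors 1.2 and 0.5 follow the usual steepest-descent rule, consistent with the Dmax values in the log above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy energy E(x) = sum_i x_i^2, so the force on coordinate i is -2 x_i.
double energy(const std::vector<double>& x)
{
    double e = 0.0;
    for (double xi : x) { e += xi * xi; }
    return e;
}

// Hypothetical stand-in for a constraint algorithm such as SETTLE: it reports
// failure (instead of producing NaN) when a coordinate is non-finite or moved
// more than maxMove in a single step.
bool tryConstrain(const std::vector<double>& trial, const std::vector<double>& previous, double maxMove)
{
    for (std::size_t i = 0; i < trial.size(); ++i)
    {
        if (!std::isfinite(trial[i]) || std::fabs(trial[i] - previous[i]) > maxMove)
        {
            return false;
        }
    }
    return true;
}

struct MinimizeStats
{
    int accepted           = 0;
    int rejectedEnergy     = 0;
    int rejectedConstraint = 0;
};

MinimizeStats steepestDescent(std::vector<double>& x, double stepSize, int maxSteps,
                              double forceTolerance, double maxMove)
{
    assert(stepSize > 0.0 && forceTolerance > 0.0);
    MinimizeStats stats;
    double        e = energy(x);
    for (int step = 0; step < maxSteps; ++step)
    {
        double fMax = 0.0;
        for (double xi : x) { fMax = std::max(fMax, std::fabs(-2.0 * xi)); }
        if (fMax < forceTolerance) { break; } // converged

        // Trial move along the force, scaled so the largest move equals stepSize.
        std::vector<double> trial(x);
        for (std::size_t i = 0; i < x.size(); ++i) { trial[i] += stepSize * (-2.0 * x[i]) / fMax; }

        // Constraint failure: reject immediately, without computing energies.
        if (!tryConstrain(trial, x, maxMove))
        {
            ++stats.rejectedConstraint;
            stepSize *= 0.5;
            continue;
        }
        const double eTrial = energy(trial);
        if (eTrial < e)
        {
            x = trial;
            e = eTrial;
            ++stats.accepted;
            stepSize *= 1.2;
        }
        else
        {
            ++stats.rejectedEnergy;
            stepSize *= 0.5;
        }
    }
    return stats;
}
```

Starting from x = {1.0, -2.0, 0.5} with stepSize 0.5 and maxMove 0.1, the first trial moves are rejected by the constraint check and the step size is halved until a move is small enough to be accepted, so the coordinates stay finite throughout.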

#10 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #1955.
Uploader: Berk Hess
Change-Id: Ie2f7ad4039634d3c5f2597171ec47d6a145c5fcb
Gerrit URL: https://gerrit.gromacs.org/5948

#11 Updated by Berk Hess over 3 years ago

  • Status changed from In Progress to Fix uploaded

#12 Updated by Berk Hess over 3 years ago

  • Status changed from Fix uploaded to Resolved

#14 Updated by Mark Abraham over 3 years ago

  • Status changed from Resolved to Closed
