Bug #1306

segfault with an otherwise stable system when I turn on FEP (complete decoupling)

Added by Chris Neale over 3 years ago. Updated 9 months ago.

Target version:
Affected version - extra info:
4.6.3 and 4.6.1 (4.6.1 requires the UB angles changed to type-1)
Affected version:


In relation to my post:

I have a system with water and a drug (54 total atoms; 27 heavy atoms). The system is stable when I simulate it for 1 ns. However, Once I add the following options to the .mdp file, the run dies after a few ps with a segfault.

free-energy = yes
init-lambda = 1
couple-lambda0 = vdw-q
couple-lambda1 = none
couple-intramol = no
couple-moltype = drug

I do not get any step files or any lincs warnings. If I look at the .xtc and .edr files, there is no indication of something blowing up before the segfault. I have also verified that the drug runs without any problem in vacuum. I get the same behaviour if I remove constraints and use a timestep of 0.5 fs. The segfault is reproducible with v4.6.1 and v4.6.3. I am using the charmm FF, but I converted all UB angles in my drug to type-1 angles and still got the segfault. I also get the segfault with particle decomposition and/or while running a single thread. I am currently using the SD integrator, but I get the same segfault with md and md-vv. Couple-intramol=yes doesn't resolve it, neither does using separate T-coupling groups for the water and drug. Neither does turning off pressure coupling.

Here is the .mdp file that works fine, but gives me a segfault when I add the free energy stuff (above):

constraints = all-bonds
lincs-iter = 1
lincs-order = 6
constraint_algorithm = lincs
integrator = sd
dt = 0.002
tinit = 0
nsteps = 100000
nstcomm = 1
nstxout = 0
nstvout = 0
nstfout = 0
nstxtcout = 500
nstenergy = 500
nstlist = 10
nstlog=0 ; reduce log file size
ns_type = grid
vdwtype = cut-off
rlist = 0.8
rvdw = 0.8
rcoulomb = 0.8
coulombtype = cut-off
tc_grps = System
tau_t = 1.0
ld_seed = -1
ref_t = 310
gen_temp = 310
gen_vel = yes
unconstrained_start = no
gen_seed = -1
Pcoupl = berendsen
pcoupltype = isotropic
tau_p = 4
compressibility = 4.5e-5
ref_p = 1.0

I do realize that some of these settings are not ideal for a production run. I started with the real Charmm cutoffs + PME, etc, (which also gives the segfault) but this is what I am using right now for quick testing.

The only thing keeping me from filing a redmine issue is that if I remove my drug and do the FEP on one of the water molecules (using the FEP code listed above), I have no segfault. Therefore it is clearly related to the drug, whose parameters I built so I may have caused the problem somehow. Nevertheless, the drug runs fine in water and in vacuum without the FEP code, so I can't imagine what could be causing this segfault (also, the fact that it's a segfault means that I don;t get any useful info from mdrun as to what might be going wrong).

I have attached all of the files necessary to reproduce this segfault. I have already run the script so that you can see the output that I got. If you would like, I can also upload the files for the version of this test in which I changes all UB angles to type-1 angles (I still got the segfault there).

Thank you,

gromacs_redmine.tgz (936 KB) Chris Neale, 07/18/2013 11:48 PM

Associated revisions

Revision 5180a944 (diff)
Added by Erik Lindahl over 1 year ago

Fix single-prec. softcore issue in 4.6 branch

This is a backport of 412e2974 which we likely
forgot to put in 4.6. It primarily fixes
numerical issues in single precision for
sc_power==48, but it can also improve stability
of lower-power softcore free energy.

This should not be merged into 5.0 or master.

Refs #1580. Fixes #1306.

Change-Id: I5506d714214f0eb7e0517a0eda8d74cdba816dab


#1 Updated by Chris Neale over 3 years ago

Summary of this post:

I can avoid the segfault with:
couple-lambda0 = none
couple-lambda1 = none

More details:

Here are some more conditions where I get a crash:

All using:
free-energy = yes
couple-lambda0 = vdw-q
couple-lambda1 = none
couple-intramol = no
couple-moltype = BI167107

I get a segfault with also:
init-lambda = 1

coul-lambdas = 1
vdw-lambdas = 1
fep-lambdas= 0

coul-lambdas = 0 1
vdw-lambdas = 0 1
fep-lambdas= 1 0

Also, it still gives the segfault but runs longer before it does if I change:

couple-lambda0 = vdw-q


couple-lambda0 = vdw

And it is even more stable with

couple-lambda0 = none

In fact, I don't get the segfault within 200 ps (repeated twice) if I use these parameters (with grompp -maxwarn 1):

free-energy = yes
couple-lambda0 = none
couple-lambda1 = none
couple-intramol = no
couple-moltype = BI167107

I am running a longer simulation to confirm that this is stable and will post again if it crashes.

My guess is that the simulations at couple-lambda1 are checking the energies/forces at couple-lambda0 and that this is leading to a segfault. I tried to confirm that this is the problem by going back to

couple-lambda0 = vdw-q

and also adding

nstdhdl = 10000000
dhdl-derivatives = no
separate-dhdl-file = no

But I do still get a segfault with those settings, so perhaps it is more complicated.

#2 Updated by Chris Neale over 3 years ago

I still get the segfault when using double precision.

(actually, double precision does not segfault with the uploaded files)

#3 Updated by Chris Neale over 3 years ago

Might be related to a segfault that I get with the system at

which gives me a segfault in a single lambda run (no expanded ensemble / hamiltonian exchange) and can also be avoided by either turning off the free energy section in the .mdp file or by setting couple-lambda0 = none

#4 Updated by Chris Neale over 3 years ago

Looks like this is solved by either:

(a) using double precision

(b) setting the following mdp parameters:
sc-alpha = 0.5
sc-power = 1
sc-r-power = 6

(the default values, which appear to include sc-alpha = 0 , give a segfault with single precision but not double precision).

#5 Updated by Carsten Kutzner over 3 years ago

could it be that one atom of the B state gets on top of an A state atom just before the crash? Then 1/distance could become larger than the largest possible number in single precision. When the potential and forces are assembled from A and B states, it does not matter what is in the B state terms since they are multiplied with zero anyway -- unless there is an INF in the B state, in which case you get NaN forces! You could check for INFs and NaNs in the individual variables in the "Assemble A and B states" section in nb_free_energy.c. In double precision the chance of getting INF is obviously a lot smaller.

#6 Updated by Erik Lindahl almost 2 years ago

  • Status changed from New to Accepted

This appears to have been fixed already in release-5-0, but it still gives a problem for release-4.6.

#7 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #1306.
Uploader: Erik Lindahl ()
Change-Id: I5506d714214f0eb7e0517a0eda8d74cdba816dab
Gerrit URL:

#8 Updated by Erik Lindahl over 1 year ago

  • Status changed from Accepted to Fix uploaded

Note that this has already been fixed in the 5.0 branch.

#9 Updated by Erik Lindahl over 1 year ago

  • Status changed from Fix uploaded to Resolved
  • % Done changed from 0 to 100

#10 Updated by Rossen Apostolov over 1 year ago

  • Status changed from Resolved to Closed

#11 Updated by Mark Abraham 9 months ago

  • Target version changed from future to 5.0.7

Also available in: Atom PDF