PME tuning incorrect with the group scheme
Below is my original post to GMX-USERS (Subject: System volume "jumps" on exact continuations).
There have been subsequent posts that may be worth reading. They may suggest the problem is in PP-PME tuning and that Gromacs version 4.5.3 may not be affected.
I've attached the files needed to reproduce the data in the 3rd row (6 kbar) of this image.
Here are the commands I ran:
gmx grompp -f eq4.mdp -c em3.pdb -o eq4.tpr -p topol.top -nice 0
mpiexec gmx_mpi mdrun -v -deffnm eq4 -c eq4.pdb -nice 0 >& myjob
(12 nodes, 144 cores, PP-PME tuning)
See files/folders: eq4.mdp em3.pdb topol.top posre.itp charmm22start.ff/ eq4.cpt eq4.edr eq4.log eq4.pdb eq4.tpr
gmx grompp -f 20ns.mdp -c eq4.pdb -o 20ns.tpr -t eq4.cpt -p topol.top -nice 0
mpiexec gmx_mpi mdrun -v -deffnm 20ns -c 20ns.pdb -nice 0 >& myjob
(4 nodes, 48 cores, PP-PME tuning)
See files: 20ns.mdp 20ns.cpt 20ns.edr 20ns.log 20ns_prev.cpt 20ns.tpr
gmx convert-tpr -s 20ns.tpr -extend 20000 -o 20-40ns.tpr -nice 0
mpiexec gmx_mpi mdrun -v -deffnm 20-40ns -c 20-40ns.pdb -nice 0 -noappend -cpi 20ns.cpt >& myjob
(2 nodes, 24 cores, PP-PME tuning)
See files: 20-40ns.cpt 20-40ns.part0002.edr 20-40ns.part0002.log 20-40ns_prev.cpt 20-40ns.tpr
gmx convert-tpr -s 20-40ns.tpr -extend 20000 -o 40-60ns.tpr -nice 0
mpiexec gmx_mpi mdrun -v -deffnm 40-60ns -c 40-60ns.pdb -nice 0 -noappend -cpi 20-40ns.cpt >& myjob
(2 nodes, 24 cores, no PP-PME tuning — I don’t know why.)
See files: 40-60ns.cpt 40-60ns.part0003.edr 40-60ns.part0003.log 40-60ns_prev.cpt 40-60ns.tpr
gmx convert-tpr -s 40-60ns.tpr -extend 20000 -o 60-80ns.tpr -nice 0
mpiexec gmx_mpi mdrun -v -deffnm 60-80ns -c 60-80ns.pdb -nice 0 -noappend -cpi 40-60ns.cpt >& myjob
(2 nodes, 24 cores, PP-PME tuning)
See files: 60-80ns.cpt 60-80ns.part0004.edr 60-80ns.part0004.log 60-80ns_prev.cpt 60-80ns.tpr
We are observing abrupt, discontinuous "jumps" in our simulation box volume between different chunks of trajectory when performing exact continuations of standard, explicit-solvent MD simulations in the NPT ensemble. Note that these volume jumps do not appear to occur after every single continuation (there seems to be some element of randomness); however, they occur at some point in almost all of our different simulations. There are no visually apparent problems with the trajectories when viewing them in, e.g., PyMOL. We are not able to visualize anything unusual, such as "bubbles," in cases where the system volume jumps to larger values.
Here is what we have tested:
* Temperature/pressure coupling methods: Nosé-Hoover/Parrinello-Rahman and Berendsen/Berendsen are both affected
* Gromacs versions: 4.6, 4.6.1, 5.0, and 2016 are all affected
* Machines/architectures: three different clusters are all affected (we don't know of a machine where it does not occur):
  * the lab's cluster, where we installed Gromacs ourselves
  * the university cluster, where Gromacs was installed by the university IT staff
  * the Hansen cluster at Purdue, which we access only through a GUI at the DiaGrid portal
* Systems: pure water systems, as well as solvated single-solute systems where the solute is a single protein, LJ sphere, or buckminsterfullerene, are all affected
* Force fields: versions of the AMBER, CHARMM, GROMOS, and OPLS force fields are all affected
* Box shape: cubic boxes and rhombic dodecahedrons are both affected
* Continuation method: restarts of killed simulations (killed on purpose to increase the number of nodes as resources became available -- we don't do this anymore, but this is how we first saw the problem) and exact continuations of completed simulations are both affected
* Number of nodes: We were almost convinced that it happened whenever we ran on >1 node (our usual situation), but not if we ran on 1 node only. We did some tests on one node of the lab's cluster, with and without MPI, to see whether MPI was the cause. System volume jumps were not (yet?) observed regardless of whether MPI was used. On the university cluster, however, we ran tests on 24 processors using "-pe mpi-fill 24". Usually the 24 processors were spread across >1 node, but sometimes they completely filled one node. There was one instance of a system volume jump when the 24 processors changed from being distributed across >1 node to being on 1 node only. However, MPI was used in all of those simulations, so we still have not ruled out MPI as the problem. Unfortunately, this test result is still murky; perhaps we should not have even mentioned it.
* Cut-off scheme: We have done significantly fewer simulations with the Verlet cut-off scheme; so far, however, the system volume jumps have not occurred with Verlet. We are continuing these tests.
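To make the "jump" observation reproducible across the many test combinations above, one can flag shifts in the mean box volume between continuation chunks automatically. Below is a minimal sketch; it assumes the volume traces were extracted (e.g., with `gmx energy`) into .xvg-style files, and the 0.1 nm^3 threshold is an illustrative choice, not a value from this report.

```python
# Sketch: flag abrupt shifts in box volume between trajectory chunks.
# Assumes one volume trace per continuation chunk, in .xvg format
# (lines of "time value", with #/@ comment lines). The threshold
# below is an illustrative choice.

def read_xvg(path):
    """Return (time, value) pairs, skipping #/@ comment and blank lines."""
    data = []
    with open(path) as fh:
        for line in fh:
            if line.startswith(('#', '@')) or not line.strip():
                continue
            t, v = line.split()[:2]
            data.append((float(t), float(v)))
    return data

def chunk_means(traces):
    """Mean volume of each chunk, given a list of (time, volume) traces."""
    return [sum(v for _, v in tr) / len(tr) for tr in traces]

def jumps(means, threshold=0.1):
    """Indices of chunks whose mean volume shifts by more than `threshold` nm^3
    relative to the previous chunk."""
    return [i for i in range(1, len(means))
            if abs(means[i] - means[i - 1]) > threshold]
```

For example, two chunks with mean volumes 125.1 and 126.6 nm^3 would be flagged as a jump at the second chunk, while a drifting-but-continuous trace would not.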
Here is how we do exact continuations:
gmx_mpi convert-tpr -s previous.tpr -extend 20000 -o next.tpr
mpiexec gmx_mpi mdrun -v -deffnm next -cpi previous.cpt -noappend >& myjob
An example (CHARMM22*-based) MDP file (only used for the initial production run) is here: https://docs.google.com/document/d/1NUi3H7agEFMeEq5qxTNXWopaFXqIjUbHpLUqifVRVys/edit?usp=sharing
We have performed the Gromacs regression tests (with one processor, with >1 processor but only 1 node, and with more processors than will fit on one node) on the two machines that we have command-line access to (the lab and university clusters). Some of the regression tests reset the number of processors to 8 and thus run on only one node. But, for what it is worth, all tests passed.
A linked Photobucket image (http://i1243.photobucket.com/albums/gg545/ploetz/box-volume_zps0foiwzs9.png) shows the same system run for five 1-ns chunks with exact continuations on two machines: the university's and the lab's clusters. In row 1, note that a particular system box volume is obtained on the university cluster with 1 node and MPI; no system volume jumps occur. However, the average volume is not the same when run on the lab's cluster with 3 nodes, and a system volume jump occurs at 2 ns (row 2). The system volume does at least roughly match that of the university cluster when run on the lab's cluster with 1 node, both without (row 3) and with (row 4) MPI.
Some may view these as small volume jumps, but they correspond to the volume of many water molecules (one water has a volume of ~0.03 nm^3). Often the volume jumps are larger than those shown in the linked figure (e.g., ~7 nm^3).
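As a back-of-envelope check of the scale claimed above (using only the ~0.03 nm^3-per-water figure quoted in the report):

```python
# Back-of-envelope: how many water molecules a given volume jump
# corresponds to. 0.03 nm^3 per water is the figure quoted above.
V_WATER = 0.03  # nm^3 per water molecule (~30 A^3)

def waters_in_jump(dv_nm3):
    """Number of water-molecule volumes contained in a volume change."""
    return dv_nm3 / V_WATER

# A ~7 nm^3 jump, as mentioned above, corresponds to roughly 230 waters.
```

So even the "small" jumps of a few tenths of a nm^3 amount to the volume of several water molecules appearing or disappearing at once.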
When a jump occurs, the system stays at that new volume; it is not a "blip" that then re-adjusts to the original volume. These volume jumps seem suggestive of changes to the simulation settings or FF parameters occurring during the continuations.
We would greatly appreciate any advice. Uncertainty of the system volume prevents us from trusting any subsequent analysis.
Disable PME tuning with the group scheme
PME tuning with the group cut-off scheme did not work, because it did not
set fr->bTwinRange. Thus interactions between charge-group pairs at
distances between rlist and rcoulomb were missing. The search cost
does go up with the cut-off, so the PME tuning does not produce very
large cut-offs and grid spacings. This limits the error, but also the
chance of noticing the problem.
The fr->bTwinRange bug is easily fixed, but the cut-off for
the domain decomposition communication is also set incorrectly.
Since the code paths of the deprecated group scheme would take too much
effort to fix, we simply disable PME tuning with the group scheme.
PME tuning is now completely disabled with the group scheme.
#1 Updated by Berk Hess over 2 years ago
I commented on your mdp settings at #2199. I'll copy my comments here:
Your simulation settings are strange. They are extremely inefficient with the group scheme, nstlist=1, and nstcalcenergy=1, but they are not very accurate either, since you are using a list buffer of only 0.05 nm. Your simulations would be much faster and more accurate with the group scheme's default settings. Furthermore, I see Cmap energy entries, so I assume you are using the CHARMM force field. That should officially be used with force switching, with LJ up to 1.2 nm.
Why are you not using the more accurate and far more efficient Verlet cut-off scheme? Or have you tried that and observed the same issues? It could be that your issues are caused by the combination of the group scheme, a too-short pair-list buffer, and PME tuning.
#2 Updated by Berk Hess over 2 years ago
Looking at the PME tuning output in your log files, I see that the cut-off and PME grid are scaled by a factor of 4.141. Normally we get factors between 1 and 1.4, but your extremely inefficient settings cause this large scaling factor. Such a coarse grid might also cause issues. We should probably put in a maximum scaling factor.
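To illustrate why a factor of 4.141 is extreme (this is not GROMACS code, just the scaling relation described in the comment, with an assumed example grid spacing): PME tuning scales the Coulomb cut-off and the PME grid spacing by the same factor, shifting work from the reciprocal-space (FFT) part to the real-space pair part.

```python
# Illustration (not GROMACS code): PME tuning scales the Coulomb cut-off
# and the PME grid spacing by the same factor s, keeping the Ewald error
# balance while trading FFT work for real-space pair work.

def tune(rcoulomb, spacing, s):
    """Return the scaled cut-off and grid spacing."""
    return rcoulomb * s, spacing * s

# Values from this report: rcoulomb = 1.05 nm, scaling factor 4.141.
# The 0.12 nm initial grid spacing is an assumed example value.
rc, sp = tune(1.05, 0.12, 4.141)

# The real-space pair-search volume grows roughly as s**3, while the
# FFT grid shrinks roughly as 1/s**3.
pair_cost_factor = 4.141 ** 3  # roughly 71x more pair volume
```

With a normal tuning factor of at most ~1.4 the pair-cost growth stays below ~3x, which is why factors near 4 are a red flag.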
But again, I would suggest using the Verlet cut-off scheme, as well as the proper cut-off setup for the CHARMM force field.
#3 Updated by Elizabeth Ploetz over 2 years ago
Thanks for your comments. I was trying to have accurate simulations, at the price of them being slow.
I switched most of my simulations to these strange settings (nstlist=nstcalcenergy=nsttcouple=nstpcouple=1) after this experience with polarizable simulations: https://redmine.gromacs.org/issues/1376 . Do you think it is actually wrong or just slow? Why would it not be accurate if there is no list and everything is updated every step?
Where are you seeing a list buffer of 0.05nm?
Yes, this is the DE Shaw modified version of CHARMM22*. According to the reference
S. Piana, K. Lindorff-Larsen, D.E. Shaw
Atomic-level description of ubiquitin folding
Proc. Natl. Acad. Sci. U. S. A., 110 (15) (2013), pp. 5915–5920
(page 5916, left hand side),
"Nonbonded interactions were truncated to 10.5 Å, and the Gaussian Split Ewald (GSE) method was used to account for long-range electrostatic interactions (60)."
I was trying to implement the force field as the authors did. Do you think it is wrong?
While we will eventually be switching to Verlet (due to the planned obsolescence of the group scheme!), it is much slower for us -- we have a ton of explicit water in most of our systems (including those in this report)! Most of our current projects were started with the group scheme, and we hope to finish them with it. Of course, we want the most accurate simulations possible (which I guess means Verlet over group), but switching means we cannot use a twin-range cut-off. This means we cannot actually implement the twin-range-cut-off-based force field we've been developing for about a decade (sorry, getting off topic from the issue at hand for this bug report!). So, we're in the process of seeing how switching to Verlet affects the force field we've been developing.
#4 Updated by Berk Hess over 2 years ago
The nst=1 settings you are using should not give wrong results.
What can affect the accuracy is the buffer of 0.05 nm. You have rcoulomb=rvdw=1, so a 1 nm interaction cut-off and rlist=1.05, which thus gives a buffer of 0.05 nm. In the (by now very obsolete and deprecated) group scheme there are charge groups, I think only the waters with CHARMM. And the diameter of a water molecule is much larger than 0.05 nm. With PME the effect of this is likely small, but it could be measurable.
Another thing that might be an issue is the extreme PME grid scaling.
But from what I understand, you should be using rcoulomb=rvdw=1.05 to get the settings of the paper.
The Verlet scheme with nstlist=10 will be an order of magnitude faster than your current setup.
I don't understand what you mean by a twin-range-cut-off-based force field. A force field should by itself have nothing to do with a twin-range setup. You could use a twin-range setup for any force field, if your code and simulation setup support it. GROMACS will not have a twin-range setup in the near future, but I don't see why you would want that. GROMACS supports LJ-PME, so you could use no cut-off at all instead.
#7 Updated by Berk Hess over 2 years ago
Ah, I missed that.
But then you have a buffer of 0, which is bad with the group scheme. Then you might indeed get differences in pressure and volume with PME tuning. If you want the group scheme, use rlist=1.2 with potential-modifier=potential-shift for both vdw and coulomb.
But again, I would strongly suggest to use the Verlet cut-off scheme, for both accuracy and speed.
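One reading of the group-scheme setup suggested above, as an MDP fragment (a sketch only, not a verified configuration; the 1.05 nm cut-offs are taken from the earlier discussion of the paper's settings):

```
; Sketch of the suggested group-scheme setup (unverified; cut-off values
; assumed from the discussion: 1.05 nm interaction cut-off, 0.15 nm buffer)
cutoff-scheme     = group
rlist             = 1.2
rcoulomb          = 1.05
rvdw              = 1.05
coulomb-modifier  = potential-shift
vdw-modifier      = potential-shift
```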
#9 Updated by Berk Hess over 2 years ago
With the group scheme there is a need for a buffer. The group scheme uses charge groups that can consist of multiple atoms, and they are put in the list based on distances between their geometrical centers. A water molecule is a charge group. So for full accuracy, you need to set the buffer to the radius of a water molecule with nstlist=1.
But again, you should use the Verlet cut-off scheme to avoid all this mess.
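The point about charge-group centers can be made concrete with a toy example (coordinates invented for illustration; this is not GROMACS code): because the group scheme builds the pair list from distances between charge-group centers, atom pairs can lie inside rcoulomb while the group centers lie outside rlist, so the interaction is silently skipped when there is no buffer.

```python
# Toy illustration: with a center-based pair list and no buffer
# (rlist == rcoulomb), a pair of "water-like" charge groups can have
# atoms within the interaction cut-off yet be absent from the list.
# All coordinates (nm) are invented for illustration.
import math

def center(atoms):
    """Geometrical center of a list of 3D points."""
    n = len(atoms)
    return tuple(sum(a[i] for a in atoms) / n for i in range(3))

rlist = rcoulomb = 1.0  # nm, no buffer

# Two 3-atom groups ~0.1 nm across, with centers just over 1 nm apart.
w1 = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
w2 = [(1.05, 0.0, 0.0), (1.15, 0.0, 0.0), (1.05, 0.1, 0.0)]

listed = math.dist(center(w1), center(w2)) <= rlist   # centers > rlist apart
closest = min(math.dist(a, b) for a in w1 for b in w2)  # closest atom pair
missed = (not listed) and closest <= rcoulomb  # interaction inside cut-off, lost
```

Here the closest atom pair sits at 0.95 nm, inside the 1.0 nm cut-off, but the group centers are ~1.05 nm apart, so the pair never enters the list; a buffer of at least the group radius closes this gap.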
#10 Updated by Elizabeth Ploetz over 2 years ago
Thanks for your help. I know your preferred solution is that we run with cutoff-scheme=Verlet. We have some new projects that are using Verlet, but older projects that are still ongoing have made use of the group scheme. So, we tested your other suggestion.
We tested cutoff-scheme=group with the default settings; the other settings were the same as before.
We tested 4 runs of 2 ns each, with the continuations between each 2-ns chunk given by
gmx_mpi convert-tpr -s old.tpr -o new.tpr -extend 2000
mpiexec gmx_mpi mdrun -v -deffnm new -c new.pdb -cpi old.cpt -noappend -nice 0 >& myjob
and we still observed jumps.
These runs were with Gromacs 2016.
#15 Updated by Berk Hess over 2 years ago
- Subject changed from System volume "jumps" on exact continuations to PME tuning incorrect with the group scheme
- Status changed from In Progress to Fix uploaded
- Assignee set to Berk Hess
- Priority changed from Normal to High
- Target version set to 2016.4
In GROMACS 2016, with PME tuning and the group cut-off scheme, rlist is not updated. Thus interactions between charge-group pairs at distances between rlist and rcoulomb are missing. Since the non-bonded cost does not increase, PME tuning will push rcoulomb and the grid spacing up to the maximum possible.
GROMACS 5.1 and older still supported twin-range cut-offs with the group scheme. For those versions, it seems that the Coulomb pair interactions between rlist and rcoulomb were computed only at neighbor-search steps and then updated correctly with multiple-time stepping. But the time step for these interactions is likely too large with the default nstlist=10. It seems that runs with 5.1 or earlier should be correct with nstlist=1, but I don't know if you can confirm this.
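The twin-range multiple-time-stepping bookkeeping described here can be sketched as follows (a toy, not GROMACS code; `short_range` and `long_range` are stand-in callables): the long-range part is evaluated only at neighbor-search steps, every nstlist steps, and held fixed in between, so nstlist=1 recomputes it every step.

```python
# Toy sketch of twin-range multiple-time stepping: long-range forces
# are evaluated only at neighbor-search steps (every nstlist steps)
# and reused unchanged in between.

def run(nsteps, nstlist, short_range, long_range):
    """Return per-step total forces and the number of long-range evaluations."""
    forces, lr_evals, f_long = [], 0, 0.0
    for step in range(nsteps):
        if step % nstlist == 0:      # neighbor-search step
            f_long = long_range(step)
            lr_evals += 1
        forces.append(short_range(step) + f_long)
    return forces, lr_evals
```

With nstlist=10, the long-range force effectively sees a time step ten times larger than the integration step, which is the "too large" concern raised above; with nstlist=1 the two parts are updated in lockstep.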
These issues are too complex to fix, considering that the group scheme is deprecated. As a solution, I have now pushed a patch to the 2016 release branch that disables PME tuning with the group cut-off scheme.
#16 Updated by Elizabeth Ploetz over 2 years ago
As noted in my original post, we also observed the jumps in versions 4.6, 4.6.1, and 5.0 (and we had rlist=rcoulomb and nstlist=1). I would assume that other intermediate versions that we have not tested are also affected. I have a few old trajectories remaining from version 4.5.3, and I don't have any jumps in those; however, it is not a huge body of simulation data, unfortunately. See also https://email@example.com/msg26519.html .
#21 Updated by Berk Hess over 2 years ago
The issue for release 2016 is clear.
But I now realize that I have not yet found an issue with older versions. I have run version 5.1 with the group scheme, nstlist=1 and >1, and PME tuning, and have not observed issues (yet).
Could you attach a log file for a run with 5.0 that shows the issue?
#22 Updated by Berk Hess over 2 years ago
I think I have found the reason why 4.6, 5.0, and 5.1 are incorrect. The bug itself is slightly different, but the effect is exactly the same: all Coulomb interactions between charge groups at distances between rlist and rcoulomb are missing.
The issue in the code is that ir->rlistlong is not updated during PME tuning, only fr->rlistlong. This is strange, since it looks like this has always been the case; someone must (or, maybe better, should) have tested this code before it got into the distribution.
#23 Updated by Elizabeth Ploetz over 2 years ago
Sorry for the delay; I know you might not need these anymore. Here are two log and energy files (1-10ns, 10-20ns) from version 5.0 (system is native ubiquitin in water at 10kbar). There is a system volume jump between these two simulations, although it was an exact continuation. The "1-10ns" files should have been named "0-10ns," sorry.
#27 Updated by Berk Hess over 2 years ago
Some more details:
With 4.6, 5.0, and 5.1, the neighbor search actually does construct the additional long-range pair list with pairs between rlist and rcoulomb. Thus there is a cost to increasing the cut-off and spacing with PME tuning, and the cut-off does not get very large. This is good in the sense that the artefacts will be more contained (the group scheme already has some missing interactions when used in the usual setup without a buffer), but bad in the sense that the issues might more easily have gone unnoticed.
#29 Updated by Elizabeth Ploetz over 2 years ago
So, we can run with the group scheme:
- with or without the twin-range cut-offs (I know twin-range is not supported in version 2016)
- with nstlist=nstcalcenergy=nsttcouple=nstpcouple=1 or with the defaults for these settings
as long as we use -notunepme, correct? This should give the correct system volume (and energy, etc.), with all the interactions properly calculated, correct?
#30 Updated by Berk Hess over 2 years ago
You should never use PME tuning with the group scheme, in any version (with the Verlet cut-off scheme it is fine).
Everything else works as intended.
Having said that, running any simulation with nstlist=1 is going to cost you a lot of performance. The group scheme still has some missing interactions close to the cut-off distance even with nstlist=1, as I tried to explain before (unless you use exact cut-offs and a pair-list buffer). The Verlet scheme has no such issues.