Discrepancies between 3.3.X and 4.0.X with energy/temperature drifts and premature termination of MD runs in 4.0.X
Gromacs 4.0.3 and 4.0.4 versions fail to reproduce MD equilibration runs on two
different systems (proteins solvated in SPC water + counterions), on which 3.3.1 or 3.3.3 succeeded (the latter compiled on the same cluster as 4.0.X, using Intel compiler, double precision and OpenMPI parallel support).
On one system (System1) all 4.0.X simulations terminated after about 2,000 steps, showing large fluctuations in temperature and a drift in total energy, until the system explodes (errors in LINCS or 1-4 energy calculation).
On the other system (System2) the simulation terminated without errors, but exhibited temperature fluctuations and an energy drift similar to those observed in System1, only less intense. Neither occurred in the corresponding runs performed with 3.3.1 or 3.3.3 on either system.
The same problems were observed using either:
a) serial or parallel (OpenMPI with Infiniband support) versions of 4.0.X;
b) single- or double-precision;
c) GCC or Intel compilers;
d) old- (generated with Gromacs 3.3.1 for 8 cores) or new-version tpr topology files.
Only the exact number of time steps before termination changes among the different cases. Parallel calculations were run on 8 cores (corresponding to a single cluster node).
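One way to put a number on the energy drift described above is a least-squares slope of total energy against time. The following is only a minimal sketch: it assumes the total energy has already been extracted from the run into two plain lists, and the names `energy_drift`, `times_ps` and `energies` are hypothetical. The values in the usage example are synthetic and are not taken from the attached logs.

```python
def energy_drift(times_ps, energies):
    """Least-squares slope of energy vs. time (energy units per ps).

    times_ps and energies are hypothetical inputs: equal-length lists of
    time points and total energies extracted from a run by other means.
    """
    if len(times_ps) != len(energies) or len(times_ps) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(times_ps)
    mean_t = sum(times_ps) / n
    mean_e = sum(energies) / n
    # Standard linear-regression slope: cov(t, E) / var(t)
    num = sum((t - mean_t) * (e - mean_e) for t, e in zip(times_ps, energies))
    den = sum((t - mean_t) ** 2 for t in times_ps)
    if den == 0:
        raise ValueError("all time points are identical")
    return num / den


if __name__ == "__main__":
    # Synthetic series with a known slope of 2.0 per ps (illustration only)
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    energies = [1.0, 3.0, 5.0, 7.0, 9.0]
    print(energy_drift(times, energies))
```

Comparing the slope obtained from a 3.3.X run with that from a 4.0.X run on the same system would make "drift" a comparable quantity rather than a visual impression.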
The cluster on which both the 4.0.X and 3.3.3 Gromacs versions were compiled and run consists of dual-CPU quad-core Opteron nodes with Infiniband connectivity, and has the following configuration:
Gromacs 4.0.4 / 4.0.3
Compilers: gcc 4.1.2 20070626 (Red Hat 4.1.2-14) or icc 10.1 (Build 20070913 Pack.ID: l_cc_p_10.1.008)
ofed131 - openmpi 1.2.6
The attached compressed tar file xbug.tar.gz contains:
a) New (4.0.3) format tpr file for System1
b) Old (3.3.1) format tpr file for System1 prepared for 8 cores
c) Old (3.3.1) format tpr file for System2 prepared for 8 cores
d) Log file of a 4.0.4 failed run on System1 (Intel, serial, double-prec)
e) nohup.log file corresponding to d), containing warnings and errors
f) Log file of a 3.3.3 successful run on System2 (Intel, parallel, double-prec)
g) Log file of a 4.0.4 completed run on System2 (Intel, parallel, double-prec)
h) Checkpoint file produced when the 4.0.4 run corresponding to the d) log crashed
i) Final checkpoint file for the 4.0.4 run corresponding to the g) log
j) Final pdb file for the 3.3.3 run corresponding to the f) log
#2 Updated by Pietro Amodeo over 10 years ago
1) In both cases, single-precision versions were configured with --enable-sse --enable-shared, double-precision ones with --enable-sse2 --enable-shared options.
2) Starting energies do not exhibit substantial differences between 3.3.X and 4.0.X runs in all simulations/systems.
#3 Updated by Berk Hess over 10 years ago
My guess would be that this is not a bug, but the result
of a bug fix in 4.0.
In 3.3 and older versions tau_p was scaled with the pressure
factor, which is 16.6.
I have corrected this in 4.0, which is also described in the release notes:
So to get to the same results in 4.0 as in 3.3, you should multiply
tau_p with 16.6: 0.5*16.6 = 8.3 ps.
Please try this and report back if this solved the problem or not.
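The conversion above is a single multiplication, sketched here for convenience. This is only an illustration under the assumptions stated in this thread: the factor 16.6 is the value quoted above rather than one read from the GROMACS source, and the function name `tau_p_for_4_0` is hypothetical.

```python
# Pressure factor as quoted in this thread (assumption: used as-is, rounded to 16.6)
PRESSURE_FACTOR = 16.6


def tau_p_for_4_0(tau_p_3_3_ps, factor=PRESSURE_FACTOR):
    """Return the tau_p (in ps) to set in a 4.0.X run so that it mimics
    a 3.3.X run that used tau_p_3_3_ps."""
    if tau_p_3_3_ps <= 0:
        raise ValueError("tau_p must be positive")
    return tau_p_3_3_ps * factor


if __name__ == "__main__":
    old_tau_p = 0.5  # value used in the 3.3.X runs discussed here
    new_tau_p = tau_p_for_4_0(old_tau_p)
    # Print a line in the style of an mdp entry, with a trailing comment
    print(f"tau_p = {new_tau_p:.1f}   ; 4.0.X equivalent of tau_p = {old_tau_p} in 3.3.X")
```

The same scaling would apply to any other tau_p value carried over from a 3.3.X parameter file.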
#4 Updated by Pietro Amodeo over 10 years ago
As Berk correctly guessed, the instabilities in the 4.0.X trajectories came from the tau_p value, which must be scaled to recover the 3.3.3 behaviour.
I'm sorry, but, although I read about the change in the release notes when 4.0 was released, I forgot this change months later when resubmitting a 3.3.X run!
Maybe a warning in grompp or mdrun, or something like a "Major changes from previous release" section somewhere in the manuals, could help prevent such problems.