Bug #96

Intermediate restarts of simulations don't match

Added by no name over 13 years ago. Updated about 12 years ago.

Status: Closed
Priority: High
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

When I restart a simulation from an intermediate step of a finished simulation,
I get different outcomes (the trajectories diverge after the first common step).
This was evident from comparing the center of mass (COM) trajectories of a
micelle in the original and intermediate runs. I have tried both tpbconv and
grompp with the -time option (always using full-precision .trr and .edr files)
to generate the intermediate .tpr file for mdrun, and both methods yield exactly
the same wrong result.

I think I have isolated the problem to the neighbor list update frequency. When
both the original and intermediate runs had nstlist=1, the micelle COM
trajectories matched exactly. When nstlist=10, the problem persisted. Also,
increasing the rlist cutoff under grid search does not help when nstlist=10; I
tried 1.2, 2.4, and 3.6 (the cutoff radius for all forces is 1.2). The
frequencies for saving coordinates, velocities, forces, and energies were
multiples of 10, and I restarted at a step when such info was available. So the
restart should have the same neighbor list at its first step as the original at
the restart step, and thus should give the same result?

In my simulations I use Berendsen T coupling (1.0 at 300 K), no P coupling or
other constraints, and all forces are cut off at 1.2. The COM is read
from .trr files for the micelle using a non-Gromacs code that accounts for PBC.
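
A comparison of this kind can be sketched in a few lines of Python. This is only an illustration, not the code used for the report: the function name `first_divergence`, the `tol` parameter, and the COM values are all hypothetical, and reading the .trr files is left out.

```python
def first_divergence(com_a, com_b, tol=0.0):
    """Return the index of the first frame where two COM trajectories differ.

    com_a, com_b: sequences of (x, y, z) tuples for the same frames.
    tol: allowed absolute difference per component (0.0 demands equal values).
    Returns None if all shared frames agree within tol.
    """
    for i, (a, b) in enumerate(zip(com_a, com_b)):
        if any(abs(p - q) > tol for p, q in zip(a, b)):
            return i
    return None


# Hypothetical COM values, not data from the report.
original = [(1.0, 2.0, 3.0), (1.1, 2.1, 3.1), (1.2, 2.2, 3.2)]
restart = [(1.0, 2.0, 3.0), (1.1, 2.1, 3.1000001), (1.25, 2.2, 3.2)]

print(first_divergence(original, restart))            # exact comparison
print(first_divergence(original, restart, tol=1e-3))  # tolerant comparison
```

With `tol=0.0` the check flags the first frame that is not identical, which is the relevant test when an exact restart is expected.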

topol.tpr (1.41 MB): .tpr for original run (no name, 08/28/2006 06:09 PM)
topoli.tpr (1.41 MB): .tpr for intermediate (no name, 08/28/2006 06:23 PM)

History

#1 Updated by Berk Hess about 13 years ago

First I have to say that this should have been posted
to the gmx-developers mailing list. As you use modified
code this is not a Gromacs bug.

At neighborsearch steps not only the ns list is made,
but before that all charge groups are put into the box.
Therefore your restart results could be different with different nstlist,
depending on what, and especially where, you have made
modifications to the code.
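
To make "put into the box" concrete, here is a rough Python sketch. It is not the Gromacs routine: it assumes a rectangular box and assumes that a group is shifted as a whole according to its geometric center, and the name `put_group_in_box` is made up for illustration.

```python
import math


def put_group_in_box(coords, box):
    """Shift a whole group so its geometric center lies in [0, box) per dimension.

    coords: list of (x, y, z) tuples for the atoms of one group.
    box: (Lx, Ly, Lz) edge lengths of a rectangular box (an assumption here).
    """
    n = len(coords)
    center = [sum(c[d] for c in coords) / n for d in range(3)]
    # Same shift for every atom in the group, an integer number of box lengths.
    shift = [-math.floor(center[d] / box[d]) * box[d] for d in range(3)]
    return [tuple(c[d] + shift[d] for d in range(3)) for c in coords]


# Hypothetical two-atom group that has drifted out of a 5 x 5 x 5 box.
group = [(5.5, 1.0, -0.5), (6.5, 1.0, -0.5)]
box = (5.0, 5.0, 5.0)
wrapped = put_group_in_box(group, box)
print(wrapped)
```

The point of the sketch is that the stored coordinates change at such a step even though the physical configuration does not, so code that touches coordinates between neighbor search steps can see different values depending on nstlist.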

From your report it is unclear what you are doing.
Are you modifying coordinates in Gromacs?
If so which coordinates?
Where in the code exactly are you doing this?

Berk.

#2 Updated by no name about 13 years ago

Berk,
Thanks for your attention. I have not made any modification to the Gromacs
code. I am simply restarting a simulation from an intermediate step. I
analyze the two output .trr files (from the original and the restart) for the
center of mass trajectories of one group (in my case a micelle) and I find that
they differ. This is not what I expected, given that two simulations with
exactly the same input information should have the same trajectories for a
relatively long time.
The 1st restart step of the simulation coincides with where the original
simulation updates the neighbor list, so both should have the same neighbor
list for every atom.
As an example, the original simulation runs from step 0 to 1500, with
nstlist=10 and nst-x,v,f,energy-out=30. The restart simulation starts at step
300 with the same nst parameters.
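
The alignment claimed here is easy to check mechanically. The following Python sketch uses the numbers from the example above; the function name `restart_step_is_aligned` is hypothetical and not part of any Gromacs tool.

```python
def restart_step_is_aligned(restart_step, nstlist, output_intervals):
    """True if restart_step is a neighbor search step and an output step.

    output_intervals: step intervals for coordinates, velocities, forces,
    and energies (all 30 in the example above).
    """
    intervals = [nstlist, *output_intervals]
    return all(restart_step % n == 0 for n in intervals)


# Values from the example: nstlist=10, all output intervals 30, restart at 300.
print(restart_step_is_aligned(300, 10, [30, 30, 30, 30]))
# A step that is a neighbor search step but not an output step.
print(restart_step_is_aligned(310, 10, [30, 30, 30, 30]))
```

Passing this check only confirms that the restart step coincides with a neighbor list update and with saved output; it says nothing about whether the restart itself is exact.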

#3 Updated by Berk Hess about 13 years ago

OK, I did not understand before what you meant by COM trajectories.

Gromacs should give reproducible results in most cases;
however, with some options it does not work, and you also
have to prepare your tpr file appropriately.

Which Gromacs version is this?

Did you use grompp or tpbconv to create the restart tpr file?

Could you attach the original tpr file?

#4 Updated by Berk Hess about 13 years ago

Some more questions:
Are you running on a single CPU or in parallel?
Runs are only reproducible on the same number of CPUs.
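
One general reason for this kind of dependence, not spelled out in this thread, is that floating-point addition is not associative, so summing the same contributions in a different order can change the last bits of the result. A small Python illustration with made-up numbers:

```python
def sum_in_order(values):
    """Accumulate values left to right, as a simple loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


# Made-up contributions; only the summation order differs between the two sums.
contributions = [0.1, 0.2, 0.3]
forward = sum_in_order(contributions)
backward = sum_in_order(reversed(contributions))

print(forward == backward)
print(abs(forward - backward))
```

In a chaotic system such as an MD simulation, a difference of this size in one step is enough for trajectories to drift apart over time.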

Are you using PME?
If so, have you compiled with fftw2 or fftw3?

#5 Updated by no name about 13 years ago

Created an attachment (id=69)
.tpr for original run

w/ comment #5

#6 Updated by no name about 13 years ago

Created an attachment (id=70)
.tpr for intermediate

Sorry for the confusion here. I also replied to your hotmail re: comments 3-4;
I don't know if it came through, so I am replying here as well.

I am using Gromacs version 3.3, my OS is Linux, and the hardware
is x86-64. All the tests I did were on a single CPU that is part
of a 2-CPU node. I am not using PME (shift functions for coulomb
and vdw type). The .tpr file for the original run is attached to comment 5,
and the intermediate .tpr is attached here. For this test I
used grompp to make the restart .tpr, using the following inputs:
grompp -f -c -p -po -t traj.trr -e ener.edr -time 9.0
where ener.edr and traj.trr were from the original.

I have also tried using tpbconv, and it gave exactly the same wrong
result as with grompp.

#7 Updated by Berk Hess about 13 years ago

I was on holiday.

In the meantime I have found the source of the restart problem.
There are actually two different problems that can prevent
exact restarts.

Unfortunately, solving both problems requires quite some changes
to the code, and for one problem I (currently) don't know a good
solution, as it seems to require an extra communication step
when running in parallel, which is not what we want.

So I think we will not fix this for Gromacs 3.3.2,
but we will try to come up with a solution for 4.0.

What I have not understood is why your system consistently
fails to produce exact restarts, whereas it works correctly
for all the systems I have ever tried.

Berk.

#8 Updated by David van der Spoel about 12 years ago

Can we put this bug to sleep after one year?

#9 Updated by Berk Hess about 12 years ago

I would like to have it fixed at some point,
so it should not be removed.

I'll change the status to later.

Berk.
