Project

General

Profile

Bug #2717

mdrun runs infinitely when checkpoint file is beyond the designated end point

Added by zhiyi wu 12 months ago. Updated 11 months ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

Say I have a tpr file which I say in the mdp file for it to runs for 50 ns and I stopped the simulation at 31 ns.
If I then use convert-tpr to change the end time to 30 ns and use mdrun to continues the simulation,
The following message will be generated and the simulation runs forever.

WARNING: This run will generate roughly 4827577325564461056 Mb of data

starting mdrun 'GkApcT YneM POPE POPG in water'
15000000 steps, infinite ps (continuing from step 15083300, 30166.6 ps).

It will be nice if the mdrun checks if the current time is longer than the end time and either write gro file and quit or quit with a non-zero exit state.


Related issues

Related to GROMACS - Task #1781: re-design benchmarking functionalityAccepted
Related to GROMACS - Task #2569: announce deprecations in GROMACS 2019Closed
Related to GROMACS - Bug #2757: mdrun refuses to start with .cpt if nsteps is -1 in .tprClosed

Associated revisions

Revision 4dcb2a1a (diff)
Added by Paul Bauer 11 months ago

Issue fatal error if checkpoint does not suit the .tpr

If the step number in the checkpoint is out of range for that
described in the .tpr, then the user has provided mismatching
inputs. We do not intend then to be able to address this with the
mdrun -nsteps option.

Fixes #2717

Change-Id: I827bdc1b92ee69bf6287e2fd552ace7583b62028

Revision 61285613 (diff)
Added by Mark Abraham 11 months ago

Fix checkpoint restart of tpr with infinite step count

The recent fix of #2717 did not account for the way a user's .tpr file
can require an infinite number of steps by using the special value
-1. Such special values are difficult to remember when maintaining the
code, so we should tend to avoid introducing them.

Fixes #2757

Change-Id: I6570c4f4e7d63b2375dbb595a514c9e709f18856

History

#1 Updated by Mark Abraham 12 months ago

Thanks for the report, we'll look at fixing it.

#2 Updated by Gerrit Code Review Bot 12 months ago

Gerrit received a related patchset '1' for Issue #2717.
Uploader: Paul Bauer ()
Change-Id: gromacs~release-2018~I827bdc1b92ee69bf6287e2fd552ace7583b62028
Gerrit URL: https://gerrit.gromacs.org/8618

#3 Updated by Eric Irrgang 12 months ago

This will also resolve the confusing situation in which the `nsteps` command-line option is used to extend a simulation past the TPR end-point and then not used in a subsequent run. But shouldn't this be an error condition? If it just succeeds quietly with more steps than are specified in the TPR file, then the trajectory artifacts are not what was specified in the TPR file (they are longer).

#4 Updated by Erik Lindahl 12 months ago

I think we've had similar discussions a number of times in various redmines, and the core problem is that we still allow for half-a-dozen different ways to specify the number of steps or continue simulations. Each of them seems very useful in isolation, but as these repeated bugs show it leads to a combinatorial explosion of start/stop conditions that we are simply not able to handle.

While Paul's patch fixes the immediate error of this bug, IMHO the solution in master branch should not focus on adding more checks, but removing both convert-tpr and the nsteps option :-)

#5 Updated by Mark Abraham 12 months ago

Erik Lindahl wrote:

I think we've had similar discussions a number of times in various redmines, and the core problem is that we still allow for half-a-dozen different ways to specify the number of steps or continue simulations. Each of them seems very useful in isolation, but as these repeated bugs show it leads to a combinatorial explosion of start/stop conditions that we are simply not able to handle.

Then please also comment on the appropriate places, e.g. #1781 and #2569.

While Paul's patch fixes the immediate error of this bug, IMHO the solution in master branch should not focus on adding more checks, but removing both convert-tpr and the nsteps option :-)

I have long agreed that mdrun -nsteps must go. At #2569, I proposed gmx benchmark. It won't support checkpointing, so nsteps can go live there if people want. But the people who want benchmarking features don't seem to write patches that make mdrun more usable, so if it just gets left up to me (as has been the case for years now, see #1781) then probably gmx benchmark won't support anything other than running a .tpr :-)

I think convert-tpr -nsteps is appropriate to keep while the .tpr format is not otherwise editable. See e.g. https://redmine.gromacs.org/issues/1781#note-38 and discussion. If .tpr becomes JSON then the need for these things goes away.

#6 Updated by Mark Abraham 11 months ago

  • Related to Task #1781: re-design benchmarking functionality added

#7 Updated by Mark Abraham 11 months ago

  • Related to Task #2569: announce deprecations in GROMACS 2019 added

#8 Updated by Paul Bauer 11 months ago

  • Status changed from New to Resolved

#9 Updated by Mark Abraham 11 months ago

  • Related to Bug #2757: mdrun refuses to start with .cpt if nsteps is -1 in .tpr added

#10 Updated by Paul Bauer 11 months ago

  • Status changed from Resolved to Closed

#11 Updated by Gerrit Code Review Bot 11 months ago

Gerrit received a related patchset '1' for Issue #2717.
Uploader: Mark Abraham ()
Change-Id: gromacs~release-2018~I6570c4f4e7d63b2375dbb595a514c9e709f18856
Gerrit URL: https://gerrit.gromacs.org/8708

Also available in: Atom PDF