Project

General

Profile

Bug #2881

mdrun -nsteps option does not work correctly

Added by Berk Hess 6 months ago. Updated 4 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

I have a run with data up till and checkpoint at 200000 steps. Running mdrun -cpi -nsteps 300000 gives:
Program: gmx mdrun, version 2019.1-dev-20190124-085e8c261
Source file: src/gromacs/fileio/checkpoint.cpp (line 2775)

Fatal error:
The input requested 100000 steps, however the checkpoint file has already
reached step 200000. The simulation will not proceed, because either your
simulation is already complete, or your combination of input files don't
match.

So it first seems to check for: nsteps_requested - step_checkpoint < nsteps_requested
which doesn't make sense

Associated revisions

Revision abf2b0c7 (diff)
Added by Berk Hess 4 months ago

Make mdrun -nsteps work again.

The mdrun -nsteps option was only functional in some cases,
because it modified ir->nsteps after the checkpoint loading
modified it.
Note that this option is deprecated, but is should either work
or be removed.

Fixes #2881

Change-Id: Ib611b0e007171f8a351f61f3ecc9b1bda5248486

History

#1 Updated by Berk Hess 6 months ago

My guess was wrong.
It actually seems to check nsteps_tpr < step_checkout
which doesn't make sense and never allows -nsteps to work when you reached nsteps_tpr.

#2 Updated by Mark Abraham 6 months ago

  • Assignee deleted (Mark Abraham)

gmx convert-tpr -nsteps is the supported way to do this, which the error message could be improved to recommend.

As nobody has yet written down what nsteps is supposed to do in concert with the dozen things that it has to work well with, and written tests for it, we can't implement or maintain this feature. And we already deprecated it at #2569.

#3 Updated by Berk Hess 6 months ago

Sure, but then the feature should be removed instead of giving inconsistent error messages.

#4 Updated by Gerrit Code Review Bot 6 months ago

Gerrit received a related patchset '1' for Issue #2881.
Uploader: Berk Hess ()
Change-Id: gromacs~release-2019~Ib611b0e007171f8a351f61f3ecc9b1bda5248486
Gerrit URL: https://gerrit.gromacs.org/9277

#5 Updated by Berk Hess 6 months ago

  • Status changed from New to Fix uploaded
  • Assignee set to Berk Hess

I pushed up a fix.

I think that the description on the -nsteps option is clear from a user point of the view. The fundamental issue is that the checkpointing code modifies init_step and nsteps. That makes it difficult to reason about any meaning of these parameters in the code.

#6 Updated by Erik Lindahl 5 months ago

While the fix might be reasonable for 2019, I think this is a strong indication it's time to remove it entirely from master, because history says we are not smart enough to remember all the special cases :-)

#7 Updated by Mark Abraham 5 months ago

mdrun documents -nsteps as "Run this number of steps, overrides .mdp file option". This is slightly ambiguous, as it could refer to

1) the number of steps this invocation of mdrun should perform from the checkpoint configuration, or
2) it could mean to replace the number of steps described .tpr with the new value and simulate from the step in the checkpoint configuration up to the new required number of steps.

This ambiguity only arises when -cpi is used. Berk's bug report implies he intends the latter interpretation.

The failing test uses mdrun -nsteps 2 -cpi with a checkpoint file that will normally done more than 2 steps. When I wrote the test, I intended it to do two more steps ie. the former interpretation.

Note that if Berk had implemented his intent to do 100000 more steps according to the former interpretation (ie. with mdrun -nsteps 100000 -cpi), then he still would have met the error, so the code does not conform to my interpretation regardless of the value passed to mdrun -nsteps. Berk's fix works for the latter interpretation regardless of the size of mdrun -nsteps.

I think we should clarify the documentation regardless. If we want Berk's interpretation, then I suggest "Run this total number of steps, overrides .mdp nsteps" and to remove the call to -nsteps on line 83 of src/programs/mdrun/tests/terminationhelper.cpp

#8 Updated by Paul Bauer 4 months ago

What should we do now with the linked patch?

#9 Updated by Berk Hess 4 months ago

  • Status changed from Fix uploaded to Resolved

#10 Updated by Mark Abraham 4 months ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF