Bug #3377

Parrinello-Rahman checkpoint restart fails using modular simulator

Added by Pascal Merz about 2 months ago. Updated about 1 month ago.

Status: Closed
Priority: High
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Simulations using the Parrinello-Rahman barostat with the modular simulator cannot be restarted from a checkpoint file.

Fatal error:
Cannot change a simulation algorithm during a checkpoint restart. Perhaps you
should make a new .tpr with grompp [...]

This error is thrown by the checkpoint loading routine. While the legacy implementation of the P-R barostat required the pressure at the previous step to be checkpointed, the modular implementation does not require this. load_checkpoint is, however, expecting this field to be present and throws an error.
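The mismatch can be illustrated with a toy model. This is only a sketch: the `Entry` flags, `expected_entries`, and `toy_load_checkpoint` are hypothetical names invented for illustration and do not reproduce the actual GROMACS state flags or the real `load_checkpoint` signature. It assumes that the reader compares the entries stored in the file against the entries it expects, and fails on any difference.

```python
from enum import IntFlag


class Entry(IntFlag):
    """Hypothetical checkpoint entries (not the real GROMACS state flags)."""
    POSITIONS = 1
    VELOCITIES = 2
    BOX = 4
    PREVIOUS_PRESSURE = 8


class CheckpointMismatch(Exception):
    """Raised when stored and expected entries differ."""


def expected_entries(parrinello_rahman, modular_simulator):
    """Return the entries a run with these settings writes and expects."""
    entries = Entry.POSITIONS | Entry.VELOCITIES | Entry.BOX
    # Assumption: only the legacy P-R implementation needs the
    # pressure of the previous step to be checkpointed.
    if parrinello_rahman and not modular_simulator:
        entries |= Entry.PREVIOUS_PRESSURE
    return entries


def toy_load_checkpoint(stored, expected):
    """Accept the checkpoint only if it holds exactly the expected entries."""
    if stored != expected:
        raise CheckpointMismatch(
            "Cannot change a simulation algorithm during a checkpoint restart."
        )
    return stored


# Entries written by a modular simulator run with Parrinello-Rahman.
stored = expected_entries(parrinello_rahman=True, modular_simulator=True)

# A reader that assumes the legacy layout rejects the file.
legacy_expectation = expected_entries(parrinello_rahman=True, modular_simulator=False)
try:
    toy_load_checkpoint(stored, legacy_expectation)
    restart_rejected = False
except CheckpointMismatch:
    restart_rejected = True

# A reader that knows the modular simulator is in use accepts it.
loaded = toy_load_checkpoint(stored, expected_entries(True, True))
```

In this toy model the restart fails only because the reader's expectation was derived without knowing which simulator will run, which is why the decision has to be available before the checkpoint is read.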

While resolving this, a second bug was found. When the modular simulator is initialized, the Parrinello-Rahman scaling could be applied one step too late. This could happen only if a checkpoint was written exactly on a scaling step and the run was later restarted from it. Because checkpoint reading was not working, this cannot have occurred in practice, but it needs to be fixed as well.
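The off-by-one can be sketched with a toy schedule. The convention used here is an assumption made for illustration, not the actual GROMACS logic: scaling is due on every step with `(step - 1) % nstpcouple == 0` and never on step 0, so a fresh run never scales on its first step. The function names and the `buggy_init` switch are hypothetical.

```python
def is_scaling_step(step, nstpcouple):
    """Toy convention: scaling is due on steps 1, 1 + nstpcouple, ..."""
    return step > 0 and (step - 1) % nstpcouple == 0


def applied_scaling_steps(first_step, n_steps, nstpcouple, buggy_init):
    """Return the steps on which the toy propagator applies the scaling."""
    applied = []
    for step in range(first_step, first_step + n_steps):
        if not is_scaling_step(step, nstpcouple):
            continue
        if step == first_step and buggy_init:
            # Buggy setup: the request reaches the propagator one step late.
            applied.append(step + 1)
        else:
            applied.append(step)
    return applied


# Fresh run: step 0 is never a scaling step, so the bug stays hidden.
fresh_buggy = applied_scaling_steps(0, 26, 10, buggy_init=True)
fresh_fixed = applied_scaling_steps(0, 26, 10, buggy_init=False)

# Restart from a checkpoint written exactly on scaling step 11.
restart_buggy = applied_scaling_steps(11, 15, 10, buggy_init=True)
restart_fixed = applied_scaling_steps(11, 15, 10, buggy_init=False)
```

With these inputs the fresh run scales on steps 1, 11, and 21 in both variants, while the buggy restart scales on steps 12 and 21 instead of 11 and 21.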

The steps to resolve this problem are as follows:

1. Move the decision on whether to use the modular simulator before checkpoint reading. This change will likely be useful soon anyway, as checkpoint reading is planned to be modularized.
2. Fix the initialization of the scaling.
3. Fix the reading of checkpoints using Parrinello-Rahman.
4. Add continuation tests for Parrinello-Rahman and md-vv (which was previously not implemented). This bug could go unnoticed only because these tests were missing.
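The idea behind an exact continuation test (point 4) can be sketched as follows. This is a toy illustration, not the GROMACS test fixture: the integrator, the state layout, and the use of `copy.deepcopy` as a stand-in for writing and reading a checkpoint are all assumptions. The test runs once without interruption, then runs again in two parts joined by a checkpoint, and requires the final states to be identical.

```python
import copy


def advance(state, dt=0.01):
    """One deterministic update of a toy harmonic oscillator."""
    v = state["v"] - dt * state["x"]
    x = state["x"] + dt * v
    return {"x": x, "v": v, "step": state["step"] + 1}


def run(state, n_steps):
    """Advance the state by n_steps and return the result."""
    for _ in range(n_steps):
        state = advance(state)
    return state


initial = {"x": 1.0, "v": 0.0, "step": 0}

# Reference: one uninterrupted run.
full = run(initial, 20)

# Continuation: run the first part, checkpoint, restore, run the rest.
first_part = run(initial, 10)
checkpoint = copy.deepcopy(first_part)  # stand-in for a checkpoint file
continued = run(checkpoint, 10)

# Exact continuation: the result must match, not just approximately.
continuation_is_exact = continued == full
```

A test of this shape would catch both problems described above: a restart that fails outright never produces a final state, and a scaling applied one step late produces a final state that differs from the reference.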

Associated revisions

Revision 2078fd06 (diff)
Added by Pascal Merz about 2 months ago

Expose vsite counting

This allows checking whether vsites are present before the
respective object is created.

Refs #3377 (prepares point 1)

Change-Id: I8273daf38d46e2f052573f48323b5b6137965e9f

Revision 42ba62e1 (diff)
Added by Pascal Merz about 2 months ago

Move modular simulator decision before checkpoint loading

Currently, the decision on whether to use modular simulator is done
relatively late during the runner stage. This makes it impossible to
allow for different behavior at checkpoint loading time. The current
change therefore moves this decision before checkpoint loading time.
To achieve this, some adaptations were needed:

  • Use gmx_mtop_interaction_count to determine whether virtual sites
    will be used before the respective object is created.
  • The membrane embedding check via pointer is replaced by a boolean
    set earlier during the runner phase.
  • The essential dynamics check was split to catch command line inputs
    during the runner phase, and mismatching checkpointing data during
    the simulator phase (mirroring legacy behavior in do_md()).
  • Replace the ensemble restraint check by a low-level alternative
    for the early runner call (mimicking the distance restraint
    initialization), while keeping the current check for the
    simulator-level call. Note that as multi sims are disabled, this
    low-level test will effectively never fail, but the additional
    clarity is helpful in further development. The later test ensures
    that changes to the init_disres() don't make this check invalid -
    if they would ever get out of sync, the simulations would exit with
    a fatal error.

Refs #3377 (fixes point 1)

Change-Id: I635e033db51d6ecc8bf121c72730a121e04586dd

Revision 009ed957 (diff)
Added by Pascal Merz about 2 months ago

Fix Parrinello-Rahman scaling on initial step (modular simulator)

If Parrinello-Rahman scaling was requested on the first step, it was
not properly initialized. The setup routine would have correctly
(although non-obviously so) calculated the scaling matrix, but would
have requested the propagator to use the scaling one step too late.

For new simulations, this never happens (since scaling happens on the
second step, not the first). It could, however, lead to slight
errors if restarting from a checkpoint occurred exactly on a scaling
step. As restarting from Parrinello-Rahman simulations using modular
simulator was broken anyway, we can be sure that this has never
happened in practice.

This change fixes the bug, adds explanations of what happens on the
initial step, and makes the function calls more explicit (at the cost
of a very small amount of code duplication).

Refs #3377 (fixes point 2)

Change-Id: Ic3ba7ba078260a9d039d506fc0a87353f80d23dd

Revision ca8f9e41 (diff)
Added by Pascal Merz about 2 months ago

Fix reading of checkpoints with Parrinello-Rahman (modular simulator)

Using modular simulator, simulations using Parrinello-Rahman barostat
could not be read from checkpoint, throwing an error in the checkpoint
loading routine. While the legacy implementation of the P-R barostat
required the pressure at the previous step to be checkpointed, the
modular implementation does not require this. load_checkpoint is,
however, expecting this field to be present and throws an error.

This change fixes this by setting the globalState flags depending on
whether the modular simulator will be used, so that read_checkpoint
does not expect this entry.

Note that tests guarding against a recurrence of this bug are
introduced in the child change I3bcd0729.

Refs #3377 (fixes point 3)

Change-Id: If8afd294b8c79ceef66e71293d9d93cf2f7d0df8

Revision 3d9ea0b8 (diff)
Added by Pascal Merz about 2 months ago

Expand test coverage of exact continuation tests

The exact continuation tests were not covering the new
Parrinello-Rahman functionality of modular simulator, nor the
berendsen-berendsen NPT case using md-vv. This change fixes this.

Fixes #3377 (fixes point 4, last task on the list)

Change-Id: I3bcd072969259383dd1812d425dd7b3baee5bd85

History

#1 Updated by Anonymous about 2 months ago

  • Status changed from Fix uploaded to Resolved

#2 Updated by Paul Bauer about 1 month ago

  • Status changed from Resolved to Closed
