Bug #3309

-reprod: checkpoint reading bug and general considerations

Added by Pascal Merz 2 months ago. Updated about 1 month ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Affected version - extra info:
Affected version:
Difficulty:
uncategorized

Description

The `-reprod` option is described in the manual as:

-[no]reprod (no)
Try to avoid optimizations that affect binary reproducibility
At first glance, it seems that in practice, this option
  • turns off dynamic load balancing / PME tuning
  • does not allow stopping between neighbor-searching steps
  • checks, when reading a checkpoint, that the number of total/PP/PME/PP&PME ranks is consistent between the current simulation and the checkpoint

(Am I missing something?)

Bug: Currently, the last check is not correctly implemented, because checkpoint reading happens before domain decomposition (and hence before PP/PME duties are assigned). Checkpoint reading will therefore always report a "PME rank mismatch" whenever the number of PME ranks recorded in the checkpoint file was non-zero, even when the same number of dedicated PME ranks is explicitly set on the command line.

Further considerations: While the above issue can easily be fixed (probably by moving the check from checkpoint reading to DD setup), the broader question is whether the `-reprod` option is restrictive enough. Personally, I feel that the name and description of the `-reprod` option promise a level of (binary) reproducibility which we can't and shouldn't guarantee except under very specific constraints. If my list above is complete, it seems that simulations would rarely be binary identical when running on multiple ranks or on GPUs.

Other mentions of `-reprod` in the manual:

Section http://manual.gromacs.org/documentation/current/user-guide/mdrun-features.html#running-a-simulation-in-reproducible-mode

It is generally difficult to run an efficient parallel MD simulation that is based primarily on floating-point arithmetic and is fully reproducible. By default, gmx mdrun will observe how things are going and vary how the simulation is conducted in order to optimize throughput. However, there is a “reproducible mode” available with mdrun -reprod that will systematically eliminate all sources of variation within that run; repeated invocations on the same input and hardware will be binary identical. However, running in this mode on different hardware, or with a different compiler, etc. will not be reproducible. This should normally only be used when investigating possible problems.

Section http://manual.gromacs.org/documentation/current/user-guide/managing-simulations.html#reproducibility in general, and especially

Further, using `gmx mdrun -reprod` will eliminate all sources of non-reproducibility that it can, i.e. same executable + same hardware + same shared libraries + same run input file + same command line parameters will lead to reproducible results.

History

#1 Updated by Artem Zhmurov 2 months ago

I vote for allowing -reprod only with a single rank.

#2 Updated by Pascal Merz 2 months ago

Summary of lunch discussion with Berk:

  • MPI simulations without load balancing are in principle reproducible (depending on the implementation of MPI reduction), and in practice often are (we could implement a version that fixes the order of reduction if we wanted to be sure)
  • Simulations using GPUs are currently not reproducible (a fixed-order version would require some effort)
  • the -reprod option is useful for debugging

Next steps therefore include:

  • make the documentation unambiguous about what -reprod does
  • move the comparison with checkpoint data to the construction of the domain decomposition to fix the bug
  • add an assertion to disallow -reprod runs with GPUs
  • add a GPU flag to the checkpoint to check that the previous run did not use a GPU either

#3 Updated by Erik Lindahl about 1 month ago

Update the documentation for now - the command line option docs are correct in that we try, but it's not perfect in all cases.

We have thought for a while about implementing our own fixed-point accumulators, which would likely solve the GPU issue, but likely not general load balancing. This is an option meant for debugging, not a magical solution that hides the fact that we're dealing with floating point and doing dynamic ordering - there will always be a performance penalty.
