Task #3422
Task #3418: Infrastructure improvements for modular simulator
Implement modular checkpointing for modular simulator
Description
In GROMACS 2020, checkpointing modularization is approximated by passing around a t_state object for writing, and restoring using the global state created by the runner. Moving forward, the checkpointing module should not need any knowledge about the details of the checkpointed data.
A possible design to achieve this is to simplify the checkpointing object so that it only serializes and deserializes a key-value tree. In checkpoint reading, this tree can then be passed to the constructor or builder of each module to restore its previous state. In checkpoint writing, the checkpointing object passes the tree to its clients, allowing them to write whatever data they need to restore themselves to their current state.
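A minimal sketch of this design, assuming a flat key-value tree of doubles. `KeyValueTree` and `Thermostat` are hypothetical illustrations, not GROMACS types. The module restores itself in its constructor from the tree that was read and fills the tree itself when writing, so the checkpointing object never needs to know any keys.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a key-value tree: flat string keys mapped to doubles.
using KeyValueTree = std::map<std::string, double>;

// Illustrative module (not a GROMACS class). Only the module knows its keys;
// the checkpointing object just stores and moves the tree.
class Thermostat
{
public:
    // Fresh start: no checkpoint was read.
    explicit Thermostat(double referenceTemperature) :
        referenceTemperature_(referenceTemperature), integral_(0.0)
    {
    }

    // Restart: the tree read from the checkpoint is handed to the constructor.
    explicit Thermostat(const KeyValueTree& tree) :
        referenceTemperature_(tree.at("reference-temperature")), integral_(tree.at("integral"))
    {
    }

    // Checkpoint writing: the module stores whatever it needs to restore itself.
    void writeCheckpoint(KeyValueTree* tree) const
    {
        assert(tree != nullptr);
        (*tree)["reference-temperature"] = referenceTemperature_;
        (*tree)["integral"]              = integral_;
    }

    // Accumulate the deviation from the reference temperature.
    void step(double currentTemperature) { integral_ += currentTemperature - referenceTemperature_; }

    double integral() const { return integral_; }

private:
    double referenceTemperature_;
    double integral_;
};
```

Because the restore path is a constructor, a module cannot be half-initialized: it is either built fresh from input parameters or built from checkpointed data.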
Associated revisions
Introduce CheckpointData
CheckpointData exposes methods to read and write scalar values,
ArrayRefs, and tensors. It also allows creating a "sub-object" of
type CheckpointData, which lets more complex members implement their
own checkpointing routines. All methods are templated on the chosen
operation, CheckpointDataOperation::Read or
CheckpointDataOperation::Write, allowing clients to use the same code
to read from and write to a checkpoint. Type traits and constness are
used to catch as many errors as possible at compile time.
CheckpointData uses a KV-tree to store the data internally. This is,
however, never exposed to the client. Having this abstraction layer
gives freedom to change the internal implementation in the future.
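The idea of templating on the operation can be sketched as follows. This is a hypothetical illustration loosely modeled on the description above, not the GROMACS source: the internal `Tree`, the `scalar` signature and the `StepCounter` client are assumptions. `if constexpr` selects the direction, and a `static_assert` rejects reading into a const object at compile time.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

// Hypothetical sketch loosely modeled on the commit description; not the GROMACS source.
enum class CheckpointDataOperation
{
    Read,
    Write
};

// Internal storage; clients never touch it directly.
using Tree = std::map<std::string, double>;

template<CheckpointDataOperation operation>
class CheckpointData
{
    // A reading view only ever holds a const tree.
    using TreePointer =
            std::conditional_t<operation == CheckpointDataOperation::Read, const Tree*, Tree*>;

public:
    explicit CheckpointData(TreePointer tree) : tree_(tree) {}

    // Same call for both directions: Read fills *value, Write stores *value.
    template<typename T>
    void scalar(const std::string& key, T* value)
    {
        static_assert(std::is_arithmetic_v<std::remove_const_t<T>>,
                      "this sketch only handles arithmetic scalars");
        if constexpr (operation == CheckpointDataOperation::Read)
        {
            // Reading into a const object is rejected at compile time.
            static_assert(!std::is_const_v<T>, "cannot read a checkpoint value into a const object");
            assert(tree_->count(key) == 1 && "key missing from checkpoint");
            *value = static_cast<T>(tree_->at(key));
        }
        else
        {
            (*tree_)[key] = static_cast<double>(*value);
        }
    }

private:
    TreePointer tree_;
};

// Illustrative client: one templated function serves both reading and writing.
struct StepCounter
{
    long   step = 0;
    double time = 0.0;

    template<CheckpointDataOperation operation>
    void doCheckpointData(CheckpointData<operation>* checkpointData)
    {
        checkpointData->scalar("step", &step);
        checkpointData->scalar("time", &time);
    }
};
```

Since the client lists its keys only once, in `doCheckpointData`, the read and write paths cannot drift apart.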
All CheckpointData objects are owned by a ReadCheckpointDataHolder or
WriteCheckpointDataHolder. These holder classes own the internal
KV-tree and offer deserialize(ISerializer*) and
serialize(ISerializer*) functions, respectively, which read from /
write to file. This separation clearly defines ownership and
separates the interface aimed at file IO from the interface aimed at
objects reading/writing checkpoints.
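A minimal sketch of this ownership split, assuming a `std::string` stands in for the file behind `ISerializer*` and that keys contain no whitespace. The holder and view classes below are hypothetical simplifications, not the GROMACS implementation. Clients only ever receive a view, while only the holders can serialize or deserialize.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <map>
#include <sstream>
#include <string>

// Hypothetical sketch; a std::string stands in for the file behind ISerializer.
using Tree = std::map<std::string, double>;

// Views handed to clients: they can read or write data, but cannot do file IO.
class WriteView
{
public:
    explicit WriteView(Tree* tree) : tree_(tree) {}
    void scalar(const std::string& key, double value) { (*tree_)[key] = value; }

private:
    Tree* tree_;
};

class ReadView
{
public:
    explicit ReadView(const Tree* tree) : tree_(tree) {}
    double scalar(const std::string& key) const { return tree_->at(key); }

private:
    const Tree* tree_;
};

// Writing side: owns the tree, hands out write views, and serializes for file IO.
class WriteCheckpointDataHolder
{
public:
    WriteView checkpointData() { return WriteView(&tree_); }

    std::string serialize() const
    {
        std::ostringstream out;
        // max_digits10 significant digits make the double round trip exact.
        out << std::setprecision(std::numeric_limits<double>::max_digits10);
        for (const auto& [key, value] : tree_)
        {
            assert(key.find_first_of(" \t\n") == std::string::npos && "keys must not contain whitespace");
            out << key << ' ' << value << '\n';
        }
        return out.str();
    }

private:
    Tree tree_;
};

// Reading side: deserializes from file IO, then hands out read-only views.
class ReadCheckpointDataHolder
{
public:
    void deserialize(const std::string& data)
    {
        std::istringstream in(data);
        std::string        key;
        double             value = 0.0;
        while (in >> key >> value)
        {
            tree_[key] = value;
        }
    }

    ReadView checkpointData() const { return ReadView(&tree_); }

private:
    Tree tree_;
};
```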
Prepare legacy checkpoint for modular simulator checkpointing
- Extend legacy checkpointing functionality to accept a CheckpointDataHolder
for reading and writing
- Bump checkpoint version to reflect above change
- Turn off some checkpoint sanity checks when using modular simulator
- Pass CheckpointDataHolder object into checkpoint reading in runner, and
move this object into SimulatorBuilder and then ModularSimulator for element
setup
Implement modular checkpointing
Using the CheckpointData format introduced in a parent commit, this
rewrites checkpointing for the modular simulator to completely use
the new format.
The CheckpointHelper now passes a CheckpointData object to its
clients (instead of a legacy t_state object). Clients are now stored
in a map keyed by their unique identifier, so that the correct
CheckpointData sub-objects can be assigned at reading and writing.
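The map-of-clients idea can be sketched as below, assuming a two-level tree in which each client's sub-object is a flat map of doubles. `ICheckpointClient`, `clientID()` and `BarostatClient` are hypothetical names chosen for illustration. The unique key both prevents duplicate registrations and selects the sub-object each client writes into.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch (names illustrative): one sub-object per client, keyed by the client's ID.
using SubTree  = std::map<std::string, double>;
using FullTree = std::map<std::string, SubTree>;

class ICheckpointClient
{
public:
    virtual ~ICheckpointClient() = default;
    // Must be unique among all clients; it selects the client's sub-object.
    virtual std::string clientID() const = 0;
    virtual void        writeCheckpoint(SubTree* checkpointData) const = 0;
};

class CheckpointHelper
{
public:
    void registerClient(const ICheckpointClient* client)
    {
        const bool inserted = clients_.emplace(client->clientID(), client).second;
        assert(inserted && "checkpoint client IDs must be unique");
        (void)inserted; // silence unused-variable warnings when NDEBUG is defined
    }

    // Each client only sees, and fills, its own sub-object.
    FullTree writeCheckpoint() const
    {
        FullTree tree;
        for (const auto& [key, client] : clients_)
        {
            client->writeCheckpoint(&tree[key]);
        }
        return tree;
    }

private:
    std::map<std::string, const ICheckpointClient*> clients_;
};

// Illustrative client storing a single value.
class BarostatClient : public ICheckpointClient
{
public:
    explicit BarostatClient(double scaling) : scaling_(scaling) {}
    std::string clientID() const override { return "barostat"; }
    void        writeCheckpoint(SubTree* checkpointData) const override
    {
        (*checkpointData)["scaling"] = scaling_;
    }

private:
    double scaling_;
};
```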
If a checkpoint was read, the newly introduced CheckpointHelperBuilder
receives the CheckpointData object, read at the runner level, from the
ModularSimulator. It then initializes its clients with their
respective read-only CheckpointData sub-objects.
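The builder step can be sketched as follows, assuming the read data arrives as a pointer that is null when the run does not start from a checkpoint. `CheckpointHelperBuilder::registerClient`, `restoreCheckpoint` and `build()` here are hypothetical simplifications; in particular, `build()` just returns the registered clients as a stand-in for constructing the CheckpointHelper.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch (names illustrative): the builder hands each client its read-only sub-object.
using SubTree  = std::map<std::string, double>;
using FullTree = std::map<std::string, SubTree>;

class ICheckpointClient
{
public:
    virtual ~ICheckpointClient() = default;
    virtual std::string clientID() const = 0;
    // Only called when the run starts from a checkpoint.
    virtual void restoreCheckpoint(const SubTree& checkpointData) = 0;
};

class CheckpointHelperBuilder
{
public:
    // readData is nullptr when no checkpoint was read.
    explicit CheckpointHelperBuilder(const FullTree* readData) : readData_(readData) {}

    void registerClient(ICheckpointClient* client)
    {
        assert(client != nullptr);
        if (readData_ != nullptr)
        {
            // Read-only access: the client gets a const reference to its own sub-object.
            client->restoreCheckpoint(readData_->at(client->clientID()));
        }
        clients_.push_back(client);
    }

    // Stand-in for constructing the CheckpointHelper from the registered clients.
    std::vector<ICheckpointClient*> build() { return std::move(clients_); }

private:
    const FullTree*                 readData_;
    std::vector<ICheckpointClient*> clients_;
};

// Illustrative client restoring a step counter.
class StepClient : public ICheckpointClient
{
public:
    std::string clientID() const override { return "step-counter"; }
    void        restoreCheckpoint(const SubTree& checkpointData) override
    {
        step_ = static_cast<long>(checkpointData.at("step"));
    }
    long step() const { return step_; }

private:
    long step_ = 0;
};
```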
The ICheckpointHelperClient interface is adapted to reflect the above
changes.
The ModularSimulatorAlgorithmBuilder is slightly simplified thanks to
the introduction of a proper builder for the CheckpointHelper.
The ComputeGlobalsElement is simplified, as it no longer needs to
know about the communication needs of the EnergyData object, which
depend on checkpoint reading.
Finally, all elements which are checkpoint clients are updated to
implement the new design. Note that they all introduce their own
checkpoint versioning, as the data being checkpointed is opaque to the
checkpointing infrastructure.
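Per-client versioning can be sketched as below, assuming the client stores its own version number inside its sub-object. The enum, key names and `Thermostat` class are hypothetical illustrations. The client writes its current version and, when reading an older layout, falls back to input values for entries that did not exist yet.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: the client owns the version of its own opaque checkpoint data.
using SubTree = std::map<std::string, double>;

enum class ThermostatCheckpointVersion : int
{
    Base           = 0, // initial layout: "integral" only
    AddedReference = 1, // adds "reference-temperature"
    Count               // keep last
};
constexpr int c_currentVersion = static_cast<int>(ThermostatCheckpointVersion::Count) - 1;

class Thermostat
{
public:
    explicit Thermostat(double referenceTemperature) : reference_(referenceTemperature) {}

    void writeCheckpoint(SubTree* data) const
    {
        assert(data != nullptr);
        (*data)["version"]               = c_currentVersion;
        (*data)["integral"]              = integral_;
        (*data)["reference-temperature"] = reference_;
    }

    void readCheckpoint(const SubTree& data)
    {
        const int version = static_cast<int>(data.at("version"));
        assert(version <= c_currentVersion && "checkpoint written by a newer version");
        integral_ = data.at("integral");
        // Older checkpoints lack this entry, so the value from the input is kept.
        if (version >= static_cast<int>(ThermostatCheckpointVersion::AddedReference))
        {
            reference_ = data.at("reference-temperature");
        }
    }

    double integral() const { return integral_; }
    double reference() const { return reference_; }

private:
    double reference_;
    double integral_ = 0.0;
};
```

Keeping the version inside the client's own sub-object means a client can change its layout without bumping the global checkpoint version.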
History
#1 Updated by Pascal Merz 11 months ago
- Category set to mdrun
- Assignee set to Pascal Merz
- Target version set to 2021-infrastructure-stable
#2 Updated by Eric Irrgang 11 months ago
- Related to Feature #3379: C++ API for simulation input and output added
#3 Updated by Anonymous 4 months ago
- Status changed from New to Resolved
Applied in changeset ac34f147f63fe2d377b104872d898861f926bfd2.
Make dd_collect_vec independent of t_state
This removes the explicit dependency of dd_collect_vec on the
t_state object.
This is a prerequisite for !440.
Refs #3422 #3419