Task #3422

Task #3418: Infrastructure improvements for modular simulator

Implement modular checkpointing for modular simulator

Added by Pascal Merz 8 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
mdrun
Difficulty:
uncategorized
Close

Description

In GROMACS 2020, checkpointing modularization is approximated by passing around a t_state object for writing, and restoring using the global state created by the runner. Moving forward, the checkpointing module should not need any knowledge about the details of the checkpointed data.

A possible design to achieve this is to simplify the checkpointing object to serialize and deserialize a key-value tree. In checkpoint reading, this tree can then be passed to the constructor / builder of the modules to restore a previous state. In checkpoint writing, the checkpointing object passes the tree to its clients, allowing them to write arbitrary data needed to restore themselves to their current state.
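The proposed design could be sketched roughly as follows. This is only an illustration under stated assumptions, not GROMACS code: `Node`, `Tree`, `serializeTree`, `deserializeTree` and `SketchThermostat` are hypothetical names, the "tree" has a single level of sub-trees, and values are stored as whitespace-free strings to keep the serialization trivial. The point it shows is that the checkpointing layer handles only keys and opaque values, while the client decides what to write and restores itself from its own sub-tree in its constructor. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for a key-value tree: one sub-tree per client,
// each holding string-encoded values (no whitespace in keys or values).
using Node = std::map<std::string, std::string>;
using Tree = std::map<std::string, Node>;

// The checkpointing layer: it sees keys and opaque strings, nothing else.
std::string serializeTree(const Tree& tree)
{
    std::ostringstream out;
    for (const auto& [client, node] : tree)
    {
        for (const auto& [key, value] : node)
        {
            out << client << ' ' << key << ' ' << value << '\n';
        }
    }
    return out.str();
}

Tree deserializeTree(const std::string& text)
{
    Tree               tree;
    std::istringstream in(text);
    std::string        client, key, value;
    while (in >> client >> key >> value)
    {
        tree[client][key] = value;
    }
    return tree;
}

// Hypothetical client that knows how to save and restore its own state.
class SketchThermostat
{
public:
    explicit SketchThermostat(double integral) : integral_(integral) {}

    // Restore a previous state from the sub-tree read from a checkpoint.
    explicit SketchThermostat(const Node& node)
    {
        assert(node.count("integral") == 1); // the key this client wrote earlier
        integral_ = std::stod(node.at("integral"));
    }

    // Write whatever is needed to restore the current state.
    void writeCheckpoint(Node* node) const { (*node)["integral"] = std::to_string(integral_); }

    double integral() const { return integral_; }

private:
    double integral_ = 0.0;
};
```

On writing, the client fills its sub-tree (for example `tree["thermostat"]`) and the tree is serialized. On reading, the deserialized sub-tree is handed to the client's constructor.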


Related issues

Related to GROMACS - Feature #3379: C++ API for simulation input and output (New)

Associated revisions

Revision 93fcbcf9 (diff)
Added by Pascal Merz about 2 months ago

Make dd_collect_vec independent of t_state

This removes the explicit dependency of dd_collect_vec on the
t_state object.

This is a prerequisite for !440.

Refs #3422 #3419

Revision 3216aceb (diff)
Added by Pascal Merz about 2 months ago

Introduce CheckpointData

CheckpointData exposes methods to read and write scalar values,
ArrayRefs, and tensors. It also allows creating a "sub-object" of
type CheckpointData, which lets more complex members
implement their own checkpointing routines. All methods are templated
on the chosen operation, CheckpointDataOperation::Read or
CheckpointDataOperation::Write, allowing clients to use the same code
to read and write to checkpoint. Type traits and constness are used to
catch as many errors as possible at compile time. CheckpointData uses
a KV-tree to store the data internally. This is however never exposed
to the client. Having this abstraction layer gives freedom to change
the internal implementation in the future.
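The idea of templating on the operation so that one client function serves both reading and writing could look roughly like the sketch below. It is a much simplified illustration, not the actual CheckpointData interface: only the enum name `CheckpointDataOperation` and its values `Read` and `Write` come from the commit message. `SketchCheckpointData`, its `scalar` method, the `double`-only storage and `SketchBarostat` are assumptions made for the example. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

enum class CheckpointDataOperation
{
    Read,
    Write
};

// Hypothetical, simplified stand-in for CheckpointData (scalars of type double only).
template<CheckpointDataOperation operation>
class SketchCheckpointData
{
public:
    using Storage = std::map<std::string, double>;
    // Reading needs only const access to the storage; writing needs mutable access.
    using StoragePtr =
            std::conditional_t<operation == CheckpointDataOperation::Read, const Storage*, Storage*>;
    // Reading modifies the client's value; writing only inspects it.
    using ValuePtr =
            std::conditional_t<operation == CheckpointDataOperation::Read, double*, const double*>;

    explicit SketchCheckpointData(StoragePtr storage) : storage_(storage)
    {
        assert(storage != nullptr);
    }

    // Same call for both directions; the direction is fixed at compile time.
    void scalar(const std::string& key, ValuePtr value) const
    {
        if constexpr (operation == CheckpointDataOperation::Read)
        {
            *value = storage_->at(key);
        }
        else
        {
            (*storage_)[key] = *value;
        }
    }

private:
    StoragePtr storage_;
};

// Hypothetical client: one function body covers reading and writing.
class SketchBarostat
{
public:
    template<CheckpointDataOperation operation>
    void doCheckpoint(SketchCheckpointData<operation>* data)
    {
        data->scalar("volume", &volume);
    }

    double volume = 0.0;
};
```

In this sketch, constructing a `Write` object from a `const` storage pointer, or passing a `const double*` to a `Read` object, fails to compile. This is the kind of compile-time error detection the commit message refers to.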

All CheckpointData objects are owned by a ReadCheckpointDataHolder or
WriteCheckpointDataHolder. These holder classes own the internal
KV-tree, and offer deserialize(ISerializer*) and
serialize(ISerializer*) functions, respectively, which allow reading
from and writing to file. This separation clearly defines ownership and
separates the interface aimed at file IO from the interface aimed at
objects reading/writing checkpoints.

Refs #3517
Refs #3422
Refs #3419

Revision 8abc59d8 (diff)
Added by Pascal Merz about 2 months ago

Prepare legacy checkpoint for modular simulator checkpointing

  • Extend legacy checkpointing functionality to accept a CheckpointDataHolder
    for reading and writing
  • Bump checkpoint version to reflect above change
  • Turn off some checkpoint sanity checks when using modular simulator
  • Pass CheckpointDataHolder object into checkpoint reading in runner, and
    move this object in SimulatorBuilder and then ModularSimulator for element
    setup

Refs #3517
Refs #3422
Refs #3419

Revision f7ee824f (diff)
Added by Pascal Merz about 2 months ago

Make legacy energy elements use CheckpointData

This enables some legacy energy elements to write to CheckpointData,
namely

  • energyhistory_t
  • delta_h_history_t
  • ekinstate_t

Refs #3517
Refs #3422
Refs #3419

Revision ac34f147 (diff)
Added by Pascal Merz about 1 month ago

Implement modular checkpointing

Using the CheckpointData format introduced in a parent commit, this
rewrites checkpointing for the modular simulator to completely use
the new format.

The CheckpointHelper is now passing a CheckpointData object to its
clients (instead of a legacy t_state object). Clients are now stored
in a map, as they are identified by their unique key to be able to
assign the correct CheckpointData sub-objects at reading and writing.
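A minimal sketch of clients stored in a map and matched to their sub-objects by key is given below. None of the names are the GROMACS interfaces: `ISketchCheckpointClient`, `SketchCheckpointHelper`, `SketchCounter`, `SubData` and `CheckpointTree` are hypothetical, and the sub-objects are plain maps of doubles. Restoring through a `readCheckpoint` call is also a simplification made for the example. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for a checkpoint sub-object and the full checkpoint.
using SubData        = std::map<std::string, double>;
using CheckpointTree = std::map<std::string, SubData>;

// Hypothetical client interface: each client reports a unique key.
class ISketchCheckpointClient
{
public:
    virtual ~ISketchCheckpointClient()                         = default;
    virtual std::string clientKey() const                      = 0;
    virtual void        saveCheckpointState(SubData* data) const = 0;
    virtual void        restoreCheckpointState(const SubData& data) = 0;
};

class SketchCheckpointHelper
{
public:
    void registerClient(ISketchCheckpointClient* client)
    {
        const bool inserted = clients_.emplace(client->clientKey(), client).second;
        assert(inserted && "client keys must be unique");
        (void)inserted; // silence unused-variable warnings when NDEBUG is set
    }

    // Each client writes into the sub-object stored under its own key.
    CheckpointTree writeCheckpoint() const
    {
        CheckpointTree tree;
        for (const auto& [key, client] : clients_)
        {
            client->saveCheckpointState(&tree[key]);
        }
        return tree;
    }

    // Each client receives only the sub-object stored under its own key.
    void readCheckpoint(const CheckpointTree& tree)
    {
        for (const auto& [key, client] : clients_)
        {
            client->restoreCheckpointState(tree.at(key));
        }
    }

private:
    std::map<std::string, ISketchCheckpointClient*> clients_;
};

// Hypothetical client used to exercise the helper.
class SketchCounter final : public ISketchCheckpointClient
{
public:
    explicit SketchCounter(std::string key) : key_(std::move(key)) {}
    std::string clientKey() const override { return key_; }
    void saveCheckpointState(SubData* data) const override { (*data)["count"] = count; }
    void restoreCheckpointState(const SubData& data) override { count = data.at("count"); }

    double count = 0.0;

private:
    std::string key_;
};
```

Because the key selects the sub-object, two clients can use the same entry name (here `count`) without colliding.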

If checkpoint reading occurred, the newly introduced
CheckpointHelperBuilder receives the CheckpointData object read at the
runner level from the ModularSimulator. It then initializes its clients
with their respective, read-only CheckpointData subobjects.

The ICheckpointHelperClient interface is adapted to reflect above
changes.

The ModularSimulatorAlgorithmBuilder is slightly simplified thanks to
the introduction of a proper builder for the CheckpointHelper.

The ComputeGlobalsElement is simplified, as it no longer needs to know
about the communication requirements of the EnergyData object, which
depend on checkpoint reading.

Finally, all elements which are checkpoint clients are updated to
implement the new design. Note that they all introduce their own
checkpoint versioning, as the data being checkpointed is opaque to the
checkpointing infrastructure.
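Per-element versioning of opaque data might look like the sketch below. It is illustrative only: `SketchVersionedElement`, the `version` entry, the version numbers and the rule that `velocity` was added in version 2 are all assumptions made for the example, not details taken from the commit.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the element's checkpoint sub-object.
using SubData = std::map<std::string, double>;

// Hypothetical element that versions its own checkpoint data.
class SketchVersionedElement
{
public:
    // Bumped by the element whenever its checkpointed data changes.
    static constexpr int c_currentVersion = 2;

    void save(SubData* data) const
    {
        assert(data != nullptr);
        (*data)["version"]  = c_currentVersion;
        (*data)["position"] = position;
        (*data)["velocity"] = velocity;
    }

    void restore(const SubData& data)
    {
        const int version = static_cast<int>(data.at("version"));
        if (version > c_currentVersion)
        {
            throw std::runtime_error("checkpoint written by a newer version of this element");
        }
        position = data.at("position");
        // Assumption for this sketch: "velocity" was added in version 2.
        velocity = (version >= 2) ? data.at("velocity") : 0.0;
    }

    double position = 0.0;
    double velocity = 0.0;
};
```

The infrastructure never inspects the `version` entry. Only the element that wrote it knows how to interpret older layouts or reject newer ones.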

Closes #3517
Closes #3422
In partial fulfillment of #3419

History

#1 Updated by Pascal Merz 8 months ago

  • Category set to mdrun
  • Assignee set to Pascal Merz
  • Target version set to 2021-infrastructure-stable

#2 Updated by Eric Irrgang 8 months ago

  • Related to Feature #3379: C++ API for simulation input and output added

#3 Updated by Anonymous about 1 month ago

  • Status changed from New to Resolved
