Feature #3379

C++ API for simulation input and output

Added by Eric Irrgang 9 months ago. Updated 9 months ago.

Status: New
Priority: Normal
Assignee: -
Category: core library
Difficulty: uncategorized

Description

Features such as (hybrid) Monte Carlo, simulation replicas, replica exchange, and input preparation/manipulation all need API access to simulation inputs and outputs.

Additionally, efforts to limit the responsibilities of individual tools (and to separate out convenience options) call for lightweight ways to connect tools, including ways to filter or manipulate trajectory output before it is written to the filesystem. See, for instance, #3286.
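To illustrate filtering output before it reaches the filesystem, the following C++ sketch chains a filter in front of an in-memory sink. `Frame`, `FrameSink`, `MemorySink`, and `FilterSink` are hypothetical names invented for this example, not existing GROMACS types. A predicate that drops the step-0 frame would cover the scenario from #3286 without a dedicated mdrun option.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical frame type (not a GROMACS data structure).
struct Frame
{
    std::int64_t        step;
    std::vector<double> positions; // flattened x, y, z
};

// Hypothetical sink interface: anything that consumes trajectory frames.
class FrameSink
{
public:
    virtual ~FrameSink()                   = default;
    virtual void write(const Frame& frame) = 0;
};

// Terminal sink that keeps frames in memory instead of writing a file.
class MemorySink : public FrameSink
{
public:
    void                      write(const Frame& frame) override { frames_.push_back(frame); }
    const std::vector<Frame>& frames() const { return frames_; }

private:
    std::vector<Frame> frames_;
};

// Filter stage: forwards only frames accepted by the predicate to the next sink.
class FilterSink : public FrameSink
{
public:
    FilterSink(std::function<bool(const Frame&)> keep, FrameSink* next) :
        keep_(std::move(keep)), next_(next)
    {
        assert(next_ != nullptr); // a filter must have somewhere to forward frames
    }

    void write(const Frame& frame) override
    {
        if (keep_(frame))
        {
            next_->write(frame);
        }
    }

private:
    std::function<bool(const Frame&)> keep_;
    FrameSink*                        next_; // not owned
};
```

Because each stage sees only the `FrameSink` interface, a file writer, an analysis hook, or another filter could replace `MemorySink` without the producer knowing.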

This issue collects a roadmap for design and development.

Related efforts include:

  • encapsulation, abstraction, and interface development under nb-lib
  • restructuring of simulator launch, collaborations, and data structures related to expansion of the ModularSimulator (links from Pascal? Paul?)
  • expansion of the MdModules framework (links from Christian? Others?)
  • evolution of modular input handling
  • evolution of the checkpoint facilities
  • clarifying simulator program state and invariants (#3325, #2375)

Use cases

To clarify the scope of this issue, we define the following use cases.

Application level

Features and tools enabled by the API functionality described in this issue.

Ensemble simulation / multi-sim

Temperature replica exchange

Hamiltonian replica exchange

Monte Carlo rejection of a trajectory segment

convert-tpr / gmxapi.modify_input

gmx dump

grompp

nb-lib translation

Filesystem-decoupled input preparation and simulation

Filesystem-decoupled simulation output handling

API level

API use cases that drive features within the scope of this issue and support the application use cases above.

Obtain a reference to the output of a simulation segment.

Produce input for a simulation segment from the output of a simulation segment.

Obtain a modified SimulationInput from an "editing" operation.

Compose a SimulationInput.

Decompose a SimulationInput (topology, microstate, simulation parameters, metadata, others?).

Fingerprint a SimulationInput: identify the trajectory of which it is a part and the segment it will produce, uniquely to the point of reproducibility and/or scientific relevance.
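A minimal C++ sketch of the compose, decompose, edit, and fingerprint operations, assuming a three-part decomposition into topology, microstate, and parameters (the issue leaves the actual decomposition open). `Topology`, `Microstate`, `Parameters`, `SimulationInput`, `withTemperature`, and `fingerprint` are hypothetical names for illustration only. The fingerprint combines `std::hash` values with a boost-style hash-combine step, which is stable only within one process; a real fingerprint would need a portable digest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical components of a simulation input.
struct Topology
{
    std::string description;
};
struct Microstate
{
    std::vector<double> coordinates;
    std::int64_t        step;
};
struct Parameters
{
    std::int64_t nsteps;
    double       temperature;
};

// Immutable aggregate: no setters; "editing" returns a new object.
class SimulationInput
{
public:
    // Composition from components.
    SimulationInput(Topology topology, Microstate microstate, Parameters parameters) :
        topology_(std::move(topology)), microstate_(std::move(microstate)), parameters_(parameters)
    {
    }

    // Decomposition: read-only access to each component.
    const Topology&   topology() const { return topology_; }
    const Microstate& microstate() const { return microstate_; }
    const Parameters& parameters() const { return parameters_; }

    // Editing operation: produce a modified copy and leave *this unchanged.
    SimulationInput withTemperature(double temperature) const
    {
        assert(temperature > 0.0);
        Parameters edited  = parameters_;
        edited.temperature = temperature;
        return SimulationInput(topology_, microstate_, edited);
    }

    // Fingerprint: fold the hash of every component that affects the produced segment.
    std::size_t fingerprint() const
    {
        std::size_t seed    = 0;
        auto        combine = [&seed](std::size_t h) {
            seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        };
        combine(std::hash<std::string>{}(topology_.description));
        for (double x : microstate_.coordinates)
        {
            combine(std::hash<double>{}(x));
        }
        combine(std::hash<std::int64_t>{}(microstate_.step));
        combine(std::hash<std::int64_t>{}(parameters_.nsteps));
        combine(std::hash<double>{}(parameters_.temperature));
        return seed;
    }

private:
    Topology   topology_;
    Microstate microstate_;
    Parameters parameters_;
};
```

Keeping the input immutable means an "editing" operation such as gmxapi.modify_input can never silently alter an input that another consumer already holds.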

Library level

Library-internal use cases implied by the API implementation scenarios above, or connected to the accompanying (re)factoring.

Apply SimulationInput to consuming modules.

Initialize volatile data (internal state) from the (immutable) record of input.

Coordinate a Memento, or publish a lightweight (opaque) handle to simulator output or checkpoint data, without baking in details of data locality or structure.
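The Memento idea can be sketched in C++ under the assumption that a component owns volatile state initialized from an immutable input record. `InputRecord`, `Integrator`, and `Checkpoint` are hypothetical names. The `Checkpoint` handle is opaque: a client, such as Monte Carlo rejection of a trajectory segment, can hold it and hand it back, but cannot inspect its layout or learn where the data lives.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical immutable record of input (stand-in for a SimulationInput).
struct InputRecord
{
    std::vector<double> initialCoordinates;
    std::int64_t        firstStep;
};

class Integrator; // forward declaration so the memento can grant it access

// Opaque memento: copyable by clients, readable only by its originator.
class Checkpoint
{
private:
    friend class Integrator;
    struct State
    {
        std::vector<double> coordinates;
        std::int64_t        step;
    };
    explicit Checkpoint(std::shared_ptr<const State> state) : state_(std::move(state)) {}
    std::shared_ptr<const State> state_;
};

// Hypothetical simulator component with volatile internal state.
class Integrator
{
public:
    // Volatile state starts as a copy of the immutable input record.
    explicit Integrator(const InputRecord& input) :
        coordinates_(input.initialCoordinates), step_(input.firstStep)
    {
    }

    void advance()
    {
        for (double& x : coordinates_)
        {
            x += 0.5; // placeholder dynamics
        }
        ++step_;
    }

    std::int64_t               step() const { return step_; }
    const std::vector<double>& coordinates() const { return coordinates_; }

    // Originator role: capture state without exposing its structure.
    Checkpoint save() const
    {
        return Checkpoint(std::make_shared<Checkpoint::State>(Checkpoint::State{ coordinates_, step_ }));
    }

    // Roll back to a previously saved state.
    void restore(const Checkpoint& checkpoint)
    {
        assert(checkpoint.state_ != nullptr);
        coordinates_ = checkpoint.state_->coordinates;
        step_        = checkpoint.state_->step;
    }

private:
    std::vector<double> coordinates_;
    std::int64_t        step_;
};
```

Here the saved state lives in memory behind a `shared_ptr`. An implementation backed by a file or a remote store could sit behind the same handle without changing client code.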

Module level

Interactions between GROMACS internal modules and the new API facilities or supporting infrastructure.

(Re)initialize internal state.

Dump internal state.

Confirm input validity.

Register information or collaboration dependencies.

Register, publish, or be able to describe available outputs.
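A hypothetical module-facing interface for several of these responsibilities might look like the following C++ sketch. It covers (re)initialization, state dumping, input validation, and output description, and omits dependency registration. `Record`, `ISimulationModule`, and `Thermostat` are invented for illustration and are not the MdModules API. The `ref-t` key is only an example parameter name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical key-value record used for both input and dumped state.
using Record = std::map<std::string, double>;

// Hypothetical interface between a module and the input/output infrastructure.
class ISimulationModule
{
public:
    virtual ~ISimulationModule() = default;
    // Confirm input validity before any state is touched; returns error messages.
    virtual std::vector<std::string> validate(const Record& input) const = 0;
    // (Re)initialize internal state from input; may be called more than once.
    virtual void initialize(const Record& input) = 0;
    // Dump internal state, e.g. for checkpointing or inspection.
    virtual Record dumpState() const = 0;
    // Describe the outputs this module can publish.
    virtual std::vector<std::string> availableOutputs() const = 0;
};

class Thermostat : public ISimulationModule
{
public:
    std::vector<std::string> validate(const Record& input) const override
    {
        std::vector<std::string> errors;
        auto                     it = input.find("ref-t");
        if (it == input.end())
        {
            errors.push_back("missing ref-t");
        }
        else if (it->second <= 0.0)
        {
            errors.push_back("ref-t must be positive");
        }
        return errors;
    }

    void initialize(const Record& input) override
    {
        assert(validate(input).empty()); // precondition: caller validated first
        referenceTemperature_ = input.at("ref-t");
        heatBathEnergy_       = 0.0; // volatile state is reset on every (re)initialization
    }

    Record dumpState() const override
    {
        return { { "ref-t", referenceTemperature_ }, { "heat-bath-energy", heatBathEnergy_ } };
    }

    std::vector<std::string> availableOutputs() const override { return { "heat-bath-energy" }; }

private:
    double referenceTemperature_ = 0.0;
    double heatBathEnergy_       = 0.0; // placeholder; never updated in this sketch
};
```

Separating `validate` from `initialize` lets a runner reject bad input before any module state changes. Allowing repeated `initialize` calls supports reapplying a SimulationInput within one process lifetime.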

Additional goals

Distinguish between (immutable) input and (mutable) program state (clarify stages of initialization, reform inputrec use cases).

Clarify the information hierarchy represented by SimulationInput (and SimulationOutput).

Maximize reusability of the MD runner:
  • allow SimulationInput to be reapplied in a process lifetime
  • understand reusable resources or data structures that do not need reinitialization

Define SimulationState encapsulation, or coordinate with its road map.

Deferred

To further clarify the scope of this issue, identify related tasks that should have a more explicit road map, but which are (currently) considered beyond the scope of this feature topic.

  • Decouple Mdrunner collaborations from assumptions of file-based I/O (Remove the ArrayRef<const t_filenm> from gmx::Mdrunner.)
  • Modernize/unify run time simulation options handling (#2877)
  • Clean up the mdrun call hierarchy and program flow (input aggregation, acquisition of run time resources, component initialization and binding, creation protocols, "runner" versus "simulator").
  • Decouple Mdrunner from membed and essential-dynamics implementation details.
  • Logging abstraction (#2999)

Tasks

  • SimulationInput abstraction (#3374): let the existence of TPR and checkpoint input files be client-level concerns, and encapsulate their handling from the rest of the mdrun call stack.
  • Read files during acquisition of the SimulationInput handle. (proposed during dev telco, 12 February 2020)
  • Optimize concrete SimulationInput for TPR/CPT serialization protocol. (proposed during dev telco, 12 February 2020)
  • Export SimulationInput bindings to the Python package.
  • Reimplement gmxapi.simulation operations in terms of SimulationInput.
  • Reimplement gmxapi.modify_input and convert-tpr in terms of SimulationInput. (also ref #3295)
  • More (please contribute).

Criteria for completion

This issue may remain open as long as it is a useful road map, but can likely be considered "resolved" when the API use cases to support the targeted applications are well understood, and either implemented or independently tracked on another road map.


Subtasks

Feature #3374: SimulationInput abstraction (New)
Feature #3439: Optimize successive simulation segments (New)

Related issues

Related to GROMACS - Feature #3286: Optionally skip initial coordinates from being written to output coordinates (Rejected)
Related to GROMACS - Feature #3285: Run simulations from the same tpr file with different random seeds (Resolved)
Related to GROMACS - Feature #3433: Decide how to handle multisim with modular simulator (New)
Related to GROMACS - Task #3422: Implement modular checkpointing for modular simulator (Resolved)

History

#1 Updated by Eric Irrgang 9 months ago

  • Description updated

#2 Updated by Eric Irrgang 9 months ago

  • Description updated

#3 Updated by Eric Irrgang 9 months ago

  • Description updated

#4 Updated by Eric Irrgang 9 months ago

  • Related to Feature #3286: Optionally skip initial coordinates from being written to output coordinates added

#5 Updated by Eric Irrgang 9 months ago

  • Related to Feature #3285: Run simulations from the same tpr file with different random seeds added

#6 Updated by Eric Irrgang 9 months ago

  • Related to Feature #3433: Decide how to handle multisim with modular simulator added

#7 Updated by Eric Irrgang 9 months ago

  • Related to Task #3422: Implement modular checkpointing for modular simulator added
