Feature #3374

Feature #3379: C++ API for simulation input and output

SimulationInput abstraction

Added by Eric Irrgang 8 months ago. Updated 8 months ago.

Status: New
Priority: Normal
Assignee: -
Category: core library
Target version:
Difficulty: uncategorized

Description

MD simulation initialization has deeply-embedded dependencies on the TPR file and on the checkpoint file. This means there is not a clear seam for features like Monte Carlo state restoration or for decoupling replica exchange (and other multi-sim modes) from the core simulation. More generally, API features for simulation input preparation are blocked by missing abstractions from which simulator invariants can be established.

This is a proposal for a SimulationInput abstraction that
  • reduces coupling between Mdrunner initialization code and file-based input
  • encapsulates the aggregate of input that includes contributions from both TPR and checkpoint files
  • conveys all information needed to start a simulation run (except run-time parameters that do not affect the meaning of the results).

The goal is to create an object which represents the entire simulation input, and is sufficient to start a simulation. In a first approach, it will simply be created from the TPR and checkpoint files, but it will later allow creation and manipulation of input independently from these files (e.g. through the API). It would add a layer of abstraction between the reading of input files and the creation of objects used throughout the simulation (such as t_state, inputrec, mtop), simplifying changes on either side of this layer.

Characteristics

SimulationInput is an opaque type that can be used to parameterize factories or Directors in use cases where read_tpx_state() and load_checkpoint() currently appear.

A SimulationInput handle is obtained by client code and provided to the gmx::Mdrunner.

The invariant of a SimulationInput is its ability to define the starting point of a trajectory segment. SimulationInput objects are logically immutable (implementation details may warrant internal statefulness).

Public interface

/*!
 * \brief Prescription for molecular simulation.
 *
 * Represent the complete and unique information needed to generate a simulation
 * trajectory segment. SimulationInput objects are opaque to the public API.
 * Ownership can be managed with SimulationInputHolder objects. Clients can
 * acquire owning references to SimulationInput objects (as SimulationInputHolder)
 * through makeSimulationInput() or from other SimulationInputHolders.
 *
 * A SimulationInput object represents an immutable source of data, and is safe
 * to share. A SimulationInput object may have internal state to support
 * performance optimizations when shared by multiple SimulationInputHolders.
 * The SimulationInput is guaranteed to live at least as long as any associated
 * SimulationInputHolders. The API does not specify whether it may persist
 * longer internally or be reused for later equivalent requests.
 *
 * \seealso SimulationInputHolder
 * \seealso makeSimulationInput()
 *
 * See also https://redmine.gromacs.org/issues/3379 for design and development road map.
 */
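class SimulationInput;

// Private implementation details, forward-declared so that the public header stays opaque.
class SimulationInputHolderImpl;
struct SimulationInputHolderImplDeleter
{
    void operator()(SimulationInputHolderImpl* impl) const;
};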

/*!
 * \brief Owning handle to a SimulationInput object.
 *
 * SimulationInput objects are logically immutable, so ownership may be shared
 * by multiple SimulationInputHolders.
 *
 * Acquire a SimulationInputHolder with makeSimulationInput()
 *
 * \seealso https://redmine.gromacs.org/issues/3379
 */
class SimulationInputHolder{
public:
    SimulationInputHolder() = delete;
    ~SimulationInputHolder();

    /*! \cond internal
     * \brief Take ownership of private implementation object to produce a new public holder.
     */
    explicit SimulationInputHolder(std::unique_ptr<SimulationInputHolderImpl>&&);
    /*! \endcond */

    /*!
     * \brief Access opaque SimulationInput pointer.
     *
     * \return Borrowed access to the SimulationInput.
     */
    SimulationInput* get() const noexcept;
private:
    std::unique_ptr<SimulationInputHolderImpl, SimulationInputHolderImplDeleter> impl_;
};
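The sharing and lifetime guarantees described above can be illustrated with a small, self-contained sketch. It assumes, purely for illustration, that the private implementation holds a std::shared_ptr to the SimulationInput, so that copying a holder shares the same input and the input outlives any individual holder. The SimulationInput contents, the copy constructor, and the makeHolderSketch() factory are hypothetical; because SimulationInputHolderImpl is a complete type here, the default deleter stands in for SimulationInputHolderImplDeleter.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in: the proposal keeps SimulationInput opaque.
class SimulationInput
{
public:
    explicit SimulationInput(std::string origin) : origin_(std::move(origin)) {}
    const std::string& origin() const noexcept { return origin_; }

private:
    std::string origin_;
};

// Assumed implementation detail: shared ownership of a logically immutable input.
class SimulationInputHolderImpl
{
public:
    explicit SimulationInputHolderImpl(std::shared_ptr<SimulationInput> input) :
        input_(std::move(input))
    {
    }
    std::shared_ptr<SimulationInput> input_;
};

class SimulationInputHolder
{
public:
    SimulationInputHolder() = delete;
    ~SimulationInputHolder() = default;

    explicit SimulationInputHolder(std::unique_ptr<SimulationInputHolderImpl>&& impl) :
        impl_(std::move(impl))
    {
        assert(impl_ && impl_->input_);
    }

    // Copying yields a new holder that shares the same SimulationInput.
    SimulationInputHolder(const SimulationInputHolder& other) :
        impl_(std::make_unique<SimulationInputHolderImpl>(other.impl_->input_))
    {
    }

    // Borrowed access; the pointee lives at least as long as this holder.
    SimulationInput* get() const noexcept { return impl_->input_.get(); }

private:
    std::unique_ptr<SimulationInputHolderImpl> impl_;
};

// Hypothetical factory standing in for makeSimulationInput().
SimulationInputHolder makeHolderSketch(std::string origin)
{
    auto input = std::make_shared<SimulationInput>(std::move(origin));
    return SimulationInputHolder(std::make_unique<SimulationInputHolderImpl>(std::move(input)));
}
```

The extra indirection through SimulationInputHolderImpl is what keeps the public header free of implementation types; the shared_ptr inside it is only one possible way to satisfy the "may be shared by multiple SimulationInputHolders" requirement.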

Instead of passing “-s” and “-cpi” to gmx::Mdrunner, client code acquires a SimulationInput handle that can be used to initialize the simulation.

/*! \brief Direct the construction of a SimulationInput.
 *
 * Example:
 *     // After preparing a LegacyMdrunOptions and calling handleRestart()...
 *     SimulationInputBuilder builder;
 *     auto simulationInputHandle = makeSimulationInput(options, &builder);
 *
 *     // In addition to MdrunnerBuilder::addFiles(),
 *     mdrunnerBuilder.addInput(simulationInputHandle);
 *
 */
SimulationInputHolder makeSimulationInput(const LegacyMdrunOptions&, SimulationInputBuilder*);
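The seam this function introduces can be sketched with simplified stand-in types. In this hypothetical example, MdrunOptionsSketch carries only the values of -s and -cpi, makeSimulationInputSketch() is the single place that interprets them, and RunnerSketch (standing in for gmx::Mdrunner) only ever sees the resulting aggregate. None of these types reflect the actual GROMACS implementation, and file contents are represented by their names only.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the -s / -cpi part of LegacyMdrunOptions.
struct MdrunOptionsSketch
{
    std::string                tprFilename;
    std::optional<std::string> checkpointFilename;
};

// Aggregate of everything needed to start a trajectory segment (names only, here).
struct SimulationInputSketch
{
    std::string                topologySource;
    std::optional<std::string> checkpointSource;
};

class SimulationInputBuilderSketch
{
public:
    void setTopologySource(std::string source) { topologySource_ = std::move(source); }
    void setCheckpointSource(std::string source) { checkpointSource_ = std::move(source); }
    std::shared_ptr<const SimulationInputSketch> build() const
    {
        return std::make_shared<SimulationInputSketch>(
                SimulationInputSketch{ topologySource_, checkpointSource_ });
    }

private:
    std::string                topologySource_;
    std::optional<std::string> checkpointSource_;
};

// Plays the role of makeSimulationInput(): the only code that interprets -s and -cpi.
std::shared_ptr<const SimulationInputSketch> makeSimulationInputSketch(const MdrunOptionsSketch& options,
                                                                       SimulationInputBuilderSketch* builder)
{
    assert(builder != nullptr);
    builder->setTopologySource(options.tprFilename);
    if (options.checkpointFilename)
    {
        builder->setCheckpointSource(*options.checkpointFilename);
    }
    return builder->build();
}

// Stands in for gmx::Mdrunner: it receives the aggregate input, never filenames.
class RunnerSketch
{
public:
    void addInput(std::shared_ptr<const SimulationInputSketch> input) { input_ = std::move(input); }
    bool isContinuation() const { return input_ && input_->checkpointSource.has_value(); }

private:
    std::shared_ptr<const SimulationInputSketch> input_;
};
```

Because the runner only asks the input whether checkpoint data is present, swapping file-based construction for API-based construction would not touch the runner at all.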

Utilities

While we figure out what the public or object-oriented interface should look like, free function utilities handle existing use cases.

In the simulation runner, calls to read_tpx_state are replaced with calls that use the SimulationInput handle.

/*! \brief Get the global simulation input.
 *
 * Acquire global simulation data structures from the SimulationInput handle.
 * Note that global data is returned in the calling thread. In parallel
 * computing contexts, the client is responsible for calling only where needed.
 *
 * Example:
 *    if (SIMMASTER(cr))
 *    {
 *        // Only the master rank has the global state
 *        globalState = globalSimulationState(simulationInput);
 *
 *        // Read (nearly) all data required for the simulation
 *        applyGlobalInputRecord(simulationInput, inputrec);
 *        applyGlobalTopology(simulationInput, &mtop);
 *    }
 */
std::unique_ptr<t_state> globalSimulationState(const SimulationInput&);
void applyGlobalInputRecord(const SimulationInput&, t_inputrec*);
void applyGlobalTopology(const SimulationInput&, gmx_mtop_t*);
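A minimal sketch of how these free functions could delegate to a SimulationInput that already holds the data previously obtained via read_tpx_state(). StateSketch, InputRecordSketch and TopologySketch are hypothetical stand-ins for t_state, t_inputrec and gmx_mtop_t; the function names mirror the proposal, but the signatures take these stand-in types and the bodies are illustrative only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for t_state, t_inputrec and gmx_mtop_t.
struct StateSketch
{
    std::vector<double> x;
    long                step = 0;
};
struct InputRecordSketch
{
    long   nsteps = 0;
    double dt     = 0.0;
};
struct TopologySketch
{
    std::string name;
    int         numAtoms = 0;
};

// Hypothetical SimulationInput contents: data previously read by read_tpx_state().
class SimulationInputSketch
{
public:
    SimulationInputSketch(StateSketch state, InputRecordSketch inputRecord, TopologySketch topology) :
        state_(std::move(state)), inputRecord_(inputRecord), topology_(std::move(topology))
    {
    }
    const StateSketch&       state() const { return state_; }
    const InputRecordSketch& inputRecord() const { return inputRecord_; }
    const TopologySketch&    topology() const { return topology_; }

private:
    StateSketch       state_;
    InputRecordSketch inputRecord_;
    TopologySketch    topology_;
};

// Returns a fresh copy, so the caller owns (and may mutate) the global state.
std::unique_ptr<StateSketch> globalSimulationState(const SimulationInputSketch& input)
{
    return std::make_unique<StateSketch>(input.state());
}

// Overwrites caller-provided structures; the SimulationInput itself is never modified.
void applyGlobalInputRecord(const SimulationInputSketch& input, InputRecordSketch* inputrec)
{
    assert(inputrec != nullptr);
    *inputrec = input.inputRecord();
}

void applyGlobalTopology(const SimulationInputSketch& input, TopologySketch* mtop)
{
    assert(mtop != nullptr);
    *mtop = input.topology();
}
```

Handing out copies rather than references is one way to keep the SimulationInput logically immutable while the simulator mutates its own state.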


In the simulation runner, calls to load_checkpoint are replaced with calls that use the SimulationInput handle.

/*! \brief Initialize local stateful simulation data.
 *
 * Establish an invariant for the simulator at a trajectory point.
 * Call on all ranks (after domain decomposition and task assignments).
 *
 * After this call, the simulator has all of the information it will
 * receive in order to advance a trajectory from the current step.
 * Checkpoint information has been applied, if applicable, and stateful
 * data has been (re)initialized.
 *
 * \warning It is the caller’s responsibility to make sure that
 * preconditions are satisfied for the parameter objects.
 *
 * \seealso globalSimulationState()
 * \seealso applyGlobalInputRecord()
 * \seealso applyGlobalTopology()
 *
 * Example:
 *    applyLocalState(simulationInput,
 *               logFileHandle,
 *               cr, domdecOptions.numCells,
 *               inputrec, globalState.get(),
 *               &observablesHistory,
 *               mdrunOptions.reproducible);
 */
void applyLocalState(const SimulationInput&, t_fileio *logfio,
                    const t_commrec *cr, const ivec dd_nc,
                    t_inputrec *ir, t_state *state,
                    ObservablesHistory *observablesHistory,
                    gmx_bool reproducibilityRequested);
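The behavior described for applyLocalState() (apply checkpoint data if present, otherwise initialize stateful data from the TPR-derived starting point) can be sketched as follows. This hypothetical version ignores domain decomposition, logging and the reproducibility flag, and uses stand-in types; ObservablesHistorySketch loosely represents ObservablesHistory and CheckpointSketch represents whatever a checkpoint contributes.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for t_state, ObservablesHistory and checkpoint contents.
struct StateSketch
{
    std::vector<double> x;
    long                step = 0;
};
struct ObservablesHistorySketch
{
    double energyAverage = 0.0;
    long   numSamples    = 0;
};
struct CheckpointSketch
{
    StateSketch              state;
    ObservablesHistorySketch observables;
};

struct SimulationInputSketch
{
    StateSketch                     initialState; // as derived from the TPR
    std::optional<CheckpointSketch> checkpoint;   // present only for continuations
};

// Establish the simulator invariant at the starting trajectory point: checkpoint
// data wins when present, otherwise stateful data is (re)initialized.
void applyLocalStateSketch(const SimulationInputSketch& input,
                           StateSketch*                 state,
                           ObservablesHistorySketch*    observablesHistory)
{
    assert(state != nullptr && observablesHistory != nullptr);
    if (input.checkpoint)
    {
        *state              = input.checkpoint->state;
        *observablesHistory = input.checkpoint->observables;
    }
    else
    {
        *state              = input.initialState;
        *observablesHistory = ObservablesHistorySketch{};
    }
}
```

After either branch, the caller's objects describe exactly one trajectory point, which is the invariant the doc comment asks for.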

Future work

  • Allow a SimulationInput to be used to prototype the builder for a new SimulationInput, and use it as the basis for gmxapi.modify_input / gmx convert-tpr
  • Produce sufficient information from gmx::Mdrunner (or gmx::Mdrunner::mdrunner()) to build a new SimulationInput corresponding to the final simulator state.
  • Define the composition of a SimulationInput. Define less opaque representations of SimulationInput components that can be extracted from or built into a SimulationInput.
  • Adapters to write files out from a SimulationInput
  • Serialization/deserialization?

Expansion of builder protocol

Because a SimulationInput can be composed in multiple ways, makeSimulationInput() can be extended or replaced.

/*! \brief Direct the construction of a SimulationInput.
 *
 * Example:
 *     // mdrun command implementation.
 *     SimulationInputBuilder builder;
 *     auto director = simulationInputDirector(options);
 *     director.construct(&builder);
 *     auto inputHandle = builder.build();
 */
SimulationInputDirector simulationInputDirector(const LegacyMdrunOptions&);
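One way to realize the Director/Builder split is sketched below under stated assumptions: the builder only accumulates parts, each director encodes one recipe for driving it, and a second, hypothetical director composes input from in-memory data rather than from command-line filenames. All *Sketch names and the "file:" / "memory:" source labels are illustrative, not part of the proposal.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

struct MdrunOptionsSketch
{
    std::string                tprFilename;        // stands in for -s
    std::optional<std::string> checkpointFilename; // stands in for -cpi
};

struct SimulationInputSketch
{
    std::string                topologySource;
    std::optional<std::string> checkpointSource;
};

// The builder accumulates parts; it knows nothing about where they come from.
class SimulationInputBuilderSketch
{
public:
    void addTopology(std::string source) { topology_ = std::move(source); }
    void addCheckpoint(std::string source) { checkpoint_ = std::move(source); }
    std::shared_ptr<const SimulationInputSketch> build() const
    {
        assert(!topology_.empty() && "a topology part is required");
        return std::make_shared<SimulationInputSketch>(SimulationInputSketch{ topology_, checkpoint_ });
    }

private:
    std::string                topology_;
    std::optional<std::string> checkpoint_;
};

// Director for the mdrun command line: one recipe for driving the builder.
class SimulationInputDirectorSketch
{
public:
    explicit SimulationInputDirectorSketch(MdrunOptionsSketch options) : options_(std::move(options)) {}
    void construct(SimulationInputBuilderSketch* builder) const
    {
        assert(builder != nullptr);
        builder->addTopology("file:" + options_.tprFilename);
        if (options_.checkpointFilename)
        {
            builder->addCheckpoint("file:" + *options_.checkpointFilename);
        }
    }

private:
    MdrunOptionsSketch options_;
};

SimulationInputDirectorSketch simulationInputDirectorSketch(const MdrunOptionsSketch& options)
{
    return SimulationInputDirectorSketch(options);
}

// Hypothetical alternative director composing input from in-memory data (e.g. via an API).
class InMemoryDirectorSketch
{
public:
    explicit InMemoryDirectorSketch(std::string topologyLabel) : topologyLabel_(std::move(topologyLabel)) {}
    void construct(SimulationInputBuilderSketch* builder) const
    {
        assert(builder != nullptr);
        builder->addTopology("memory:" + topologyLabel_);
    }

private:
    std::string topologyLabel_;
};
```

New ways of composing a SimulationInput then amount to new directors, while the builder and the resulting handle type stay unchanged.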

Additionally, it is probably appropriate to refactor handleRestart as a step in directing the construction of the SimulationInput, so that startingBehavior can be extracted from the SimulationInput.
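As a rough sketch of extracting startingBehavior from the SimulationInput: once restart handling has run as a director step, its outcome could be recorded in the input and queried later instead of re-inspecting files. The enum values and the appendRequested field below are hypothetical stand-ins, not the existing handleRestart() logic.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the starting behavior decided during restart handling.
enum class StartingBehaviorSketch
{
    NewSimulation,
    RestartWithAppending,
    RestartWithoutAppending
};

struct SimulationInputSketch
{
    std::string                topologySource;
    std::optional<std::string> checkpointSource;
    bool                       appendRequested = true; // recorded while directing construction
};

// The answer depends only on the SimulationInput, not on the file system.
StartingBehaviorSketch startingBehavior(const SimulationInputSketch& input)
{
    if (!input.checkpointSource)
    {
        return StartingBehaviorSketch::NewSimulation;
    }
    return input.appendRequested ? StartingBehaviorSketch::RestartWithAppending
                                 : StartingBehaviorSketch::RestartWithoutAppending;
}
```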

Deferred

This proposal is intended to support and complement other efforts that are explicitly beyond the scope of the present issue, such as
  • Define SimulationState.
  • Unify tpr and cpt file formats to a single simulation input / continuation file format.

Related issues

Related to GROMACS - Feature #2901: Declare external Resources in mdp / tpr files. (Closed)

Associated revisions

Revision 0bda8a13
Added by M. Eric Irrgang about 2 months ago

Replace filename-based initialization functions.

Decouple Mdrunner from the details of `-s` and `-cpi`
filename arguments when establishing initial simulation state.

Declare gmx::SimulationInput and the library utilities needed to
replace direct reference to TPR and CPT files in Mdrunner.
Use SimulationInput and accompanying utilities to remove read_tpx_state
and load_checkpoint from Mdrunner.

Allows TPR and checkpoint file handling to be considered separately
from Mdrunner initialization in future development.

Refs #3374

Revision c4e2fbee
Added by M. Eric Irrgang about 1 month ago

Create and use SimulationInput module.

Allow Mdrunner to accept a handle to SimulationInput from the client.

This change adds an input method to the MdRunner builder to provide
access to resources provided through the new SimulationInput module.

Refs #3374

History

#1 Updated by Eric Irrgang 8 months ago

  • Description updated

#2 Updated by Eric Irrgang 8 months ago

  • Parent task set to #3379

In the developer teleconference yesterday, there seemed to be a request for more context and a longer-reaching road map, so I have created #3379 as a parent issue.

#3 Updated by Eric Irrgang 8 months ago

  • Description updated

#4 Updated by Joe Jordan 8 months ago

There are a few further issues (at least!) that would block this implementation and thus also must be addressed to move towards the proposed model.

1) Currently, there is a possibility of file reading in Mdrunner::mdrunner() if table forces are requested. In this case, the call to init_forcerec will read the table xvg files provided as command line options. I am currently working on pulling this out of init_forcerec, but due to other obligations I cannot say when this will be done. Once t_fcdata is a separate object from forcerec, it could potentially be pulled out of the runner entirely. However, this would first require also pulling t_disresdata and t_oriresdata out of t_fcdata. This should not be too challenging since these objects are largely decoupled, but it will need to be done for file reading to all happen at a higher level. I don't think anyone has plans yet to work on this.

2) Both membed and essential dynamics need to be moved to become MdModules so that they manage their own file reading internally. I believe Christian and Pascal have tentative plans to work on this, but the timeline of when this would happen is still not clear.

3) The logger needs to be managed at the client level and given as input to mdrunner() as well as all other command line executables. Paul is working on modernizing the logger but there is still a need for some design discussion over how this would be managed.

I think these, in combination with moving tpr/cpt serialization to a higher level would get us most of the way to mdrunner working as described here.

#5 Updated by Eric Irrgang 8 months ago

Joe Jordan wrote:

There are a few further issues (at least!) that would block this implementation and thus also must be addressed to move towards the proposed model.

Can you please clarify which issues block the interface proposed versus aspects of the longer term proposal? Please consider revising tasks or sub-issues of #3379.

1) Currently, there is a possibility of file reading in Mdrunner::mdrunner() ...
... file reading to all happen at a higher level.

My intention is to separate the interface specification from the implementation details. I propose that gmx::Mdrunner itself should be decoupled from explicit notions of file access. I don't think that initial commits to evolve the interfaces should be blocked by rearchitecture of all file handling. If necessary, we should refine the scope of this issue to minimally allow TPR and CPT input handling to be decoupled from gmx::Mdrunner in the next week or two, and address other details in separate issues.

It sounds like there is consensus that this abstraction is desirable. Are there design concerns related to the interface, or are we ready for code?

2) Both membed and essential dynamics need to be moved to become MdModules so that they manage their own file reading internally.

I think this concern is not only beyond the scope of the present issue, but of the parent issue as well. I added "Decouple Mdrunner from membed and essential-dynamics implementation details." to issue #3379 as a "deferred" task. Please comment if you have more perspective on road map constraints or priorities.

3) The logger needs to be managed at the client level and given as input to mdrunner()...

The logger is definitely outside of the scope of what characterizes a trajectory segment. I have made a note in the parent issue for this, as well.

I think these, in combination with moving tpr/cpt serialization to a higher level would get us most of the way to mdrunner working as described here.

The main purpose of this proposal is to establish that Mdrunner is agnostic to the source of TPR and checkpoint information. You have noted that some modules draw additional initialization information from other sources, but for the present purposes, I think it is sufficient that the participation of those modules is indicated by the contents of the TPR. Ultimately, implementation-specific input data can be opaquely held in the SimulationInput, and it is worth discussing in the context of Mdrunner and Simulator architecture, but I don't think I could comment further on that without seeing at least a few commits merged to support basic use cases.

I would be happy to contribute to or collaborate on a complete road map for the evolution of Mdrunner and the Simulator, but I'm still trying to understand the related work that is already underway, and other developer priorities.

#6 Updated by Christian Blau 8 months ago

Have a look at previous discussion at "Declare external Resources in mdp / tpr files." #2901

I believe essential dynamics and membed can both move forward without mdmodularisation (though that would help) by giving them a file-handling wrapper structure to operate on instead of filenames.

The density-fitting module complicates things a bit: completely unbeknownst to the rest of the code, it stores its own reference density file name and opens that file when the simulation starts.

I believe that the only way around this is to have the module announce that it wants to read a certain data source; the framework would then attempt to open all the data sources and call back the modules with handles to the different data sources. The data sources could easily be identified by file names, so that not much of the interface would have to change.
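The announce-then-callback protocol suggested here could look roughly like the following sketch. DataSourceRegistry, the string names, and the use of std::istream as the handle type are assumptions made for illustration; the opener function decides how a name maps to data (a file, memory, or an API), so modules never open files themselves. Opening every source before any callback runs means a missing source is reported before any module is initialized.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical handle type: a readable stream, independent of its origin.
using DataSourceHandle   = std::shared_ptr<std::istream>;
using DataSourceCallback = std::function<void(DataSourceHandle)>;
using DataSourceOpener   = std::function<DataSourceHandle(const std::string&)>;

class DataSourceRegistry
{
public:
    // Called by a module (e.g. density fitting) to announce a data source it needs.
    void request(std::string name, DataSourceCallback callback)
    {
        requests_.push_back(Request{ std::move(name), std::move(callback) });
    }

    // Called by the framework: open everything first, then call the modules back.
    void openAll(const DataSourceOpener& opener)
    {
        std::vector<DataSourceHandle> handles;
        handles.reserve(requests_.size());
        for (const auto& request : requests_)
        {
            DataSourceHandle handle = opener(request.name);
            if (!handle)
            {
                throw std::runtime_error("cannot open data source: " + request.name);
            }
            handles.push_back(std::move(handle));
        }
        for (std::size_t i = 0; i < requests_.size(); ++i)
        {
            requests_[i].callback(handles[i]);
        }
    }

private:
    struct Request
    {
        std::string        name;
        DataSourceCallback callback;
    };
    std::vector<Request> requests_;
};
```

Since data sources are still identified by name, the existing file-name-based module options could map onto such a registry with little interface change, as the comment above anticipates.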

#7 Updated by Eric Irrgang 8 months ago

  • Related to Feature #2901: Declare external Resources in mdp / tpr files. added

#8 Updated by Eric Irrgang 8 months ago

  • Description updated

Description updated with some interface details and corrections.

#9 Updated by Eric Irrgang 8 months ago

I don't know how extensive the logical coupling is between gmx_mtop_t, t_state, and inputrec (or embedded atom data structures). It may not be appropriate to treat them separately. If any of these are not actually decoupled, we should consider adjusting the interface to reflect that, possibly just with std::pair and/or std::tie. But I would like to reflect future plans for the encapsulation of these data. Please advise.
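If the input record and topology do turn out to be coupled, one option mentioned above is to hand them out together. This hypothetical sketch returns a std::pair from a single accessor and unpacks it with std::tie into caller-owned objects, mirroring the existing out-parameter style; the stand-in types and the accessor name are illustrative only.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Hypothetical stand-ins for t_inputrec and gmx_mtop_t.
struct InputRecordSketch
{
    long nsteps = 0;
};
struct TopologySketch
{
    int numAtoms = 0;
};

struct SimulationInputSketch
{
    InputRecordSketch inputRecord;
    TopologySketch    topology;
};

// Coupled data is returned together, so callers cannot obtain one without the other.
std::pair<InputRecordSketch, TopologySketch> globalInputRecordAndTopology(const SimulationInputSketch& input)
{
    return { input.inputRecord, input.topology };
}
```

With C++17, structured bindings (auto [inputrec, mtop] = ...) are an alternative when the caller does not need pre-existing objects.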
