Task #2375

Clarify execution phases for MD simulation

Added by Eric Irrgang 6 months ago. Updated 6 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Difficulty: uncategorized

Description

In support of a roadmap to an API layer between the user interface and the MD simulation machinery, one early set of tasks is to tighten up the transitions between execution states and to clarify the seams for client-library interaction.

Basic features of API-driven simulation are to

  • allow calling code (client) to provide or modify initial state
  • provide or modify simulation parameters and runtime parameters
  • directly access trajectory data during or after a simulation without expensive filesystem I/O, via:
      * an API abstraction for trajectory output and/or checkpointing
      * API-provided MDModule or call-back access to system snapshots
      * an abstract, or at least convertible, representation of the initial and final simulation state before and after performing the specified MD integration.
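A minimal C++ sketch of the call-back route in the last bullet, assuming hypothetical `Snapshot`, `SnapshotPublisher`, and `runToyIntegrator` names (none of these are GROMACS API): client code subscribes a callback, and a toy stand-in for the MD loop hands it in-memory snapshots at the output interval instead of writing trajectory frames to disk.

```cpp
#include <array>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not GROMACS API.
// An in-memory snapshot handed to client code instead of a trajectory frame on disk.
struct Snapshot
{
    long                               step = 0;
    double                             time = 0.0;
    std::vector<std::array<double, 3>> positions;
};

// Client code subscribes callbacks; the MD loop publishes snapshots to them.
class SnapshotPublisher
{
public:
    using Callback = std::function<void(const Snapshot&)>;

    void subscribe(Callback callback) { callbacks_.push_back(std::move(callback)); }

    void publish(const Snapshot& snapshot) const
    {
        for (const Callback& callback : callbacks_)
        {
            callback(snapshot);
        }
    }

private:
    std::vector<Callback> callbacks_;
};

// Toy stand-in for the integrator: moves one particle along x and publishes
// at steps 0, outputInterval, 2*outputInterval, ... up to and including nsteps.
void runToyIntegrator(const SnapshotPublisher& publisher, long nsteps, long outputInterval, double dt)
{
    Snapshot state;
    state.positions.assign(1, std::array<double, 3>{ 0.0, 0.0, 0.0 });
    for (long step = 0; step <= nsteps; ++step)
    {
        state.step            = step;
        state.time            = step * dt;
        state.positions[0][0] = 0.1 * step; // toy "motion" along x
        if (step % outputInterval == 0)
        {
            publisher.publish(state);
        }
    }
}
```

In a design like this, a file writer could be just one more subscriber, so disk output and in-memory access would sit behind the same seam.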

The above can be extracted to a parent issue at some point, but the current issue is intended to address the first bullet: define a roadmap to allow API client code to provide and/or clearly understand the initial state of a simulation. It is also an excuse to start clarifying phases of program execution such that non-user-interface aspects of mdrun can be compartmentalized into the library, allowing consistent semantics between CLI and other API-driven work. This probably involves work to encapsulate or reconsider the ownership relationships of things like modules, command-line options, and initialization defaults.

To start discussion, I would propose the following sequence of changes to submit.

0. Make minor updates where state is loaded by `read_tpx` so that it is clearly marked as preliminary, pending checkpoint loading.
1. Remove or rework dependent code that requires late checkpoint loads. This potentially includes changes to what is stored in the checkpoint, such as parallelism runtime details.
2. Move checkpoint load earlier and clearly establish initial state.
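As a rough illustration of steps 0 to 2, the C++ sketch below encodes the proposed ordering in types. `PreliminaryState`, `CheckpointData`, `InitialState`, and `finalize` are hypothetical names, not GROMACS code; the idea shown is that code needing the real starting coordinates can only obtain an `InitialState` after any checkpoint has been applied on top of the `.tpr` data.

```cpp
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch; these are not GROMACS types.
// State read from the run input (.tpr), explicitly preliminary.
struct PreliminaryState
{
    long                step = 0;
    std::vector<double> coordinates;
};

// Data recovered from a checkpoint when the run is a continuation.
struct CheckpointData
{
    long                step;
    std::vector<double> coordinates;
};

// The finalized initial state. Only finalize() can construct it, so code that
// receives one knows checkpoint loading has already happened.
class InitialState
{
public:
    long                       step() const { return step_; }
    const std::vector<double>& coordinates() const { return coordinates_; }

    friend InitialState finalize(PreliminaryState prelim, const std::optional<CheckpointData>& checkpoint);

private:
    InitialState(long step, std::vector<double> coordinates) :
        step_(step), coordinates_(std::move(coordinates))
    {
    }

    long                step_;
    std::vector<double> coordinates_;
};

// Apply the checkpoint (if any) on top of the preliminary state.
InitialState finalize(PreliminaryState prelim, const std::optional<CheckpointData>& checkpoint)
{
    if (checkpoint)
    {
        if (checkpoint->coordinates.size() != prelim.coordinates.size())
        {
            throw std::runtime_error("checkpoint does not match run input");
        }
        prelim.step        = checkpoint->step;
        prelim.coordinates = checkpoint->coordinates;
    }
    return InitialState(prelim.step, std::move(prelim.coordinates));
}
```

Setup code that depends on the starting coordinates would then take an `InitialState` parameter, so a call made before checkpoint loading would fail to compile rather than silently use `.tpr` coordinates.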

Other changes, to be addressed in separate Redmine issues, include modernizing the command-line arguments (a whole other can of worms) and extracting them behind something like the MdpOptionsProvider interface. As with the checkpoint data above, these may require discussion of what is a simulation parameter versus an execution parameter.

History

#1 Updated by Erik Lindahl 6 months ago

For the first step, I think it will be much simpler to move/modify existing code so that all reading of input files and possible modifications of the initial state happen before we call a main entry point.

There is also initial load balancing and other things affecting parameters, but for those I think we should properly separate things into:

a) Things that we need to allow to change at any point during a simulation (e.g. redoing load balancing). For these there have to be proper re-initialization routines.

b) Things that we only want to alter as part of some initial testing/balancing. Here I think we should rather use special calls and make sure we can set up completely new simulations very fast.
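A minimal C++ sketch of categories (a) and (b), with hypothetical names throughout (`RuntimeParameter`, `SetupParameters`, and `Simulation` are not GROMACS types): a runtime parameter can only be changed through a setter that runs its re-initialization routine, while setup parameters have no setter at all, so trying alternatives means constructing new, cheap simulation objects.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sketch; not GROMACS API.

// (b) Setup-only settings, fixed when a Simulation is constructed.
struct SetupParameters
{
    int    numRanks;
    double timeStep;
};

// (a) A setting that may change at any point during a run. Every change goes
// through set(), which invokes the registered re-initialization routine.
template<typename T>
class RuntimeParameter
{
public:
    RuntimeParameter(T initial, std::function<void(const T&)> reinitialize) :
        value_(std::move(initial)), reinitialize_(std::move(reinitialize))
    {
    }

    const T& get() const { return value_; }

    void set(T newValue)
    {
        value_ = std::move(newValue);
        reinitialize_(value_); // e.g. redo load balancing
    }

private:
    T                             value_;
    std::function<void(const T&)> reinitialize_;
};

class Simulation
{
public:
    explicit Simulation(SetupParameters setup) : setup_(setup)
    {
        if (setup_.numRanks < 1 || setup_.timeStep <= 0.0)
        {
            throw std::invalid_argument("invalid setup parameters");
        }
    }

    const SetupParameters& setup() const { return setup_; }

private:
    SetupParameters setup_; // no setter: a different setup needs a new Simulation
};
```

Under this split, an initial tuning pass would build several `Simulation` objects with different `SetupParameters` and keep the best one, rather than mutating a running simulation.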

This way we will be able to move to a cleaner "start simulation" API, starting with a simple call that just obeys the settings provided by the user, rather than hoping to redesign everything right away.

#2 Updated by Eric Irrgang 6 months ago

Erik Lindahl wrote:

> For the first step, I think it will be much simpler to move/modify existing code so that all reading of input files and possible modifications of the initial state happen before we call a main entry point.

I would like that, but it is a very big step, and I am trying to break it down into smaller steps.

> b) Things that we only want to alter as part of some initial testing/balancing. Here I think we should rather use special calls and make sure we can set up completely new simulations very fast.

I am trying to tackle something like "b" first. I am basically separating the tasks of managing parameters and managing data. Managing data seems simpler and more immediately useful, but will require some shuffling of where and how some parameters are managed. I will start one or more separate Redmine issues regarding managing options / parameters, etc.

#3 Updated by Mark Abraham 6 months ago

I have a patch series in preparation that should make it possible to load the checkpoint immediately after the .tpr is read. This is more than mere beautification or tidier organization: domain decomposition (DD) needs the checkpoint coordinates to partition the system well.

As a side effect, we will be able to handle appending restarts better, in particular by making it simpler to open output files.
