The simulation restartability in gmxapi 0.0.7 is specific to MD work and has outstanding issues carried over from the old GitHub project. More importantly, it is based on assumptions about gmx::mdrun behavior and is not generalizable to multi-operation work graphs. We need to develop the metadata and framework by which a gmxapi Context implementation can discover the state of complete or incomplete work from past invocations, so that it can avoid duplicating data or re-executing work. We also need to define the interaction between higher-level and lower-level tools in order to establish a well-defined graph state when individual nodes are partially executed but recoverable.
Note that this feature interacts with #3146 and other issues.
- Establish reproducible/deterministic labeling of work graph invariants/immutables (defined work and data) and of filesystem artifacts.
- Extend deserialization of a work graph with initialized data sources.
- Design and implement a lower-level API bridge for library-internal or operation-specific checkpointing facilities.
- Negotiate an achievable shared state.
- Make low-level checkpointing more pluggable. Allow the library to defer to gmxapi abstractions so that we can build bridges to different data/work management software or particular environments.
- More... Various issues and considerations from the gmxapi GitHub project should be migrated to the GROMACS issue tracking system.
- Prioritize and sequence the tasks and release targets.
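
To make the first task (deterministic labeling) and the state-discovery goal more concrete, here is a minimal sketch in Python. It is not part of gmxapi and does not reflect any existing design: the function names (`fingerprint`, `node_state`, `mark_complete`), the `done.json` marker file, and the one-directory-per-node layout are all hypothetical, chosen only for illustration. The idea is that a node's label is a hash of a canonical serialization of its defining inputs, so a later invocation can recompute the same label and check the filesystem for matching artifacts.

```python
import hashlib
import json
import os
import tempfile


def fingerprint(operation, inputs, depends=()):
    """Return a deterministic label for a work graph node (hypothetical scheme).

    The label is the SHA-256 digest of a canonical JSON record, so it does not
    depend on dictionary ordering or on the process that computed it.
    """
    record = {"operation": operation, "inputs": inputs, "depends": list(depends)}
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def node_state(workdir, uid):
    """Classify a node as 'absent', 'incomplete', or 'complete' from its artifacts."""
    node_dir = os.path.join(workdir, uid)
    if os.path.exists(os.path.join(node_dir, "done.json")):
        return "complete"
    if os.path.isdir(node_dir):
        # Directory exists but no completion marker: partially executed work.
        return "incomplete"
    return "absent"


def mark_complete(workdir, uid, outputs):
    """Record completion by writing a temporary file and renaming it into place."""
    node_dir = os.path.join(workdir, uid)
    os.makedirs(node_dir, exist_ok=True)
    tmp_path = os.path.join(node_dir, "done.json.tmp")
    with open(tmp_path, "w") as fh:
        json.dump(outputs, fh, sort_keys=True)
    # os.replace overwrites the destination in a single rename operation.
    os.replace(tmp_path, os.path.join(node_dir, "done.json"))


if __name__ == "__main__":
    # Operation name and file names below are illustrative placeholders.
    uid = fingerprint("mdrun", {"tpr": "topol.tpr", "nsteps": 1000})
    with tempfile.TemporaryDirectory() as workdir:
        print(node_state(workdir, uid))
        mark_complete(workdir, uid, {"trajectory": "traj.trr"})
        print(node_state(workdir, uid))
```

In this sketch an "incomplete" node is where the bridge to library-internal or operation-specific checkpointing would come in: the higher-level tool only knows that work was started, and would need the lower-level facility to report whether and from where it can be resumed.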