While a gmxapi typing system and C++ data model were delayed, we adopted a combination of native Python types and trivial custom types to define operation inputs, provide static type hinting for users, drive the generation of wrapper code, perform input compatibility checks, and inform implicit handling of data shape transformations. Current software state and works in progress include scatter() and gather() helper functions, NDArray and EnsembleDataSource classes, and incomplete static type hinting.
To establish a snapshot of syntax and semantics from which to build, a series of commits under review revises the above-mentioned entities and, in many cases, requires Python iterables to be explicitly wrapped with scatter(), gather(), or ndarray() to clarify user intent for the API. These interface revisions may confuse users, though.
This issue asserts that we can preempt a harmfully confusing interim interface with a user interface that, while imperfect, allows us to separate the underlying issues and provide consistent semantics moving forward.
Inputs to gmxapi operations must be references to gmxapi objects. We can refine heuristics for implicit conversions in the future, but the immediate fix will be to require native Python data to be explicitly converted to a gmxapi data object before it is used as input to a gmxapi operation. The helper function for creating such an object consolidates the user interface for defining data shape and parallelizability.
We can minimize some dependency on updated type hinting machinery by first addressing #3140 and converting the existing function-annotation-based operation input expression.
A single helper function can return an object interpretable by the API for a Python literal or object provided by the caller. Constraints on the interpretation of the data can be expressed with optional keyword arguments. Constraints include data shape, data type, parallelizability, and allowable conversions.
We can provide additional helper functions for basic transformations, if necessary, but a function name containing "as", like gmxapi.as_data(), implies the common convention that conversions may be performed on the input object.
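To make the proposal concrete, here is a minimal sketch of what such a helper could look like. It is not the gmxapi implementation: the DataReference class, the keyword names (dtype, shape, parallel, allow_conversion), and the private helpers are all hypothetical, chosen only to mirror the constraints listed above.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DataReference:
    """Hypothetical stand-in for a gmxapi data object (not a real gmxapi class)."""
    value: Any
    dtype: type
    shape: Tuple[int, ...]
    parallel: bool  # True if the outer dimension may be scattered across an ensemble.


def _infer_shape(obj: Any) -> Tuple[int, ...]:
    """Infer a shape from nested lists/tuples. Scalars and strings have shape ()."""
    if isinstance(obj, (list, tuple)):
        if len(obj) == 0:
            return (0,)
        inner = {_infer_shape(item) for item in obj}
        if len(inner) != 1:
            raise ValueError('Ragged sequences have no well-defined shape.')
        return (len(obj),) + inner.pop()
    return ()


def _flatten(obj: Any):
    """Yield the scalar elements of a nested list/tuple structure."""
    if isinstance(obj, (list, tuple)):
        for item in obj:
            yield from _flatten(item)
    else:
        yield obj


def _convert(obj: Any, dtype: type, allow_conversion: bool) -> Any:
    """Return obj as nested tuples of dtype, converting elements only if allowed."""
    if isinstance(obj, (list, tuple)):
        return tuple(_convert(item, dtype, allow_conversion) for item in obj)
    if isinstance(obj, dtype):
        return obj
    if not allow_conversion:
        raise TypeError(f'{obj!r} is not of type {dtype.__name__}.')
    return dtype(obj)


def as_data(obj: Any, *,
            dtype: Optional[type] = None,
            shape: Optional[Sequence[int]] = None,
            parallel: bool = False,
            allow_conversion: bool = True) -> DataReference:
    """Hypothetical helper: wrap native Python data with explicit constraints."""
    found_shape = _infer_shape(obj)
    if shape is not None and tuple(shape) != found_shape:
        raise ValueError(f'Expected shape {tuple(shape)}, found {found_shape}.')
    if parallel and not found_shape:
        raise ValueError('Scalar data has no outer dimension to parallelize.')
    if dtype is None:
        # Without an explicit type, require the elements to agree on one.
        element_types = {type(element) for element in _flatten(obj)}
        if len(element_types) != 1:
            raise TypeError('Cannot infer a single element type; pass dtype.')
        dtype = element_types.pop()
    value = _convert(obj, dtype, allow_conversion)
    return DataReference(value=value, dtype=dtype, shape=found_shape, parallel=parallel)
```

Under these assumptions, as_data([1, 2, 3], dtype=float, parallel=True) would yield a reference with shape (3,) whose elements have been converted to float, while passing allow_conversion=False for the same input would raise TypeError instead of converting silently.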
#4 Updated by Eric Irrgang 8 months ago
- Target version changed from 2020-beta2 to 2020-beta3
Full resolution of this issue occurs in several phases, with ties to other issues.
Approximate task sequence:
Milestone (#3130): support well-defined 1-dimensional data edge shape without exposing NDArray or related helpers to users.
- Normalize behavior: Require data edge shape / topology to be clearly specified with
- Encapsulate data type and shape in the UI: Users can call as_data to put data sources onto the current work graph (at any time), obtaining a Future reference (for use as operation input). (The stuff we want to remove is still there, but hidden under the hood.)
- Normalize explicit expression of operation inputs. Decouple operation input description from function type hint annotations. (#3140)
- All Futures have "shape"
- All operation inputs have "shape"
- All operation inputs must be Future references.
- Normalize operation inputs to Futures in the current Context (use as_data?).
- gmxapi.datamodel.NDArray is removed (replaced with data shaping helpers)
- EnsembleDataSource is removed.
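The input-related items in this milestone (inputs described explicitly rather than through annotations, every Future and every input carrying a shape, and inputs required to be Future references) can be sketched as a small validation step. The Future class and the check_inputs function below are hypothetical illustrations, not gmxapi code, and the "declared" mapping is an assumed form for an explicit input description.

```python
from typing import Any, Callable, Dict, Tuple


class Future:
    """Hypothetical minimal Future: a deferred value with a known shape."""

    def __init__(self, producer: Callable[[], Any], shape: Tuple[int, ...]):
        self._producer = producer
        self.shape = tuple(shape)

    def result(self) -> Any:
        """Produce the value on demand."""
        return self._producer()


def check_inputs(declared: Dict[str, Tuple[int, ...]],
                 supplied: Dict[str, Any]) -> Dict[str, Future]:
    """Validate supplied inputs against an explicit description.

    `declared` maps each input name to its required shape. It is kept
    separate from any function annotations (an assumed design).
    """
    unknown = set(supplied) - set(declared)
    if unknown:
        raise TypeError(f'Unexpected inputs: {sorted(unknown)}')
    missing = set(declared) - set(supplied)
    if missing:
        raise TypeError(f'Missing inputs: {sorted(missing)}')
    checked = {}
    for name, shape in declared.items():
        value = supplied[name]
        # Native Python data is rejected; it must be wrapped explicitly first.
        if not isinstance(value, Future):
            raise TypeError(f'Input {name!r} must be a Future reference.')
        if value.shape != tuple(shape):
            raise ValueError(
                f'Input {name!r} has shape {value.shape}, expected {tuple(shape)}.')
        checked[name] = value
    return checked
```

In this sketch a raw list passed as an input raises TypeError, which corresponds to the requirement that native data be converted explicitly before it reaches an operation.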
Milestone: Encapsulate data shape and underlying type in the library implementation.
- Can we replace "dict" sources with ResourceCollection containers?
- Get rid of the "member" sequence behavior of ResourceManagers and Futures. Split the execution and data shaping responsibilities of ResourceManager. (#3136)
- Update existing operations to resolve static type-checker warnings. Requires updates to the ABCs and generics / constrained TypeVars.
- Migrate to common C++ based gmxapi data (#2993).