Bug #3150

Task #2045: API design and language bindings

Feature #2993: Scalar and structured type expression and definitions for API

gmxapi data type annotations are confusing and inadequate

Added by Eric Irrgang about 1 month ago. Updated 29 days ago.

Status: New
Priority: Normal
Assignee:
Category: gmxapi
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

To address issues #3130 and #3136, it is necessary to revise both the way API input and output data is described and the way users interact with function arguments.

Background

While a gmxapi typing system and C++ data model were delayed, we adopted a combination of native Python types and trivial custom types to define operation inputs, provide static type hinting for users, drive the generation of wrapper code, perform input compatibility checks, and inform implicit handling of data shape transformations. The current software state and works in progress include the scatter() and gather() helper functions, the NDArray and EnsembleDataSource classes, and incomplete static type hinting.

To establish a snapshot of syntax and semantics from which to build, a series of commits under review revises the above-mentioned entities and, in many cases, requires Python iterables to be explicitly wrapped with scatter(), gather(), or ndarray() to clarify user intent for the API. These interface revisions may confuse users, however.

This issue asserts that we can preempt a harmfully confusing interim interface with a user interface that, while imperfect, allows us to separate the underlying issues and provide consistent semantics moving forward.

Proposal

Inputs to gmxapi operations must be references to gmxapi objects. We can refine heuristics for implicit conversions in the future, but the immediate fix will be to require native Python data to be explicitly converted to a gmxapi data object before it is used as input to a gmxapi operation. The helper function for creating such an object consolidates the user interface for defining data shape and parallelizability.

We can minimize some dependency on updated type hinting machinery by first addressing #3140 and converting the existing function-annotation-based operation input expression.

Syntax

A single helper function can return an object interpretable by the API for a Python literal or object provided by the caller. Constraints on the interpretation of the data can be expressed with optional keyword arguments. Constraints include data shape, data type, parallelizability, and allowable conversions.

We can provide additional helper functions for basic transformations, if necessary, but a function name containing "as", like gmxapi.as_data(), implies the common convention that conversions may be performed on the input object.
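
As a rough illustration of the syntax described above, the following self-contained sketch mimics a helper like as_data() without importing gmxapi. The DataFuture class, the keyword names shape, dtype, and allow_conversion, and the shape inference rules are all assumptions made for illustration; they are not the gmxapi implementation, and parallelizability is left out entirely.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DataFuture:
    """Hypothetical stand-in for a gmxapi data reference."""
    value: Any
    dtype: type
    shape: Tuple[int, ...]

    def result(self) -> Any:
        return self.value


def _infer_shape(obj: Any) -> Tuple[int, ...]:
    """Infer the shape of nested lists/tuples; scalars have shape ()."""
    if isinstance(obj, (list, tuple)):
        if not obj:
            return (0,)
        inner = _infer_shape(obj[0])
        if any(_infer_shape(item) != inner for item in obj[1:]):
            raise ValueError('ragged sequences have no well-defined shape')
        return (len(obj),) + inner
    return ()


def _flatten(obj: Any):
    """Yield the scalar elements of a nested list/tuple."""
    if isinstance(obj, (list, tuple)):
        for item in obj:
            yield from _flatten(item)
    else:
        yield obj


def _convert(obj: Any, dtype: type) -> Any:
    """Convert every scalar element to dtype, preserving nesting as tuples."""
    if isinstance(obj, (list, tuple)):
        return tuple(_convert(item, dtype) for item in obj)
    return dtype(obj)


def as_data(obj: Any, *,
            shape: Optional[Sequence[int]] = None,
            dtype: Optional[type] = None,
            allow_conversion: bool = True) -> DataFuture:
    """Wrap native Python data, enforcing the optional constraints."""
    inferred = _infer_shape(obj)
    if shape is not None and tuple(shape) != inferred:
        raise ValueError(f'expected shape {tuple(shape)}, got {inferred}')
    elements = list(_flatten(obj))
    if dtype is None:
        # Assumption: default to the type of the first element.
        dtype = type(elements[0]) if elements else object
    if not all(isinstance(item, dtype) for item in elements):
        if not allow_conversion:
            raise TypeError(f'elements are not all {dtype.__name__}')
        obj = _convert(obj, dtype)
    return DataFuture(value=obj, dtype=dtype, shape=inferred)


positions = as_data([[0, 0, 0], [1, 2, 3]], shape=(2, 3), dtype=float)
print(positions.shape)  # (2, 3)
```

In this sketch the caller states intent up front: a shape mismatch or a disallowed conversion fails at the point where the data enters the work graph, rather than inside an operation.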


Related issues

Related to GROMACS - Feature #3140: Allow explicit input definition for gmxapi.operation function wrapper (New)
Related to GROMACS - Task #3130: Interim handling of gmxapi data references. (New)
Related to GROMACS - Bug #3136: gmxapi.operation data flow topology unclear or incomplete (New)

History

#1 Updated by Eric Irrgang about 1 month ago

  • Related to Feature #3140: Allow explicit input definition for gmxapi.operation function wrapper added

#2 Updated by Eric Irrgang about 1 month ago

  • Related to Task #3130: Interim handling of gmxapi data references. added

#3 Updated by Eric Irrgang about 1 month ago

  • Related to Bug #3136: gmxapi.operation data flow topology unclear or incomplete added

#4 Updated by Eric Irrgang 29 days ago

  • Target version changed from 2020-beta2 to 2020-beta3

Full resolution of this issue occurs in several phases, with ties to other issues.

Tasks

Approximate task sequence:

Milestone (#3130): support well-defined 1-dimensional data edge shape without exposing NDArray or related helpers to users.

  • Normalize behavior: Require data edge shape / topology to be clearly specified with scatter() or ndarray() wrappers
  • Encapsulate data type and shape in the UI: Users can call as_data to put data sources onto the current work graph (at any time), obtaining a Future reference (for use as operation input). (The stuff we want to remove is still there, but hidden under the hood.)
  • Normalize explicit expression of operation inputs. Decouple operation input description from function type hint annotations. (#3140)
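
One way to picture decoupling the operation input description from function annotations (#3140) is sketched below. The explicit_inputs decorator and the input_description attribute are hypothetical names invented for this example; this is not the gmxapi.operation wrapper and makes no claim about its signature.

```python
import functools


def explicit_inputs(**input_types):
    """Hypothetical decorator: declare operation inputs explicitly
    instead of reading them from the function's annotations."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            unknown = set(kwargs) - set(input_types)
            if unknown:
                raise TypeError(f'unexpected inputs: {sorted(unknown)}')
            missing = set(input_types) - set(kwargs)
            if missing:
                raise TypeError(f'missing inputs: {sorted(missing)}')
            for name, expected in input_types.items():
                if not isinstance(kwargs[name], expected):
                    raise TypeError(
                        f'input {name!r} must be {expected.__name__}')
            return func(**kwargs)
        # The input description lives on the wrapper, not in annotations.
        wrapper.input_description = dict(input_types)
        return wrapper
    return decorator


@explicit_inputs(a=float, b=float)
def add(a, b):
    return a + b


print(add(a=1.0, b=2.0))  # 3.0
```

Because the wrapped function carries no annotations of its own here, annotations stay free for ordinary static type hints while the explicit description drives the input checks.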

Milestone (#3136): Unify internal data shaping to support well-defined N-dimensional data edges.

  • All Futures have "shape"
  • All operation inputs have "shape"
  • All operation inputs must be Future references.
  • Normalize operation inputs to Futures in the current Context (use as_data?).
  • gmxapi.datamodel.NDArray is removed (replaced with data shaping helpers)
  • EnsembleDataSource is removed.

Milestone: Encapsulate data shape and underlying type in the library implementation.

  • Can we replace "dict" sources with ResourceCollection containers?
  • Get rid of the "member" sequence behavior of ResourceManagers and Futures. Split the execution and data shaping responsibilities of ResourceManager. (#3136)
  • Update existing operations to resolve static type-checker warnings. Requires updates to the ABCs and generics / constrained TypeVars.
  • Migrate to common C++ based gmxapi data (#2993).

#5 Updated by Eric Irrgang 29 days ago

  • Private changed from Yes to No

#6 Updated by Eric Irrgang 29 days ago

  • Target version changed from 2020-beta3 to 2021-infrastructure-stable
