The ensemble data flow use cases implied by the example scripts are ambiguously or incompletely implemented. This is particularly apparent with the gmxapi.commandline_operation tool.
Part of the problem is the combinatorial explosion of complexity that arises from combining scalar data, array data, data Futures, and implicit or explicit ensembles. The best solution is to finally implement the (C++-based) gmxapi typing system and data interface, in which all data has a shape and supports the Future interface, but there are nearer-term tasks that can be addressed first.
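To illustrate why a single data model tames these combinations, here is a minimal Python sketch, assuming a data handle that carries a shape and a Future-like `result()` method. The names `DataObject`, `constant`, and `add_one` are hypothetical and are not part of gmxapi. Modeling an ensemble as an extra leading dimension is also an assumption of this sketch.

```python
from typing import Any, Callable, Tuple


class DataObject:
    """Hypothetical uniform data handle; not part of the gmxapi API.

    Every value (scalar, array, deferred result, or ensemble) exposes the
    same two things: a shape and a result() method.
    """

    def __init__(self, shape: Tuple[int, ...], producer: Callable[[], Any]):
        self.shape = shape
        self._producer = producer
        self._value: Any = None
        self._resolved = False

    def result(self) -> Any:
        # Resolve lazily on first request and cache the value, as a Future would.
        if not self._resolved:
            self._value = self._producer()
            self._resolved = True
        return self._value


def constant(value: Any, shape: Tuple[int, ...] = ()) -> DataObject:
    # An already-available value is simply a Future that resolves immediately.
    return DataObject(shape, lambda: value)


def _elementwise(fn: Callable[[Any], Any], value: Any, ndim: int) -> Any:
    # Apply fn to every element of a nested-list value with ndim dimensions.
    if ndim == 0:
        return fn(value)
    return [_elementwise(fn, item, ndim - 1) for item in value]


def add_one(x: DataObject) -> DataObject:
    # Written once against the uniform interface: the same code handles
    # scalars, arrays, deferred inputs, and ensembles (modeled here, as an
    # assumption, by an extra leading dimension). Nothing is computed until
    # result() is requested on the returned object.
    def produce() -> Any:
        return _elementwise(lambda v: v + 1, x.result(), len(x.shape))

    return DataObject(x.shape, produce)


print(add_one(constant(41)).result())  # 42
print(add_one(constant([[1, 2], [3, 4]], shape=(2, 2))).result())  # [[2, 3], [4, 5]]
```

Because `add_one` branches only on `shape`, not on whether its input is a scalar, a list, a deferred value, or an ensemble, supporting another kind of input does not multiply the number of code paths.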
The initial steps toward resolution are the triage described in issue #3130 and documentation updates for gmxapi.operation and gmxapi.commandline.
- Separate the user experience from the data model implementation details. We will first require user data input to be wrapped (or annotated) by a single function. This makes explicit that operations take "gmxapi data objects" as input, and consolidates in one place decisions such as which dimensions can be scattered or parallelized over. The discussion can then turn to the interface of that helper function and to the heuristics for implicitly converting native Python types (or other non-gmxapi objects) to gmxapi data objects.
- Create the core gmxapi data object and move future development to C++, separating its implementation from the interface used by the gmxapi package. (This may be beyond the scope of the bug fix.)
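The single input-wrapping helper proposed above could look roughly like the following Python sketch. The function `wrap`, its `ensemble` keyword, and the `WrappedData` type are hypothetical illustrations, not gmxapi API, and the shape-inference rules are just one possible set of heuristics for converting native Python types.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class WrappedData:
    """Hypothetical "gmxapi data object" produced by the single input helper."""

    value: Any
    shape: Tuple[int, ...]
    # Dimension that may be scattered across ensemble members, or None if the
    # input is a single (non-ensemble) value.
    scatter_dim: Optional[int] = None


def _infer_shape(value: Any) -> Tuple[int, ...]:
    # Heuristic: strings and bytes are scalars, not sequences of characters;
    # only lists and tuples contribute dimensions.
    if isinstance(value, (str, bytes)) or not isinstance(value, (list, tuple)):
        return ()
    if len(value) == 0:
        return (0,)
    inner = [_infer_shape(item) for item in value]
    if any(s != inner[0] for s in inner):
        raise ValueError("ragged input cannot be given a single shape")
    return (len(value),) + inner[0]


def wrap(value: Any, *, ensemble: bool = False) -> WrappedData:
    """Hypothetical single entry point for user input (illustrative only)."""
    if isinstance(value, WrappedData):
        # Already a data object: wrapping is idempotent.
        return value
    shape = _infer_shape(value)
    if ensemble:
        if not shape:
            raise ValueError("an ensemble input needs at least one dimension")
        # Only an explicit request marks the leading dimension as scatterable.
        return WrappedData(value, shape, scatter_dim=0)
    return WrappedData(value, shape)
```

In this sketch a plain list is never implicitly treated as an ensemble: `wrap(["a.tpr", "b.tpr"])` is a single array input, and only `ensemble=True` marks its leading dimension as scatterable. Making that choice explicitly, at one call site, is the kind of ambiguity the helper function is meant to remove.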
This issue can be considered resolved when both the UI and the API are well defined and behave as documented for a specific upcoming gmxapi version.