Feature #1193

Updated by Teemu Murtola over 7 years ago

* Provide a GROMACS component that can manage the process of reading
frames from a trajectory file
* Use standard C++ idioms (e.g. design patterns) to achieve
flexibility and extensibility - adding support for a new kind of I/O,
trajectory format or trajectory frame object should be just a matter
of subclassing an interface and coding up the details
* Provide an interface such that client code does not need to be aware
of any low-level details of file reading or trajectory formats
* Roughly, replace the functionality of read_first_frame() and
read_next_frame() by the creation of a management object and looping
over calls to its readNextFrame() method
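The intended client-side loop might look roughly like the sketch below. Everything in it is an assumption made for illustration: `Frame` is a toy stand-in, the manager serves frames from memory rather than from a file, and a null pointer is used as the end-of-input signal (the proposal does not fix that convention).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a trajectory frame; a real frame would hold
// coordinates, box, step and so on.
struct Frame
{
    double time;
};

// Hypothetical stand-in for the proposed management object. This toy
// version hands out frames from memory instead of reading a file.
class TrajectoryReadManager
{
public:
    explicit TrajectoryReadManager(std::vector<double> times)
        : times_(std::move(times)), next_(0)
    {
    }

    // Returns the next frame, or a null pointer at end of input
    // (an assumed convention, not one stated in the proposal).
    std::shared_ptr<Frame> readNextFrame()
    {
        if (next_ == times_.size())
        {
            return nullptr;
        }
        return std::make_shared<Frame>(Frame{ times_[next_++] });
    }

private:
    std::vector<double> times_;
    std::size_t         next_;
};

// Client loop that takes the place of read_first_frame()/read_next_frame().
inline int countFrames(TrajectoryReadManager& manager)
{
    int count = 0;
    while (auto frame = manager.readNextFrame())
    {
        ++count; // analysis of *frame would go here
    }
    return count;
}
```

The client never sees a file handle or a format-specific call; it only creates the manager and loops.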

* Client code will have responsibility for creating selections after
trajectory frames have been read (I hope that's consistent with the
current form of the selection code!)
* New memory will be allocated for each new frame read, and stored in
a shared_ptr
* Memory for old frames will be deallocated when the last shared_ptr
referring to it goes out of scope (best practice: use only named
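
The ownership rule for frames can be illustrated with a small sketch. The `Frame` type, its `liveCount` counter and the `readFrame()` helper are hypothetical and exist only to make the lifetime observable.

```cpp
#include <cassert>
#include <memory>

// Toy frame that counts how many instances are alive (illustration only).
struct Frame
{
    static int liveCount;
    Frame() { ++liveCount; }
    ~Frame() { --liveCount; }
    Frame(const Frame&) = delete;
    Frame& operator=(const Frame&) = delete;
};
int Frame::liveCount = 0;

// Hypothetical helper: each call allocates fresh memory for a new frame.
inline std::shared_ptr<Frame> readFrame()
{
    return std::make_shared<Frame>();
}

inline int liveFramesAfterScope()
{
    {
        std::shared_ptr<Frame> current      = readFrame();
        std::shared_ptr<Frame> keptByClient = current; // second owner
        current = readFrame(); // old frame survives: keptByClient still owns it
        assert(Frame::liveCount == 2);
    }
    // Both named owners are out of scope, so both frames have been freed.
    return Frame::liveCount;
}
```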

See attached graphic for draft class diagram. Methods and members have
'+', '-' and '#' prefixes for public, private and protected visibility.
Unfortunately, there's apparently no visual indicator for virtual or
static methods. Explanations of how things might work follow.

TrajectoryReadManager
* is intended for use in tools and mdrun -rerun
* contains handles to various objects (FileIOManager,
TrajectoryReaderStrategy, CoordinateFrame)
** for taking care of details that vary with context
** that are constructed by methods that implement the Factory Method
design pattern (createReader(), createFileManager())
* uses the objects' interfaces to manage the control and data flow
between objects (but see the exception to that principle below)
* has a method readNextFrame() that
** implements the Template Method design pattern for the high-level
frame-reading algorithm (read header, apply conditions, read body,
apply conditions)
** calls concrete protected methods/hooks to do the work that will
permit it to return
*** a correct frame in the form the client code expects,
*** an end-of-file event, or
*** exceptions that only a client can handle
** handles all exceptions generated in this component (even if only
to re-throw some other exception)
** has hooks so that future subclasses can override them to
specialize readNextFrame() if there is need
** builds a coordinate frame suitable for use by the client by using
the Prototype design pattern - that is, by cloning a prototype
supplied by the client. (That prototype should have empty
containers). This avoids needing some kind of enumeration of types of
CoordinateFrame just so that client code can tell this code what to
construct - the object type is its own identifier. So a serial tool
would just build an empty PrimitiveFrame to pass in (and
PrimitiveFrame has a static method to do just that).
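
As a rough sketch of the Template Method idea described above: a non-virtual readNextFrame() fixes the order of the steps, and protected virtual hooks supply the details. The hook names (`readHeader`, `readBody`, `skipBody`, `acceptHeader`, `acceptFrame`), the `Frame` struct and the in-memory subclass are all assumptions made for illustration, and exception translation is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Frame
{
    double time = 0.0;
};

class FrameReader
{
public:
    virtual ~FrameReader() = default;

    // Template method: header, condition, body, condition.
    // Returns a null pointer at end of input (assumed convention).
    std::shared_ptr<Frame> readNextFrame()
    {
        double time = 0.0;
        while (readHeader(&time))
        {
            if (!acceptHeader(time))
            {
                skipBody(); // never pay for a body we will discard
                continue;
            }
            auto frame  = std::make_shared<Frame>();
            frame->time = time;
            readBody(frame.get());
            if (acceptFrame(*frame))
            {
                return frame;
            }
        }
        return nullptr;
    }

protected:
    virtual bool readHeader(double* time) = 0;
    virtual void readBody(Frame* frame)   = 0;
    virtual void skipBody()               = 0;
    // Hooks with default behaviour that subclasses may override.
    virtual bool acceptHeader(double) const { return true; }
    virtual bool acceptFrame(const Frame&) const { return true; }
};

// Toy subclass that "reads" from memory and keeps frames with time >= 2.
class InMemoryReader : public FrameReader
{
public:
    explicit InMemoryReader(std::vector<double> times) : times_(std::move(times)) {}
    int bodiesRead() const { return bodiesRead_; }

protected:
    bool readHeader(double* time) override
    {
        if (next_ == times_.size())
        {
            return false;
        }
        *time = times_[next_];
        return true;
    }
    void readBody(Frame*) override { ++bodiesRead_; ++next_; }
    void skipBody() override { ++next_; }
    bool acceptHeader(double time) const override { return time >= 2.0; }

private:
    std::vector<double> times_;
    std::size_t         next_       = 0;
    int                 bodiesRead_ = 0;
};
```

With input times 0, 1, 2, 3 the first returned frame has time 2, and only one body has been read at that point.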

* returns only coordinate frames that satisfy conditions specified by
the client (e.g. frames every 2 ns) (This could be a separate
responsibility in a separate class. Probably we want the feature of
skipping reading the body of a frame whose header failed the
condition. Implementing that anywhere outside TrajectoryReaderStrategy
requires that it make available the partly-filled temporary
PrimitiveFrame. Is keeping "condition testing" as a "management role"
despite violating encapsulation a lesser evil than
** making a new class for condition testing (also violates
encapsulation), or
** adding that responsibility to the TrajectoryReaderStrategy?
I'm not bothered by the "violation of encapsulation" because the
violation is internal to the component, and the purpose is making the
implementation work smoothly.)
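
Wherever the condition testing ends up living, the conditions themselves could be plain predicates over the frame header. The sketch below assumes a hypothetical `FrameHeader` with a time in ps and a hypothetical `everyInterval()` helper; "every 2 ns" then becomes `everyInterval(2000.0)`.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Hypothetical header: only the time (in ps) is needed for this condition.
struct FrameHeader
{
    double time;
};

using HeaderCondition = std::function<bool(const FrameHeader&)>;

// Accepts headers whose time is a multiple of `interval`, within a tolerance.
inline HeaderCondition everyInterval(double interval, double tolerance = 1e-6)
{
    assert(interval > 0.0);
    return [interval, tolerance](const FrameHeader& header) {
        const double remainder = std::fabs(std::fmod(header.time, interval));
        return remainder < tolerance || interval - remainder < tolerance;
    };
}

// Returns the times of the frames whose bodies would actually be read.
inline std::vector<double> timesToRead(const std::vector<FrameHeader>& headers,
                                       const HeaderCondition&          accept)
{
    std::vector<double> selected;
    for (const FrameHeader& header : headers)
    {
        if (accept(header))
        {
            selected.push_back(header.time);
        }
    }
    return selected;
}
```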

TrajectoryReaderStrategy
* is an interface class with all pure virtual functions
* is built by a factory method of TrajectoryReadManager based on the
file type (this is the only place where code that queries the file
type exists)
* implements the Strategy design pattern for reading trajectories in
different formats
* declares an interface that permits an implementation to support
reading only a header before deciding whether to read the body or skip
* is provided by TrajectoryReadManager with a handle to
a FileIOManager object
* has a shared_ptr to a PrimitiveFrame object, which is renewed for
each frame
* provides a getter so that TrajectoryReadManager can access the product
* may or may not have an acceptable class name!
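
A minimal sketch of the factory method that picks a strategy from the file type, so that no other code has to query it. Deciding by file extension, the `formatName()` method and the two reader classes are assumptions for illustration; a real interface would declare the header/body reading methods instead.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Reduced strategy interface; formatName() is a placeholder for the real
// reading methods.
class TrajectoryReaderStrategy
{
public:
    virtual ~TrajectoryReaderStrategy() = default;
    virtual std::string formatName() const = 0;
};

class GroReader : public TrajectoryReaderStrategy
{
public:
    std::string formatName() const override { return "gro"; }
};

class PdbReader : public TrajectoryReaderStrategy
{
public:
    std::string formatName() const override { return "pdb"; }
};

// Factory method: the only place that inspects the file type.
inline std::unique_ptr<TrajectoryReaderStrategy> createReader(const std::string& filename)
{
    const std::string::size_type dot = filename.rfind('.');
    const std::string extension =
            (dot == std::string::npos) ? std::string() : filename.substr(dot + 1);
    if (extension == "gro")
    {
        return std::unique_ptr<TrajectoryReaderStrategy>(new GroReader);
    }
    if (extension == "pdb")
    {
        return std::unique_ptr<TrajectoryReaderStrategy>(new PdbReader);
    }
    throw std::invalid_argument("unsupported trajectory format: " + filename);
}
```

Supporting a new format then means adding one subclass and one branch in the factory.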

Subclasses of TrajectoryReaderStrategy
* implement reading of particular trajectory formats (e.g. PDB, GRO,
* call the FileIOManager object to get chunks of data
* provide the logic for handling chunks of data in the context of the
intended format
* provide the logic for handling checks for consistency between frames
* throw exceptions about invalid data formats
* fill the PrimitiveFrame with the supplied data (so, even if we're
doing parallel I/O and setting up a domain decomposition, this object
knows nothing about that)
* the TNG-reading class will have to have some way for client code
to get access to the arbitrary data it could (also) contain. Magnus,
there's your job at some point!
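
To make the division of labour concrete, here is a sketch of what the format-specific logic might do with a chunk of data it has received. The text format (an atom count followed by x y z triples) is invented for this example and is not a GROMACS format; `PrimitiveFrame` is reduced to a bare struct, and both function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-in for PrimitiveFrame: coordinates only.
struct PrimitiveFrame
{
    std::vector<std::array<double, 3>> x;
};

// Parses one frame of the invented toy format; throws on invalid data.
inline PrimitiveFrame parseToyFrame(const std::string& chunk)
{
    std::istringstream in(chunk);
    std::size_t        natoms = 0;
    if (!(in >> natoms))
    {
        throw std::invalid_argument("missing atom count");
    }
    PrimitiveFrame frame;
    for (std::size_t i = 0; i < natoms; ++i)
    {
        std::array<double, 3> r;
        if (!(in >> r[0] >> r[1] >> r[2]))
        {
            throw std::invalid_argument("truncated coordinate data");
        }
        frame.x.push_back(r);
    }
    return frame;
}

// Consistency check between consecutive frames: atom counts must match.
inline void checkConsistent(const PrimitiveFrame& previous, const PrimitiveFrame& current)
{
    if (previous.x.size() != current.x.size())
    {
        throw std::runtime_error("atom count changed between frames");
    }
}
```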

FileIOManager
* is an interface class with all pure virtual functions
* declares pure virtual methods to open and close files, query file
status, and for reading chunks of data
* does not have an interface that client code can access (maybe? With
this, we inhibit crude hacks that read some new file format. If
there's no interface, people might get the idea they're not supposed
to want one.)

Subclasses of FileIOManager
* implement reading of files under particular protocols (e.g. stdio,
XDR, MPI-I/O, mock reads for tests)
* throw exceptions about I/O conditions
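
A sketch of what the FileIOManager interface and a mock subclass for tests could look like. The method names (`open`, `close`, `isOpen`, `readChunk`) and the convention that an empty chunk means end of file are assumptions; the proposal only says that such methods exist.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class FileIOManager
{
public:
    virtual ~FileIOManager() = default;
    virtual void open(const std::string& name) = 0;
    virtual void close()                        = 0;
    virtual bool isOpen() const                 = 0;
    // Reads up to maxBytes; an empty result means end of file (assumed).
    virtual std::string readChunk(std::size_t maxBytes) = 0;
};

// Mock implementation that serves a fixed string, for use in tests.
class MockFileIO : public FileIOManager
{
public:
    explicit MockFileIO(std::string contents) : contents_(std::move(contents)) {}

    void open(const std::string&) override
    {
        open_ = true;
        pos_  = 0;
    }
    void close() override { open_ = false; }
    bool isOpen() const override { return open_; }

    std::string readChunk(std::size_t maxBytes) override
    {
        if (!open_)
        {
            throw std::runtime_error("read from a file that is not open");
        }
        std::string chunk = contents_.substr(pos_, maxBytes);
        pos_ += chunk.size();
        return chunk;
    }

private:
    std::string contents_;
    std::size_t pos_  = 0;
    bool        open_ = false;
};
```

A reader strategy that only talks to the `FileIOManager` interface can be tested against `MockFileIO` without touching the disk.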

CoordinateFrame
* is an interface class with all pure virtual functions
* has containers of data like t_trxframe (should it?)
* declares getters and setters for the contained data
* declares a method to fill itself with data from a PrimitiveFrame
(This provides a hook so that code that does the low-level reading
doesn't have to know what representation the client wants to use when
reading it. For example, if we were doing some kind of parallel I/O
that built a PrimitiveFrame object on each process such that each
contained some known fraction of the full trajectory frame, and then
we wanted to do some kind of domain decomposition on that, a subclass
of CoordinateFrame would provide inside this method the logic to do
the necessary re-organization and communication. Client code would
just instantiate an object of that subclass as the prototype for the
TrajectoryReadManager object, which calls this method without having
to know about the details. For serial I/O when the prototype is a
PrimitiveFrame, then the implementation will just do a shallow copy of
the shared_ptr.)
* declares a clone() method (so that a TrajectoryReadManager object
can implement the Prototype design pattern)
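
The Prototype idea and the "fill from a PrimitiveFrame" hook might fit together as in the sketch below. It is deliberately simplified: the raw data is just a shared vector of doubles, and the names `fillFrom()`, `createEmpty()`, `size()` and `buildFrame()` are hypothetical. In the serial case, filling is only a shallow copy of the shared_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Coordinates = std::vector<double>; // stand-in for the raw frame data

class CoordinateFrame
{
public:
    virtual ~CoordinateFrame() = default;
    // Prototype pattern: the object knows how to copy its own concrete type.
    virtual std::unique_ptr<CoordinateFrame> clone() const = 0;
    // Hook where a subclass could reorganize or communicate the data.
    virtual void fillFrom(const std::shared_ptr<const Coordinates>& data) = 0;
    virtual std::size_t size() const = 0;
};

class PrimitiveFrame : public CoordinateFrame
{
public:
    // Builds the empty prototype that client code passes to the manager.
    static std::unique_ptr<PrimitiveFrame> createEmpty()
    {
        return std::unique_ptr<PrimitiveFrame>(new PrimitiveFrame);
    }
    std::unique_ptr<CoordinateFrame> clone() const override
    {
        return std::unique_ptr<CoordinateFrame>(new PrimitiveFrame(*this));
    }
    void fillFrom(const std::shared_ptr<const Coordinates>& data) override
    {
        data_ = data; // shallow copy: the frame shares ownership of the data
    }
    std::size_t size() const override { return data_ ? data_->size() : 0; }

private:
    std::shared_ptr<const Coordinates> data_;
};

// What the manager would do: clone the prototype, then fill the clone.
inline std::unique_ptr<CoordinateFrame> buildFrame(const CoordinateFrame&                    prototype,
                                                   const std::shared_ptr<const Coordinates>& data)
{
    std::unique_ptr<CoordinateFrame> frame = prototype.clone();
    frame->fillFrom(data);
    return frame;
}
```

The manager never names a concrete frame type, so no enumeration of frame kinds is needed.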

PrimitiveFrame
* implements CoordinateFrame pretty much just like t_trxframe (for the
moment, anyway)
* is used as the intermediate container by TrajectoryReaderStrategy
* might have to have data copied if the final CoordinateFrame object
is neither a PrimitiveFrame nor a container of similar primitives
(but that is the price of having low-level I/O code not
having to know about the final data format)
* has a static method to construct an empty PrimitiveFrame (e.g. for
use by client code as a prototype to pass to TrajectoryReadManager)

* would implement CoordinateFrame for some hypothetical
distributed-data analysis tool

* would maybe implement CoordinateFrame, but there's a lot of unique
infrastructure required to permit client code to get arbitrary data
that might not even be present