Rework TPR reading to allow reading of raw bytes from disk and communication of complete information at setup time
While reading the TPR file, the fields are currently populated one by one as the raw bytes are read from disk.
It would make sense to instead read the whole TPR file as a raw byte stream into a buffer and then populate the data structures by reading from this buffer.
This would also allow the communication of the whole TPR as one message during simulation setup, with the individual nodes populating the fields from the buffer instead of communicating the fields individually.
To reach this point, the current TPR reading and writing code first needs to be adapted to use the serializer interface.
This can be followed by a change that introduces a new version of the TPR format that stores the total number of bytes to read in the file header.
Files in old format versions will still need to be read directly from disk first, to determine the total number of bytes, before the information can be read into the buffer.
A final change can then communicate the byte buffer instead of the individual fields.
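The reading scheme described above can be sketched as follows. This is a minimal illustration, not the GROMACS API: the function name and the surrounding setup are assumptions, and the single-message communication of the buffer (e.g. one broadcast) is only indicated in the comments.

```cpp
#include <fstream>
#include <vector>

// Read an entire file from disk into one raw byte buffer. In the
// proposed scheme, the master rank would read the TPR like this and
// then send the buffer as a single message during simulation setup
// (e.g. one broadcast); every rank would then populate its data
// structures from the buffer instead of receiving each field
// separately. Illustrative sketch, not the GROMACS implementation.
std::vector<char> readFileToBuffer(const char* filename)
{
    std::ifstream in(filename, std::ios::binary | std::ios::ate);
    const auto size = static_cast<std::size_t>(in.tellg());
    std::vector<char> buffer(size);
    in.seekg(0);
    in.read(buffer.data(), static_cast<std::streamsize>(size));
    return buffer;
}
```

For new-format files the buffer size would come from the header; for old-format files the size has to be discovered by a first pass over the file, as noted above.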
Extend ISerializer functionality
Add datatypes to ISerializer that are needed to read TPR files but were previously missing.
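The shape of such an extension can be sketched like this. The interface and the in-memory implementation below are illustrative stand-ins, not the actual gmx::ISerializer; which datatypes were missing (bool, unsigned char, 64-bit integers are shown here) is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch of an ISerializer-style interface with extra datatype entry
// points of the kind a TPR reader needs. Names are illustrative.
class ISerializer
{
public:
    virtual ~ISerializer() = default;
    virtual bool reading() const = 0;
    virtual void doInt(int* value) = 0;
    // Hypothetical newly added datatypes:
    virtual void doBool(bool* value) = 0;
    virtual void doInt64(std::int64_t* value) = 0;
};

// Minimal buffer-backed implementation so the round trip is runnable:
// default construction serializes into a buffer, construction from a
// buffer deserializes out of it.
class InMemorySerializer : public ISerializer
{
public:
    InMemorySerializer() : reading_(false) {}
    explicit InMemorySerializer(std::vector<char> buffer)
        : buffer_(std::move(buffer)), reading_(true) {}
    bool reading() const override { return reading_; }
    void doInt(int* v) override { doBytes(v, sizeof(*v)); }
    void doBool(bool* v) override { doBytes(v, sizeof(*v)); }
    void doInt64(std::int64_t* v) override { doBytes(v, sizeof(*v)); }
    const std::vector<char>& buffer() const { return buffer_; }

private:
    void doBytes(void* data, std::size_t size)
    {
        if (reading_)
        {
            std::memcpy(data, buffer_.data() + pos_, size);
            pos_ += size;
        }
        else
        {
            const char* p = static_cast<const char*>(data);
            buffer_.insert(buffer_.end(), p, p + size);
        }
    }
    std::vector<char> buffer_;
    std::size_t       pos_ = 0;
    bool              reading_;
};
```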
Split tpr header reading from tpr body reading
Split the low level functions for TPR file header and file body reading
into fully separate parts to allow reading the main part of the file
without having to read the header again.
Also renamed the header data structure in line with naming
conventions and default-initialized all of its fields.
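The split might look roughly like the sketch below. The structure name, field names, and the helper function are assumptions, not the GROMACS code; the point is a default-initialized header type plus a header-only deserialization step that tells the caller where the body starts.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical default-initialized header structure; in the proposed
// new format version the header would also carry the total number of
// bytes in the body, so the body can be handled as one raw block.
struct TpxFileHeader
{
    int          fileVersion    = 0;
    int          fileGeneration = 0;
    int          natoms         = 0;
    std::int64_t bodySize       = 0; // proposed new field
};

// Reads only the header from the front of a raw byte buffer and
// returns the offset where the body starts, so body reading can
// proceed without touching the header again.
std::size_t deserializeTpxHeader(const std::vector<char>& buffer,
                                 TpxFileHeader*           header)
{
    std::size_t pos  = 0;
    auto        read = [&](void* dst, std::size_t n) {
        std::memcpy(dst, buffer.data() + pos, n);
        pos += n;
    };
    read(&header->fileVersion, sizeof(header->fileVersion));
    read(&header->fileGeneration, sizeof(header->fileGeneration));
    read(&header->natoms, sizeof(header->natoms));
    read(&header->bodySize, sizeof(header->bodySize));
    return pos;
}
```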
Use ISerializer for TPR file IO
Change all the function calls in do_tpx and friends to use the
ISerializer instead of the previous t_fileio pointer.
This is intended to prepare for the change where the data structures
get populated from a byte buffer instead of being read from disk one
by one.
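The call-site pattern can be illustrated as follows. The types below are minimal stand-ins (not t_fileio, not the real do_tpx or gmx::ISerializer): the idea is that the same serialization function serves reading and writing, with the direction being a property of the serializer rather than of the call site.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Minimal stand-in for a serializer interface (illustrative only).
class ISerializer
{
public:
    virtual ~ISerializer() = default;
    virtual bool reading() const = 0;
    virtual void doInt(int* value) = 0;
};

// Tiny buffer-backed implementation so the pattern is runnable.
class BufferSerializer : public ISerializer
{
public:
    BufferSerializer() : reading_(false) {}
    explicit BufferSerializer(std::vector<char> buffer)
        : buffer_(std::move(buffer)), reading_(true) {}
    bool reading() const override { return reading_; }
    void doInt(int* value) override
    {
        if (reading_)
        {
            std::memcpy(value, buffer_.data() + pos_, sizeof(*value));
            pos_ += sizeof(*value);
        }
        else
        {
            const char* p = reinterpret_cast<const char*>(value);
            buffer_.insert(buffer_.end(), p, p + sizeof(*value));
        }
    }
    const std::vector<char>& buffer() const { return buffer_; }

private:
    std::vector<char> buffer_;
    std::size_t       pos_ = 0;
    bool              reading_;
};

// Hypothetical slice of TPR content, for illustration.
struct TprSubset
{
    int natoms = 0;
    int nsteps = 0;
};

// One function serves both directions, in the spirit of the do_tpx
// conversion: the caller passes a serializer instead of a file handle,
// so the same code path can later be fed from a byte buffer.
void doTprSubset(ISerializer* serializer, TprSubset* tpr)
{
    serializer->doInt(&tpr->natoms);
    serializer->doInt(&tpr->nsteps);
}
```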
#4 Updated by Paul Bauer 2 months ago
It was decided to not focus on this for GROMACS 2020.
The reason is that the allocations needed for communicating the full TPR and populating a dummy state might exceed the available memory on hardware with little memory per hardware thread. The default of launching one (thread-)MPI thread per available hardware thread could then lead to cases where not enough memory is available to allocate all data structures.
Testing with a system of 648000 water molecules shows that when using 4 threads, about 25 MB are needed per thread at peak memory load, before the transitional data structures are deallocated again.