Task #2971

Rework TPR reading to allow reading of raw bytes from disk and communication of complete information at setup time

Added by Paul Bauer 3 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: core library
Target version: -
Difficulty: hard

Description

Currently, when a TPR file is read, the fields are populated one by one as the raw bytes are read from disk.
It would make sense to instead read the entire TPR file as a raw byte stream into a buffer and then populate the data structures by reading from this buffer.
This would also allow the whole TPR to be communicated as one message during simulation setup, with each node populating the fields from the buffer, instead of the fields being communicated individually.
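
The two-stage approach described above could be sketched roughly as follows. This is a minimal illustration and not GROMACS code: `ExampleInput`, `readWholeFile`, `BufferReader` and `populateFromBuffer` are hypothetical names, and the sketch copies values in native byte order, so it ignores whatever portable encoding the real on-disk format uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a few fields of a run input file.
struct ExampleInput
{
    std::int32_t numAtoms = 0;
    double       timeStep = 0.0;
};

// Stage 1: read the complete file into memory in one go.
std::vector<char> readWholeFile(const std::string& path)
{
    std::ifstream file(path, std::ios::binary);
    if (!file)
    {
        throw std::runtime_error("Cannot open " + path);
    }
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}

// Sequential reader over an in-memory buffer; no disk access happens here.
class BufferReader
{
public:
    explicit BufferReader(const std::vector<char>& buffer) : buffer_(buffer) {}

    template<typename T>
    T read()
    {
        // offset_ never exceeds buffer_.size(), so the subtraction cannot wrap.
        if (sizeof(T) > buffer_.size() - offset_)
        {
            throw std::out_of_range("Read past end of buffer");
        }
        T value;
        std::memcpy(&value, buffer_.data() + offset_, sizeof(T));
        offset_ += sizeof(T);
        return value;
    }

private:
    const std::vector<char>& buffer_;
    std::size_t              offset_ = 0;
};

// Stage 2: populate the data structure from the buffer only.
ExampleInput populateFromBuffer(const std::vector<char>& buffer)
{
    BufferReader reader(buffer);
    ExampleInput input;
    input.numAtoms = reader.read<std::int32_t>();
    input.timeStep = reader.read<double>();
    return input;
}
```

Because `populateFromBuffer` only sees a byte buffer, the same function could run on a node that received the buffer as a message rather than reading it from disk, and it can be exercised in tests without touching the file system.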

To reach this point, the current TPR reading and writing code first needs to be adapted to use the serializer interface.
This can be followed by a change that introduces a new version of the TPR format whose file header stores the total number of bytes to read.
Files in older format versions lack this field, so they will still need to be read directly from disk to determine the total number of bytes before the contents can be placed into the buffer.
A final change can then communicate the byte buffer instead of the individual fields.
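
As a rough illustration of the version-dependent header, the sketch below reads a body size from the header only when the file version is new enough. All names and the version number 100 are hypothetical and do not reflect the actual TPR layout. For old files the sketch simply consumes the rest of the stream, which is only a stand-in for the legacy path that has to read the file from disk to find out its size.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical first format version whose header stores the body size.
constexpr std::int32_t c_firstVersionWithBodySize = 100;

struct ExampleHeader
{
    std::int32_t fileVersion     = 0;
    std::int64_t bodySizeInBytes = -1; // -1 means "not stored in the header"
};

// Read one value in native byte order (an assumption of this sketch).
template<typename T>
T readValue(std::istream& stream)
{
    T value;
    stream.read(reinterpret_cast<char*>(&value), sizeof(T));
    if (!stream)
    {
        throw std::runtime_error("Unexpected end of file");
    }
    return value;
}

ExampleHeader readHeader(std::istream& stream)
{
    ExampleHeader header;
    header.fileVersion = readValue<std::int32_t>(stream);
    if (header.fileVersion >= c_firstVersionWithBodySize)
    {
        header.bodySizeInBytes = readValue<std::int64_t>(stream);
    }
    return header;
}

std::vector<char> readBody(std::istream& stream, const ExampleHeader& header)
{
    if (header.bodySizeInBytes >= 0)
    {
        // New format: the size is known, so the body is read in one call.
        std::vector<char> body(static_cast<std::size_t>(header.bodySizeInBytes));
        stream.read(body.data(), static_cast<std::streamsize>(body.size()));
        if (!stream)
        {
            throw std::runtime_error("File body is shorter than the header claims");
        }
        return body;
    }
    // Old format: the size is unknown up front. Consuming the remaining
    // stream stands in for the legacy size discovery.
    return std::vector<char>(std::istreambuf_iterator<char>(stream),
                             std::istreambuf_iterator<char>());
}
```

With the size stored in the header, the reading side can allocate the buffer once, and the resulting buffer is the unit that would be communicated instead of individual fields.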


Related issues

Related to GROMACS - Task #1729: Resolve whether and how to resolve "state" variables stored in .tpr (New, 05/12/2015)

Associated revisions

Revision e887207f (diff)
Added by Paul Bauer 2 months ago

Extend ISerializer functionality

Add datatypes to ISerializer that are needed to read TPR files but were
missing.

Refs #2971

Change-Id: I5d5e7f1f91c533a079cb287b018fa1d579c4f3f9

Revision 21d45dd2 (diff)
Added by Paul Bauer 2 months ago

Split tpr header reading from tpr body

Split the low level functions for TPR file header and file body reading
into fully separate parts to allow reading the main part of the file
without having to read the header again.

Also gave the header datastructure a new name in line with naming
conventions and default initialized all fields.

Refs #2971

Change-Id: I110fb80cf19d9d2e59df1576e50c64806f532e00

Revision 1e6316f4 (diff)
Added by Paul Bauer 2 months ago

Use ISerializer for TPR file IO

Change all the function calls in do_tpx and friends to use the
ISerializer instead of the previous t_fileio pointer.
This is intended to prepare for the change where the datastructures
get populated from a byte buffer instead of reading them from disk one
by one.

Refs #2971

Change-Id: I9c2d51c4af0cad5a14da7026d58ecbe053e8efb7

History

#1 Updated by Paul Bauer 3 months ago

  • Related to Task #1729: Resolve whether and how to resolve "state" variables stored in .tpr added

#2 Updated by Paul Bauer 3 months ago

  • Private changed from Yes to No

#3 Updated by Mark Abraham 3 months ago

This will also remove the need to serialize the string symbol table. It will also make aspects of the code easier and faster to test.

#4 Updated by Paul Bauer 2 months ago

It was decided not to focus on this for GROMACS 2020.
The reason is that the allocations needed for communicating the full TPR and populating a dummy state might exceed the available memory on hardware that has little memory per hardware thread. The default setting of launching one (thread-) MPI thread per available hardware thread could then lead to cases where not enough memory is available to allocate all data structures.

Testing with a system of 648000 water molecules shows that, when using 4 threads, each thread needs about 25 MB at peak memory load, before the transitional data structures are deallocated again.
