Improving serialization of data structures prior to communication
I was recently irked by the inability to do a reliable exact restart of a parallel SD (stochastic dynamics) simulation using the V-rescale thermostat, because the RNG state is not preserved properly. The SD integrator and the V-rescale thermostat use the same RNG. Although both need Gaussian random numbers, the former takes single numbers from the RNG via a table lookup, while the latter uses the usual numerical trick to convert pairs of uniform random numbers into pairs of Gaussians. Whether the other member of a pair has already been used is not preserved across restarts.
Also, GROMACS cannot run stochastic dynamics that is even "in principle" reproducible on different numbers of processors, because each processor's RNG is initialized with the seed plus the processor number, so the random numbers an atom receives depend on how the system is decomposed. That will make a sensible test set hard to build.
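The decomposition dependence can be shown in a few lines. In this hypothetical sketch, atoms are split into equal contiguous blocks and the owning rank draws sequentially from a stream seeded with seed + rank; the block layout and the splitmix64 generator are assumptions, not GROMACS code. For contrast, `draw_per_atom` keys the draw on the global atom index instead. That is a different technique, not something proposed here, and its result cannot depend on the number of ranks.

```c
#include <stdint.h>

/* splitmix64: advance *state and return a well-mixed 64-bit value. */
static uint64_t splitmix64_next(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Per-rank seeding: the rank owning 'atom' seeds with seed + rank and draws
 * one number per local atom, in order. Atoms are assumed to be split into
 * contiguous blocks of ceil(natoms / nranks). */
uint64_t draw_per_rank(uint64_t seed, int natoms, int nranks, int atom)
{
    int      block = (natoms + nranks - 1) / nranks;
    int      rank  = atom / block;
    int      local = atom % block;
    uint64_t state = seed + (uint64_t)rank;
    uint64_t value = 0;
    for (int i = 0; i <= local; i++) {
        value = splitmix64_next(&state);
    }
    return value;
}

/* Per-atom keying (alternative technique): jump straight to position 'atom'
 * of one global stream, so the value ignores how atoms are distributed. */
uint64_t draw_per_atom(uint64_t seed, int atom)
{
    uint64_t state = seed + (uint64_t)atom * 0x9E3779B97F4A7C15ULL;
    return splitmix64_next(&state);
}
```

With 8 atoms, atom 5 receives a different number on one rank than on two under per-rank seeding, while `draw_per_atom` reproduces the single-rank value whatever the rank count.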
Also, REMD is not reproducible across restarts, because the RNG state is not preserved at all.
I'm not sure to what extent these effects interact. None of them affects accuracy, only reproducibility.
I suspect the problems stem partly from the way we communicate dynamically allocated structures (e.g. in src/gmxlib/mvdata.c or the particle-decomposition (PD) and domain-decomposition (DD) code). We make lots of separate communication calls, when what we really want is a way to serialize a data structure properly before communicating it. Doing that would also keep the many separate calls from eventually hurting parallel performance.
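As a minimal, MPI-free sketch of the idea, a structure holding a dynamically allocated array can be written into one contiguous byte buffer (count first, then the data), so that a single send or broadcast moves all of it rather than one call for the count and another for the array. The type `dyn_block_t` and the functions `dyn_block_packed_size`, `dyn_block_pack` and `dyn_block_unpack` are hypothetical, and the raw byte layout assumes sender and receiver share the same data representation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure with a dynamically allocated member. */
typedef struct {
    int     n; /* number of elements in x */
    double *x; /* heap-allocated array of length n */
} dyn_block_t;

/* Bytes needed to serialize b: the count followed by the array. */
size_t dyn_block_packed_size(const dyn_block_t *b)
{
    return sizeof(int) + (size_t)b->n * sizeof(double);
}

/* Serialize b into buf, which must hold dyn_block_packed_size(b) bytes. */
void dyn_block_pack(const dyn_block_t *b, unsigned char *buf)
{
    memcpy(buf, &b->n, sizeof(int));
    if (b->n > 0) {
        memcpy(buf + sizeof(int), b->x, (size_t)b->n * sizeof(double));
    }
}

/* Rebuild a structure from buf, allocating its array; returns 0 on success. */
int dyn_block_unpack(const unsigned char *buf, dyn_block_t *b)
{
    memcpy(&b->n, buf, sizeof(int));
    b->x = NULL;
    if (b->n > 0) {
        b->x = malloc((size_t)b->n * sizeof(double));
        if (b->x == NULL) {
            return -1;
        }
        memcpy(b->x, buf + sizeof(int), (size_t)b->n * sizeof(double));
    }
    return 0;
}
```

The sender would transmit the buffer in one call together with its byte length; the receiver calls `dyn_block_unpack` and then owns the newly allocated array.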
Fixing the minor issues above isn't really worth doing until a proper C++ communication library is in place. However, because I need it for another project, I've implemented a temporary C version based on release-4-5-patches. I expect it to convert to C++ fairly readily once the landscape becomes clearer to me as a non-core developer :) It is available as branch gmx_packed_t in the main repository.
First draft of gmx_packed_t
This new mechanism serializes dynamically-allocated data structures
using MPI_Pack in a way that allows for minimal calls to the
communication library without the developer needing to know too
much about the details of the packing.
This will allow better implementations of some areas of the code
that are currently awkward, such as preserving RNG state.
#2 Updated by Teemu Murtola over 7 years ago
- Tracker changed from Feature to Task
- Category deleted (
- Target version set to 5.0
Any ideas from here should be folded into the implementation/discussion for #996, as the intent is very similar. Marked that task as related and set the target version to 5.0.