Task #2395: break up commrec
General interface for communication between simulation ranks
Currently, communication between the ranks of a simulation is handled differently depending on the stage at which it takes place and on what is being communicated.
There are also several different communicator objects, most of them part of the legacy t_commrec data structure, which is used as a catch-all solution and passed to any method that needs to communicate data between ranks.
It is proposed to use a general class interface for this communication, with the individual communicators being specialized for their needs in terms of construction and communication behaviour.
An alternative would be to provide only the minimal set of methods common to all communicators, with the differences confined to construction time.
#1 Updated by Mark Abraham about 1 month ago
MpiCommunicator type to manage the lifetime and very simple use of an MPI_Comm makes sense. Most of the current PhysicalNodeCommunicator is a good model for that. Other objects/functions with higher-level functionality should own one of these, not derive from one of those. Prefer composition to inheritance: https://en.wikipedia.org/wiki/Composition_over_inheritance.
PhysicalNodeCommunicator should own an MpiCommunicator (and hopefully encapsulate it, so that clients don't need to see the MPI_Comm type) and take responsibility for organizing the API call whose return is kept in the MpiCommunicator. It should also have a high-level method for e.g. int numThreadsOnThisNode(int numThreadsOnThisRank), whose return value is used to implement checkHardwareOversubscription. It should probably have all the behaviours of findAllGpuTasksOnThisNode. It should probably have a method to replace analyzeThreadsOnThisNode (and maybe whatever calls that). How to rework GpuTaskAssignmentsBuilder::build is a bit harder to see offhand. It should probably have members that are non-owning handles to device info object(s). By now, the name of the composite object is more like NodeTaskOrganizer which helps make clear that such a thing should have an MpiCommunicator, not be one.
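To make the "has an MpiCommunicator, not is one" point concrete, here is a minimal sketch of how the composition might look. It is an illustration under stated assumptions, not the GROMACS implementation: FakeComm is a hypothetical stand-in for MPI_Comm so that the example compiles without an MPI library, and sumOverRanks and isOversubscribed are hypothetical names. Only MpiCommunicator, NodeTaskOrganizer and numThreadsOnThisNode come from the discussion above. In a real build the sum would presumably be an MPI_Allreduce with MPI_SUM over the intra-node communicator, and the wrapper would also release the handle (for example with MPI_Comm_free) in its destructor.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for MPI_Comm so the sketch compiles without MPI.
struct FakeComm
{
    int size; // number of ranks sharing the physical node
    int rank; // rank of the caller within that communicator
};

// Owns the communicator handle and offers only very simple operations.
class MpiCommunicator
{
public:
    explicit MpiCommunicator(FakeComm comm) : comm_(comm)
    {
        assert(comm_.size > 0);
        assert(comm_.rank >= 0 && comm_.rank < comm_.size);
    }

    int size() const { return comm_.size; }
    int rank() const { return comm_.rank; }

    // Hypothetical helper: sum of one integer contributed by every rank.
    // Simplification: the stand-in assumes all ranks contribute the same
    // value. A real version would call MPI_Allreduce with MPI_SUM on the
    // wrapped MPI_Comm instead.
    int sumOverRanks(int localValue) const { return localValue * comm_.size; }

private:
    FakeComm comm_;
};

// Composite object: it has an MpiCommunicator, it is not one.
class NodeTaskOrganizer
{
public:
    explicit NodeTaskOrganizer(MpiCommunicator comm) : comm_(std::move(comm)) {}

    // High-level query built on top of the low-level helper.
    int numThreadsOnThisNode(int numThreadsOnThisRank) const
    {
        return comm_.sumOverRanks(numThreadsOnThisRank);
    }

    // Hypothetical check in the spirit of checkHardwareOversubscription.
    bool isOversubscribed(int numThreadsOnThisRank, int numHardwareThreads) const
    {
        return numThreadsOnThisNode(numThreadsOnThisRank) > numHardwareThreads;
    }

private:
    MpiCommunicator comm_; // encapsulated: clients never see the raw handle
};
```

Because NodeTaskOrganizer exposes only node-level queries, callers do not need to include MPI headers or know which communicator is used underneath, which is the encapsulation the comment asks for.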
Helper (template) methods like gatherv() etc could be methods of MpiCommunicator, so that we get to re-use them when building