Feature #3307

Task #2395: break up commrec

General interface for communication between simulation ranks

Added by Paul Bauer 12 months ago. Updated 12 months ago.

core library
Target version:


Currently, communication between the ranks of a simulation is handled differently depending on the stage at which it happens and on what is being communicated.
There are also several different communicator objects, most of them part of the legacy t_commrec data structure, which serves as a catch-all and is passed to every method that needs to communicate data between ranks.

It is proposed to use a general class interface for this communication, with the individual communicators being specialized for their needs in terms of construction and communication behaviour.

An alternative would be to provide only the minimal set of methods that are common to all communicators, and to confine the differences to construction time.
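
To make the two options concrete, here is a rough sketch of how each might look. All class and method names in it (ICommunicator, MultiSimInterfaceCommunicator, Communicator, forMultiSim, forPhysicalNode) are hypothetical and exist only for illustration, and no MPI is involved so that the sketch compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Approach 1: a general interface, specialized by one class per communicator.
class ICommunicator // hypothetical name
{
public:
    virtual ~ICommunicator() = default;
    virtual int         size() const        = 0;
    virtual std::string description() const = 0;
};

class MultiSimInterfaceCommunicator final : public ICommunicator // hypothetical name
{
public:
    explicit MultiSimInterfaceCommunicator(int numSimulations) : size_(numSimulations)
    {
        assert(numSimulations > 0);
    }
    int         size() const override { return size_; }
    std::string description() const override { return "multi-simulation"; }

private:
    int size_;
};

// Approach 2: one concrete type with only the common methods; the kinds of
// communicator differ solely in how they are constructed.
class Communicator // hypothetical name
{
public:
    static Communicator forMultiSim(int numSimulations)
    {
        return Communicator(numSimulations, "multi-simulation");
    }
    static Communicator forPhysicalNode(int numRanksOnNode)
    {
        return Communicator(numRanksOnNode, "physical node");
    }
    int                size() const { return size_; }
    const std::string& description() const { return description_; }

private:
    Communicator(int numRanks, std::string description) :
        size_(numRanks), description_(std::move(description))
    {
        assert(numRanks > 0);
    }
    int         size_;
    std::string description_;
};
```

With the first approach each communicator can override behaviour, at the cost of a class hierarchy. With the second, callers only ever see one type, and anything communicator-specific has to be settled inside the factory functions.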


#1 Updated by Mark Abraham 12 months ago

An MpiCommunicator type that manages the lifetime and very simple use of an MPI_Comm makes sense. Most of the current PhysicalNodeCommunicator is a good model for that. Other objects and functions with higher-level functionality should own one of these, not derive from one. Prefer composition to inheritance:

PhysicalNodeCommunicator should own an MpiCommunicator (and ideally encapsulate it, so that clients never see the MPI_Comm type) and take responsibility for making the API call whose result is kept in the MpiCommunicator. It should also have a high-level method such as int numThreadsOnThisNode(int numThreadsOnThisRank), whose return value is used to implement checkHardwareOversubscription. It should probably have all the behaviours of findAllGpuTasksOnThisNode. It should probably have a method to replace analyzeThreadsOnThisNode (and maybe whatever calls that). How to rework GpuTaskAssignmentsBuilder::build is a bit harder to see offhand. It should probably have members that are non-owning handles to the device info object(s). By that point, a name like NodeTaskOrganizer fits the composite object better, which helps make clear that such a thing should have an MpiCommunicator, not be one.
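
A minimal sketch of this ownership structure might look as follows. To keep it compilable without an MPI library, a stand-in handle (FakeComm) replaces MPI_Comm, and the sum over ranks is emulated under the assumption that every rank contributes the same value. FakeComm, liveHandleCount, sumOverRanks and the signature of checkHardwareOversubscription shown here are assumptions made for illustration, not the existing API.

```cpp
#include <cassert>

// Stand-in for MPI_Comm so that the sketch compiles without an MPI library.
struct FakeComm
{
    int numRanks;
};

// Hypothetical counter that makes the handle lifetime observable.
inline int& liveHandleCount()
{
    static int count = 0;
    return count;
}

// Owns exactly one communicator handle. Real code would create it with
// e.g. MPI_Comm_split_type and release it with MPI_Comm_free.
class MpiCommunicator
{
public:
    explicit MpiCommunicator(int numRanks) : comm_{ numRanks }
    {
        assert(numRanks > 0);
        ++liveHandleCount();
    }
    ~MpiCommunicator() { --liveHandleCount(); }

    // Non-copyable, so the handle is released exactly once.
    MpiCommunicator(const MpiCommunicator&) = delete;
    MpiCommunicator& operator=(const MpiCommunicator&) = delete;

    int size() const { return comm_.numRanks; }

    // Real code would call MPI_Allreduce with MPI_SUM. The stand-in assumes
    // that every rank contributes the same value.
    int sumOverRanks(int localValue) const { return localValue * comm_.numRanks; }

private:
    FakeComm comm_;
};

// Higher-level object: it has a communicator, it is not one. Clients never
// see the underlying handle type.
class PhysicalNodeCommunicator
{
public:
    explicit PhysicalNodeCommunicator(int numRanksOnThisNode) : comm_(numRanksOnThisNode) {}

    int numThreadsOnThisNode(int numThreadsOnThisRank) const
    {
        return comm_.sumOverRanks(numThreadsOnThisRank);
    }

private:
    MpiCommunicator comm_;
};

// Hypothetical signature: true when the node runs more threads than it has
// hardware threads available.
inline bool checkHardwareOversubscription(const PhysicalNodeCommunicator& nodeComm,
                                          int                             numThreadsOnThisRank,
                                          int                             numHardwareThreads)
{
    return nodeComm.numThreadsOnThisNode(numThreadsOnThisRank) > numHardwareThreads;
}
```

Because the handle lives in a private member, the lifetime of the communicator follows the lifetime of the owning object, and callers such as checkHardwareOversubscription only depend on the high-level method.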

Helper (template) methods such as barrier() and gatherv() could be methods of MpiCommunicator, so that we can reuse them when building MultiSimCommunicator, PpCommunicator, etc.
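
As a rough illustration of that reuse, the sketch below gives MpiCommunicator a template gatherv() and lets two higher-level types forward to it. It emulates all ranks inside one process, so the per-rank contributions are passed in explicitly; a real implementation would instead take only the local contribution and call MPI_Gatherv. The method names gatherEnergies and gatherAtomIndices, and the constructor arguments, are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class MpiCommunicator
{
public:
    explicit MpiCommunicator(int numRanks) : numRanks_(numRanks) { assert(numRanks > 0); }

    int size() const { return numRanks_; }

    // Real code would call MPI_Barrier; in a single process there is
    // nothing to wait for.
    void barrier() const {}

    // Concatenates variable-length contributions in rank order, which is the
    // result a root rank would see after MPI_Gatherv.
    template<typename T>
    std::vector<T> gatherv(const std::vector<std::vector<T>>& perRankData) const
    {
        assert(perRankData.size() == static_cast<std::size_t>(numRanks_));
        std::vector<T> gathered;
        for (const auto& contribution : perRankData)
        {
            gathered.insert(gathered.end(), contribution.begin(), contribution.end());
        }
        return gathered;
    }

private:
    int numRanks_;
};

// Both higher-level types own an MpiCommunicator and reuse the same helper.
class MultiSimCommunicator
{
public:
    explicit MultiSimCommunicator(int numSimulations) : comm_(numSimulations) {}

    std::vector<double> gatherEnergies(const std::vector<std::vector<double>>& perSimulation) const
    {
        return comm_.gatherv(perSimulation);
    }

private:
    MpiCommunicator comm_;
};

class PpCommunicator
{
public:
    explicit PpCommunicator(int numPpRanks) : comm_(numPpRanks) {}

    std::vector<int> gatherAtomIndices(const std::vector<std::vector<int>>& perRank) const
    {
        comm_.barrier();
        return comm_.gatherv(perRank);
    }

private:
    MpiCommunicator comm_;
};
```

The helper is written once, for any element type, and neither MultiSimCommunicator nor PpCommunicator needs to know how the gather is carried out.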
