Task #2395
break up commrec
Description
t_commrec currently handles lots of things and goes lots of places. We should break it into pieces as suggested in the checklist.
This will make it much easier to deploy modules that might be exposed by an API, or implement standard interfaces for command-line options, etc.
Checklist
- multi-simulation handler, including communicator
- physical node communicator - removing various places where we make temporary ones
- DD aspects
- duty aspects - PP vs PME vs both
- DOMAINDECOMP should mostly be replaced by havePPDomainDecomposition
Subtasks
Related issues
Associated revisions
Document DOMAINDECOMP correctly
It is likely there are numerous cases where this is used mistakenly
when havePPDomainDecomposition expresses the real intent. If so,
runs with 1 PP and 1 PME rank may have buggy behaviour.
Refs #2395
Change-Id: I07be73a6c690887b3043140a2a78ae6fe6bb17f1
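The difference between the two checks can be illustrated with a small model. This is only a sketch: RankLayout and both function names are hypothetical stand-ins, not the GROMACS definitions, and the conditions are assumptions based on the behaviour described in this issue (DOMAINDECOMP is true with 1 PP and 1 PME rank, while havePPDomainDecomposition is meant to express that particles are split over several PP ranks).

```cpp
// Hypothetical, simplified stand-in for the rank information in t_commrec.
struct RankLayout
{
    int  numRanks;    // total ranks in the simulation (PP plus PME-only)
    int  numPmeRanks; // ranks that only do PME work
    bool hasDDObject; // whether a domain decomposition object exists
};

// Assumed model of DOMAINDECOMP(cr): true whenever a DD object exists and
// the run has more than one rank, counting PME-only ranks as well.
constexpr bool domainDecompMacroModel(const RankLayout& layout)
{
    return layout.hasDDObject && layout.numRanks > 1;
}

// Assumed model of havePPDomainDecomposition: true only when the particles
// are actually decomposed over more than one PP rank.
constexpr bool havePPDomainDecompositionModel(const RankLayout& layout)
{
    return layout.hasDDObject && (layout.numRanks - layout.numPmeRanks) > 1;
}
```

With 1 PP and 1 PME rank (numRanks = 2, numPmeRanks = 1) the first check is true and the second is false, which is the case where code guarded by the wrong check may misbehave.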
Require explicit MPI_COMM for gmx_bcast and gmx_barrier
This changes gmx_bcast and gmx_barrier to take the MPI communicator
explicitly instead of taking a pointer to t_commrec and using
mpi_comm_mygroup. This also allows removing gmx_bcast_sim and leaves
the responsibility of passing the right communicator to the caller.
This is a first step in breaking up t_commrec. These functions
are the subset of low-level networking functions which are used
before domain decomposition (and hence PP/PME ranks) is set up.
Refs #2395
Make init_disres independent of t_commrec
init_disres was requesting a full pointer to the commrec,
but only uses a single communicator and checks for master rank
and whether the run is parallel. This information is now passed
in explicitly, simplifying the planned splitting of t_commrec.
Note that passing a nullptr for commrec was (mis)used by
gmx_disre only - effectively signalling that init_disres was
called from an analysis tool and not from mdrun. This has
been made explicit.
Refs #2395
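The pattern used in this change (pass only what the function reads, and make the analysis-tool case explicit) might look roughly like the following. This is a hedged sketch, not the actual init_disres signature: CommunicatorHandle, CommRecModel, ParallelContext and both helper functions are hypothetical names, and treating rank 0 as the master rank is an assumption.

```cpp
// Hypothetical stand-in for MPI_Comm so the sketch builds without MPI.
using CommunicatorHandle = int;

// Hypothetical, much-reduced model of a record that carries everything.
struct CommRecModel
{
    CommunicatorHandle groupCommunicator;
    int                rankInGroup;
    int                numRanks;
};

// Narrow input holding only what the setup routine actually needs.
struct ParallelContext
{
    CommunicatorHandle communicator;
    bool               isParallel;
    bool               isMaster;
};

// Extracts the three pieces of information from the full record.
// Assumption: rank 0 in the group acts as the master rank.
constexpr ParallelContext makeParallelContext(const CommRecModel& cr)
{
    return ParallelContext{ cr.groupCommunicator, cr.numRanks > 1, cr.rankInGroup == 0 };
}

// An analysis tool states its situation explicitly (serial, acting as
// master) instead of signalling it through a null record pointer.
constexpr ParallelContext analysisToolContext()
{
    return ParallelContext{ 0, false, true };
}
```

A caller in mdrun would build the context from its record, while a tool such as gmx_disre would use the explicit serial context, so the callee no longer has to infer anything from a null pointer.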
Make boxdeformation independent of t_commrec
boxdeformation requested a pointer to the full t_commrec,
but only used one communicator and information on whether
the current rank is master and whether the simulation
is run in parallel. This has been made explicit, simplifying
subsequent changes splitting up t_commrec.
Refs #2395
Divide default communicator from DD communicators
The communicators mpi_comm_mysim and mpi_comm_mygroup inside
t_commrec got initialized in init_commrec (to MPI_COMM_WORLD
if no multisim, to a subset otherwise). These communicators
were then used in subsequent setup work, before they got
reassigned during the construction of the DDBuilder object
and the construction of the actual domain decomposition object.
Effectively, this means that the same communicators (and, hence,
identical function calls) do very different things depending on
whether they get used before or after the setup of domain
decomposition. It also means that before DD setup, mpi_comm_mysim
and mpi_comm_mygroup are identical.
This change introduces an additional communicator within
t_commrec, mpiDefaultCommunicator, which helps to make these
implicit assumptions explicit. Consequently, this also redefines
PAR, MASTER, and SIMMASTER.
This change will allow moving the sim and group communicators,
which are now only created at DD time, into the DD object,
logically separating the DD object from t_commrec.
Refs #2395
History
#1 Updated by Mark Abraham about 3 years ago
f746a4a4aedb76995 already started on this effort
#2 Updated by Gerrit Code Review Bot about 3 years ago
Gerrit received a related patchset '1' for Issue #2395.
Uploader: Mark Abraham (mark.j.abraham@gmail.com)
Change-Id: gromacs~master~I03af65805bd14515a0213d511ae8cdb627c2f05c
Gerrit URL: https://gerrit.gromacs.org/7531
#3 Updated by Mark Abraham almost 3 years ago
4868388f24ecee03d75d and 9a2e38a91c0621d2ecbf1 also made progress here
#4 Updated by Mark Abraham almost 3 years ago
- Description updated (diff)
- Status changed from New to In Progress
#5 Updated by Mark Abraham over 2 years ago
- Target version changed from 2019 to 2020
#6 Updated by Paul Bauer about 1 year ago
- Target version changed from 2020 to 2021-infrastructure-stable
#7 Updated by Mark Abraham about 1 year ago
I suspect most devs don't realise that DOMAINDECOMP(cr) is true when there is 1 PP and 1 PME rank, so we should go through the uses and replace them by what is actually intended, which is mostly havePPDomainDecomposition.
And document DOMAINDECOMP(cr) correctly.
#8 Updated by Mark Abraham about 1 year ago
Mark Abraham wrote:
I suspect most devs don't realise that DOMAINDECOMP(cr) is true when there is 1 PP and 1 PME rank, so we should go through the uses and replace them by what is actually intended, which is mostly havePPDomainDecomposition. And document DOMAINDECOMP(cr) correctly.
Eg #3241
#9 Updated by Mark Abraham about 1 year ago
- Related to Bug #3241: Bonded GPU kernel launched in the wrong stream with 1 PP + 1 PME rank added
Remove commrec from hardware detection
This is preparatory refactoring for aspects of #2395.
The OpenCL logic was ineffective, because duty is not yet decided, and
anyway we might soon want the detection on PME-only ranks.
Replaced the thread-MPI single-rank assertion with a more direct
implementation.
Minimized contents of detecthardware.h
Refs #2395
Change-Id: I03af65805bd14515a0213d511ae8cdb627c2f05c