Task #2395

break up commrec

Added by Mark Abraham over 2 years ago. Updated 10 months ago.

Status: In Progress
Priority: Normal
Assignee: -
Category: mdrun
Difficulty: uncategorized

Description

t_commrec currently has many different responsibilities and is passed to many parts of the code. We should break it into pieces as suggested in the checklist.

This will make it much easier to deploy modules that might be exposed by an API, or implement standard interfaces for command-line options, etc.


Checklist

  • multi-simulation handler, including communicator
  • physical node communicator - removing various places where we make temporary ones
  • DD aspects
  • duty aspects - PP vs PME vs both
  • DOMAINDECOMP should mostly be replaced by havePPDomainDecomposition

Subtasks

Feature #3307: General interface for communication between simulation ranks (New)

Related issues

Related to GROMACS - Bug #3241: Bonded GPU kernel launched in the wrong stream with 1 PP + 1 PME rank (Closed)

Associated revisions

Revision f8937dc1 (diff)
Added by Mark Abraham over 2 years ago

Remove commrec from hardware detection

This is preparatory refactoring for aspects of #2395

The OpenCL logic was ineffective, because duty is not yet decided, and
anyway we might soon want the detection on PME-only ranks.

Replaced the thread-MPI single-rank assertion with a more direct
implementation.

Minimized contents of detecthardware.h

Refs #2395

Change-Id: I03af65805bd14515a0213d511ae8cdb627c2f05c

Revision 65aaa064 (diff)
Added by Mark Abraham 10 months ago

Document DOMAINDECOMP correctly

It is likely there are numerous cases where this is used mistakenly
when havePPDomainDecomposition expresses the real intent. If so,
runs with 1 PP and 1 PME rank may have buggy behaviour.

Refs #2395

Change-Id: I07be73a6c690887b3043140a2a78ae6fe6bb17f1

Revision 59e622e1 (diff)
Added by Pascal Merz 6 months ago

Require explicit MPI_COMM for gmx_bcast and gmx_barrier

This changes gmx_bcast and gmx_barrier to take the MPI communicator
explicitly instead of taking a pointer to t_commrec and using
mpi_comm_mygroup. This also allows removing gmx_bcast_sim, leaving
the responsibility of passing the right communicator to the caller.

This is a first step in breaking up t_commrec. These functions
are the subset of low-level networking functions which are used
before domain decomposition (and hence PP/PME ranks) is set up.

Refs #2395
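
The interface change described in this revision can be sketched as follows. This is a minimal model, not the GROMACS code: FakeComm, LegacyCommRec, bcastLegacy and bcastExplicit are hypothetical stand-ins, and the "broadcast" only reports which communicator it would use, so the example builds without MPI.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for MPI_Comm so the sketch builds without MPI.
struct FakeComm
{
    std::string name;
};

// Hypothetical, heavily reduced model of t_commrec.
struct LegacyCommRec
{
    FakeComm mpi_comm_mysim;
    FakeComm mpi_comm_mygroup;
};

// Old style: the callee receives the whole record and silently
// picks mpi_comm_mygroup out of it.
std::string bcastLegacy(const LegacyCommRec* cr)
{
    assert(cr != nullptr);
    return cr->mpi_comm_mygroup.name;
}

// New style: the caller decides which communicator is the right one
// and passes it explicitly; the callee no longer needs the record.
std::string bcastExplicit(const FakeComm& communicator)
{
    return communicator.name;
}
```

With the explicit form, a separate "sim" variant of the function is unnecessary: the caller passes the simulation communicator to the same function instead.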

Revision 0a48bcd0 (diff)
Added by Pascal Merz 5 months ago

Make init_disres independent of t_commrec

init_disres was requesting a full pointer to the commrec,
but only uses a single communicator and checks for master rank
and whether the run is parallel. This information is now passed
in explicitly, simplifying the planned splitting of t_commrec.

Note that passing a nullptr for commrec was (mis)used by
gmx_disre only - effectively signalling that init_disres was
called from an analysis tool and not from mdrun. This has
been made explicit.

Refs #2395
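
A rough sketch of the idea in this revision: pass only the facts the function needs, and state the calling context explicitly rather than encoding it as a null pointer. All names here (RunMode, DisresSetupSketch, initDisresSketch) are hypothetical, and the decisions taken inside the function are placeholders chosen for illustration, not the behaviour of init_disres.

```cpp
#include <cassert>

// Hypothetical: states who is calling, replacing the old "cr == nullptr" signal.
enum class RunMode
{
    MdRun,
    AnalysisTool
};

// Hypothetical result type; the two flags are illustrative placeholders.
struct DisresSetupSketch
{
    bool needsCommunication;
    bool reportsOnThisRank;
};

// Takes the run mode, parallel flag and master flag explicitly
// instead of a pointer to the full communication record.
DisresSetupSketch initDisresSketch(RunMode mode, bool isParallel, bool isMaster)
{
    DisresSetupSketch setup{};
    // Assumption for this sketch: an analysis tool never communicates.
    setup.needsCommunication = (mode == RunMode::MdRun) && isParallel;
    // Assumption for this sketch: only the master rank reports.
    setup.reportsOnThisRank = isMaster;
    return setup;
}
```

The caller's intent is now visible at the call site, for example initDisresSketch(RunMode::AnalysisTool, false, true) for an analysis tool.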

Revision ff03803b (diff)
Added by Pascal Merz 5 months ago

Make boxdeformation independent of t_commrec

boxdeformation requested a pointer to the full t_commrec,
but only used one communicator and information on whether
the current rank is master and whether the simulation
is run in parallel. This has been made explicit, simplifying
subsequent changes splitting up t_commrec.

Refs #2395

Revision 5734027b (diff)
Added by Pascal Merz 3 months ago

Divide default communicator from DD communicators

The communicators mpi_comm_mysim and mpi_comm_mygroup inside
t_commrec got initialized in init_commrec (to MPI_COMM_WORLD
if no multisim, to a subset otherwise). These communicators
were then used in subsequent setup work, before they got
reassigned during the construction of the DDBuilder object
and the construction of the actual domain decomposition object.
Effectively, this means that the same communicators (and, hence,
identical function calls) do very different things depending on
whether they get used before or after the setup of domain
decomposition. It also means that before DD setup, mpi_comm_mysim
and mpi_comm_mygroup are identical.

This change introduces an additional communicator within
t_commrec, mpiDefaultCommunicator, which helps to make these
implicit assumptions explicit. Consequently, this also redefines
PAR, MASTER, and SIMMASTER.

This change will allow moving the sim and group communicators,
which are now only created at DD time, into the DD object,
logically separating the DD object from t_commrec.

Refs #2395
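
The separation described in this revision can be modelled roughly as below. This is a sketch under assumptions: FakeComm, CommRecSketch and both functions are hypothetical, communicators are represented by names only, and a boolean flag stands in for "the DD communicators exist".

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for MPI_Comm so the sketch builds without MPI.
struct FakeComm
{
    std::string name;
};

// Hypothetical, reduced model of the record after the change.
struct CommRecSketch
{
    FakeComm mpiDefaultCommunicator;  // set once at init, never reassigned
    bool     ddCommunicatorsCreated;  // false until DD setup has run
    FakeComm mpi_comm_mysim;          // meaningful only after DD setup
    FakeComm mpi_comm_mygroup;        // meaningful only after DD setup
};

// Init chooses the default communicator: the world communicator without
// multisim, otherwise a subset of it.
CommRecSketch initCommRecSketch(bool isMultiSim)
{
    CommRecSketch cr{};
    cr.mpiDefaultCommunicator.name = isMultiSim ? "subset-of-world" : "MPI_COMM_WORLD";
    cr.ddCommunicatorsCreated      = false;
    return cr;
}

// DD setup creates the sim and group communicators and leaves the
// default communicator untouched.
void setupDomainDecompositionSketch(CommRecSketch* cr)
{
    assert(cr != nullptr);
    cr->mpi_comm_mysim.name    = "dd-sim";
    cr->mpi_comm_mygroup.name  = "dd-group";
    cr->ddCommunicatorsCreated = true;
}
```

Code that runs before DD setup uses only the default communicator, so the same call no longer changes meaning depending on when it runs.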

Revision e0f481ae (diff)
Added by Mark Abraham 2 months ago

Divide default communicator from DD communicators

The communicators mpi_comm_mysim and mpi_comm_mygroup inside
t_commrec got initialized in init_commrec (to MPI_COMM_WORLD
if no multisim, to a subset otherwise). These communicators
were then used in subsequent setup work, before they got
reassigned during the construction of the DDBuilder object
and the construction of the actual domain decomposition object.
Effectively, this means that the same communicators (and, hence,
identical function calls) do very different things depending on
whether they get used before or after the setup of domain
decomposition. It also means that before DD setup, mpi_comm_mysim
and mpi_comm_mygroup are identical.

This change introduces an additional communicator within
t_commrec, mpiDefaultCommunicator, which helps to make these
implicit assumptions explicit. Consequently, this also redefines
PAR, MASTER, and SIMMASTER.

This change will allow moving the sim and group communicators,
which are now only created at DD time, into the DD object,
logically separating the DD object from t_commrec.

Refs #2395

History

#1 Updated by Mark Abraham over 2 years ago

f746a4a4aedb76995 already started on this effort

#2 Updated by Gerrit Code Review Bot over 2 years ago

Gerrit received a related patchset '1' for Issue #2395.
Uploader: Mark Abraham
Change-Id: gromacs~master~I03af65805bd14515a0213d511ae8cdb627c2f05c
Gerrit URL: https://gerrit.gromacs.org/7531

#3 Updated by Mark Abraham over 2 years ago

4868388f24ecee03d75d and 9a2e38a91c0621d2ecbf1 also made progress here

#4 Updated by Mark Abraham over 2 years ago

  • Description updated (diff)
  • Status changed from New to In Progress

#5 Updated by Mark Abraham about 2 years ago

  • Target version changed from 2019 to 2020

#6 Updated by Paul Bauer 10 months ago

  • Target version changed from 2020 to 2021-infrastructure-stable

#7 Updated by Mark Abraham 10 months ago

I suspect most devs don't realise that DOMAINDECOMP(cr) is true when there is 1 PP and 1 PME rank, so we should go through the uses and replace them by what is actually intended, which is mostly havePPDomainDecomposition.

And document DOMAINDECOMP(cr) correctly.
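
To illustrate the difference, here is a simplified model. It assumes that DOMAINDECOMP(cr) is true whenever a DD object exists in a parallel run (consistent with the observation above), and that havePPDomainDecomposition additionally requires more than one PP rank. RankLayout and both function names are hypothetical stand-ins, not the GROMACS definitions.

```cpp
#include <cassert>

// Hypothetical, reduced description of a run's rank layout.
struct RankLayout
{
    int  numPPRanks;
    int  numPmeOnlyRanks;
    bool haveDDObject;
};

// Model of DOMAINDECOMP(cr): a DD object exists and the run is parallel.
// PME-only ranks count towards "parallel".
bool domainDecompMacroSketch(const RankLayout& layout)
{
    const bool isParallel = (layout.numPPRanks + layout.numPmeOnlyRanks) > 1;
    return layout.haveDDObject && isParallel;
}

// Model of havePPDomainDecomposition: the particles are actually
// split over more than one PP rank.
bool havePPDomainDecompositionSketch(const RankLayout& layout)
{
    return layout.haveDDObject && layout.numPPRanks > 1;
}
```

For the layout { 1, 1, true } (1 PP rank plus 1 PME rank) the first function returns true and the second returns false, which is exactly the case where code guarded by DOMAINDECOMP may do the wrong thing.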

#8 Updated by Mark Abraham 10 months ago

Mark Abraham wrote:

I suspect most devs don't realise that DOMAINDECOMP(cr) is true when there is 1 PP and 1 PME rank, so we should go through the uses and replace them by what is actually intended, which is mostly havePPDomainDecomposition.

And document DOMAINDECOMP(cr) correctly.

E.g. #3241

#9 Updated by Mark Abraham 10 months ago

  • Related to Bug #3241: Bonded GPU kernel launched in the wrong stream with 1 PP + 1 PME rank added
