Bug #2403

Multidir sim with #mpiranks = 2*#sim fails

Added by Christoph Junghans over 1 year ago. Updated over 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Running two simulations with -multidir on 4 MPI ranks, with a command such as:

mpirun -np 4 /usr/lib64/openmpi/bin/mdrun_openmpi_d -s topol.tpr -c confout.gro -o traj.trr -x traj.xtc -multidir sim1 sim2 -nsteps 500 -v

fails with
*** An error occurred in MPI_Barrier

while
mpirun -np 2 /usr/lib64/openmpi/bin/mdrun_openmpi_d -s topol.tpr -c confout.gro -o traj.trr -x traj.xtc -multidir sim1 sim2 -nsteps 500 -v

works.
(Note: the same tpr file was used for all simulations for testing.)

Additionally, it prints the following confusing message:

No option -multi

I believe all of this worked before the -multi option was removed.


Related issues

Related to GROMACS - Task #2425: testing multisim with multiple ranks per simulation (New)
Has duplicate GROMACS - Bug #2414: -multidir not working due to erroneous use of MPI_Comm_create (Closed)

Associated revisions

Revision fa3593ec (diff)
Added by Berk Hess over 1 year ago

Correct multisim MPI barrier

An MPI barrier used with multisim was called on incorrect ranks
and a barrier was missing.

Fixes #2403

Change-Id: I06a66167afc8dca9cdd4ca9b4f9a806984a6ec7a

Revision 6358ce73 (diff)
Added by Berk Hess over 1 year ago

Fix mdrun multisim with separate PME ranks

When running multiple simulations using separate PME ranks,
mdrun would call MPI_Barrier with MPI_COMM_NULL on some ranks.
Note: no new release note needed, since this fixes a fix.

Refs #2403
Fixes #2431
Fixes #2432

Change-Id: I7fe96a15b5030dea3e093d12850f89f00ccc9f48
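
The failure mode named in the two commit messages above can be illustrated with a small model. This is a plain-Python sketch, not GROMACS code and not real MPI: `COMM_NULL`, `inter_sim_comm` and `barrier_callers` are hypothetical names, and the choice of which ranks belong to the inter-simulation communicator (the first rank of each contiguous block) is an assumption. It relies only on two facts from the MPI standard: a rank that is not in the group passed to MPI_Comm_create receives MPI_COMM_NULL, and MPI_Barrier must not be called on MPI_COMM_NULL.

```python
COMM_NULL = None  # stand-in for MPI_COMM_NULL in this model


def inter_sim_comm(rank, num_ranks, num_sims):
    """Return the modelled inter-simulation communicator seen by `rank`.

    Assumption: only the first rank of each simulation's contiguous block
    is a member; every other rank gets COMM_NULL, as MPI_Comm_create
    would return for ranks outside the group.
    """
    if num_sims <= 0 or num_ranks <= 0 or num_ranks % num_sims != 0:
        raise ValueError("ranks must be a positive multiple of simulations")
    ranks_per_sim = num_ranks // num_sims
    members = tuple(r for r in range(num_ranks) if r % ranks_per_sim == 0)
    return members if rank in members else COMM_NULL


def barrier_callers(num_ranks, num_sims):
    """Ranks that may safely enter a barrier on the modelled communicator."""
    return [
        rank
        for rank in range(num_ranks)
        # The guard: skip the barrier where the communicator is null.
        if inter_sim_comm(rank, num_ranks, num_sims) is not COMM_NULL
    ]
```

In this model, `barrier_callers(4, 2)` returns `[0, 2]`, so ranks 1 and 3 must skip the barrier, while `barrier_callers(2, 2)` returns `[0, 1]`, which matches the report that the 2-rank run works.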

History

#1 Updated by Mark Abraham over 1 year ago

The -multi option is still present in 2018. The print issue is fixed in release-2018 already. I don't know whether your case works, but an almost identical integration test does run, using -multi. That case has been ported to -multisim in master, so I don't know why this might arise.

#2 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #2403.
Uploader: Berk Hess
Change-Id: gromacs~release-2018~I06a66167afc8dca9cdd4ca9b4f9a806984a6ec7a
Gerrit URL: https://gerrit.gromacs.org/7549

#3 Updated by Christoph Junghans over 1 year ago

Mark Abraham wrote:

The -multi option is still present in 2018. The print issue is fixed in release-2018 already.

Thanks, it was #2377.

#4 Updated by Mark Abraham over 1 year ago

Mark Abraham wrote:

I don't know whether your case works, but an almost identical integration test does run, using -multi. That case has been ported to -multisim in master, so I don't know why this might arise.

Those tests use a single rank per simulation, so they didn't hit the barrier in the problematic case. I am working on better test coverage for the master branch.
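
To see why one rank per simulation hides the problem, the rank layout can be sketched in a few lines of Python. This is an illustration only: `rank_layout` and `non_master_ranks` are hypothetical helpers, and the equal, contiguous split of ranks across simulations is an assumption, not something taken from the mdrun source.

```python
def rank_layout(num_ranks, num_sims):
    """Map each rank to (simulation index, rank within that simulation).

    Assumption: ranks are divided into equal, contiguous blocks,
    one block per simulation.
    """
    if num_sims <= 0 or num_ranks <= 0 or num_ranks % num_sims != 0:
        raise ValueError("ranks must be a positive multiple of simulations")
    ranks_per_sim = num_ranks // num_sims
    return [(rank // ranks_per_sim, rank % ranks_per_sim)
            for rank in range(num_ranks)]


def non_master_ranks(num_ranks, num_sims):
    """Ranks that are not the first rank of their simulation."""
    layout = rank_layout(num_ranks, num_sims)
    return [rank for rank, (_, local) in enumerate(layout) if local != 0]
```

With the reported setup, `non_master_ranks(4, 2)` gives `[1, 3]`, whereas `non_master_ranks(2, 2)` gives `[]`. A test with one rank per simulation therefore has no non-master ranks and never exercises the code path taken by ranks 1 and 3.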

#5 Updated by Mark Abraham over 1 year ago

  • Status changed from New to In Progress

#6 Updated by Berk Hess over 1 year ago

  • Category set to mdrun
  • Status changed from In Progress to Fix uploaded
  • Target version set to 2018.1

#7 Updated by Berk Hess over 1 year ago

  • Status changed from Fix uploaded to Resolved

#8 Updated by Mark Abraham over 1 year ago

  • Has duplicate Bug #2414: -multidir not working due to erroneous use of MPI_Comm_create added

#9 Updated by Mark Abraham over 1 year ago

  • Status changed from Resolved to Closed

#10 Updated by Mark Abraham over 1 year ago

  • Related to Task #2425: testing multisim with multiple ranks per simulation added

#11 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #2403.
Uploader: Berk Hess
Change-Id: gromacs~release-2018~I7fe96a15b5030dea3e093d12850f89f00ccc9f48
Gerrit URL: https://gerrit.gromacs.org/7632
