Bug #2403
Multidir sim with #mpiranks = 2*#sim fails
Description
Running a two-simulation multidir run on 4 MPI ranks, for example:
mpirun -np 4 /usr/lib64/openmpi/bin/mdrun_openmpi_d -s topol.tpr -c confout.gro -o traj.trr -x traj.xtc -multidir sim1 sim2 -nsteps 500 -v
fails with
*** An error occurred in MPI_Barrier
while
mpirun -np 2 /usr/lib64/openmpi/bin/mdrun_openmpi_d -s topol.tpr -c confout.gro -o traj.trr -x traj.xtc -multidir sim1 sim2 -nsteps 500 -v
works!
(Note: the same tpr files are used for all sims, for testing.)
Additionally, it prints the following confusing message:
No option -multi
I believe all of this worked before the -multi option was removed.
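The two commands above differ only in how many ranks each simulation receives. A minimal sketch of that arithmetic follows. It assumes mdrun splits the ranks evenly and hands them to the -multidir directories in contiguous blocks; that layout is an assumption made for illustration, and the helper assign_ranks is hypothetical, not part of GROMACS.

```python
def assign_ranks(n_ranks, sim_dirs):
    """Map each MPI rank to a simulation directory.

    Assumption (not taken from the GROMACS source): ranks are split
    evenly and in contiguous blocks, so rank r belongs to simulation
    r // ranks_per_sim.
    """
    n_sims = len(sim_dirs)
    if n_ranks % n_sims != 0:
        raise ValueError(
            "number of ranks must be a multiple of the number of simulations"
        )
    ranks_per_sim = n_ranks // n_sims
    return {rank: sim_dirs[rank // ranks_per_sim] for rank in range(n_ranks)}


if __name__ == "__main__":
    # -np 4 is the failing case from the report, -np 2 the working one.
    for n_ranks in (4, 2):
        print(n_ranks, assign_ranks(n_ranks, ["sim1", "sim2"]))
```

Under this assumption, -np 4 gives each simulation two ranks (the failing case), while -np 2 gives each simulation exactly one rank (the working case).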
Related issues
Associated revisions
Fix mdrun multisim with separate PME ranks
When running multiple simulations using separate PME ranks,
mdrun would call MPI_Barrier with MPI_COMM_NULL on some ranks.
Note: no new release note needed, since this fixes a fix.
Refs #2403
Fixes #2431
Fixes #2432
Change-Id: I7fe96a15b5030dea3e093d12850f89f00ccc9f48
History
#1 Updated by Mark Abraham almost 3 years ago
The -multi option is still present in 2018. The print issue is fixed in release-2018 already. I don't know whether your case works, but an almost identical integration test does run, using -multi. That case has been ported to -multisim in master, so I don't know why this might arise.
#2 Updated by Gerrit Code Review Bot almost 3 years ago
Gerrit received a related patchset '1' for Issue #2403.
Uploader: Berk Hess (hess@kth.se)
Change-Id: gromacs~release-2018~I06a66167afc8dca9cdd4ca9b4f9a806984a6ec7a
Gerrit URL: https://gerrit.gromacs.org/7549
#3 Updated by Christoph Junghans almost 3 years ago
Mark Abraham wrote:
The -multi option is still present in 2018. The print issue is fixed in release-2018 already.
Thanks, it was #2377.
#4 Updated by Mark Abraham almost 3 years ago
Mark Abraham wrote:
I don't know whether your case works, but an almost identical integration test does run, using -multi. That case has been ported to -multisim in master, so I don't know why this might arise.
Those tests use a single rank per simulation, so they didn't hit the barrier in the problematic case. I am working on better test coverage for the master branch.
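A small model of why one rank per simulation can hide this kind of problem. It assumes the inter-simulation communicator exists only on one master rank per simulation, so every other rank holds MPI_COMM_NULL (which is what MPI_Comm_create returns on ranks outside the group), and that the faulty barrier was reached by all ranks. Both points are assumptions for illustration, not taken from the GROMACS source. None stands in for MPI_COMM_NULL, and all function names are hypothetical.

```python
MASTERS_COMM = "masters"  # stand-in for a valid inter-simulation communicator


def masters_comm(rank, ranks_per_sim):
    """Return the inter-simulation communicator held by this rank.

    Assumption: only the first rank of each simulation is a member;
    all other ranks get None (modelling MPI_COMM_NULL).
    """
    is_sim_master = rank % ranks_per_sim == 0
    return MASTERS_COMM if is_sim_master else None


def ranks_with_null_comm(n_ranks, n_sims):
    """Ranks on which an unguarded barrier would see a null communicator."""
    ranks_per_sim = n_ranks // n_sims
    return [r for r in range(n_ranks) if masters_comm(r, ranks_per_sim) is None]


def ranks_calling_guarded_barrier(n_ranks, n_sims):
    """Ranks that reach the barrier when it is guarded by a null check."""
    ranks_per_sim = n_ranks // n_sims
    return [r for r in range(n_ranks) if masters_comm(r, ranks_per_sim) is not None]


if __name__ == "__main__":
    for n_ranks in (2, 4):
        print(n_ranks, "ranks, null communicator on:", ranks_with_null_comm(n_ranks, 2))
```

In this model, two ranks for two simulations leave no rank with a null communicator, so a test with one rank per simulation never exercises the bad call. Four ranks for two simulations leave ranks 1 and 3 with a null communicator, which matches the failing configuration in the report.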
#5 Updated by Mark Abraham almost 3 years ago
- Status changed from New to In Progress
#6 Updated by Berk Hess almost 3 years ago
- Category set to mdrun
- Status changed from In Progress to Fix uploaded
- Target version set to 2018.1
#7 Updated by Berk Hess almost 3 years ago
- Status changed from Fix uploaded to Resolved
Applied in changeset fa3593ec6b33ee61eaa51938891dadc6435a0534.
#8 Updated by Mark Abraham almost 3 years ago
- Has duplicate Bug #2414: -multidir not working due to erroneous use of MPI_Comm_create added
#9 Updated by Mark Abraham almost 3 years ago
- Status changed from Resolved to Closed
#10 Updated by Mark Abraham almost 3 years ago
- Related to Task #2425: testing multisim with multiple ranks per simulation added
#11 Updated by Gerrit Code Review Bot almost 3 years ago
Gerrit received a related patchset '1' for Issue #2403.
Uploader: Berk Hess (hess@kth.se)
Change-Id: gromacs~release-2018~I7fe96a15b5030dea3e093d12850f89f00ccc9f48
Gerrit URL: https://gerrit.gromacs.org/7632
Correct multisim MPI barrier
An MPI barrier used with multisim was called on incorrect ranks
and a barrier was missing.
Fixes #2403
Change-Id: I06a66167afc8dca9cdd4ca9b4f9a806984a6ec7a