Bug #2432

mdrun multidir fails with separate PME ranks

Added by Viveca Lindahl almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: Normal
Assignee: Berk Hess
Category: mdrun
Target version: 2018.1
Affected version - extra info:
Affected version: 2018
Difficulty: uncategorized

Description

I am running gmx mdrun -multidir dir1 dir2 on the supercomputer Beskow at PDC. On one node (sbatch -N 1 -n 32) this works. If I double the number of nodes (sbatch -N 2 -n 64), I get the following error:

Rank 3 [Wed Feb 28 20:43:27 2018] [c4-1c0s0n0] Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(439): MPI_Barrier(MPI_COMM_NULL) failed
PMPI_Barrier(400): Null communicator

The slurm output and the mdrun log files are in the attached archive.

multidir-multinode-debug.tgz (1.16 MB): Archive with runs using different numbers (N=1, 2) of compute nodes. Viveca Lindahl, 02/28/2018 08:49 PM

Associated revisions

Revision 6358ce73 (diff)
Added by Berk Hess almost 2 years ago

Fix mdrun multisim with separate PME ranks

When running multiple simulations using separate PME ranks,
mdrun would call MPI_Barrier with MPI_COMM_NULL on some ranks.
Note: no new release note needed, since this fixes a fix.

Refs #2403
Fixes #2431
Fixes #2432

Change-Id: I7fe96a15b5030dea3e093d12850f89f00ccc9f48
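The commit message states the symptom (MPI_Barrier called with MPI_COMM_NULL on some ranks) but not the mechanism. As a rough illustration only, the Python toy model below assumes that the last npme ranks of each simulation are PME-only, that only the PP master of each simulation is a member of a "masters" communicator, and that a buggy master check compares only the rank within a rank's own PP or PME group. These assumptions, and every name in the code (build_ranks, buggy_is_master, fixed_is_master, null_barrier_callers), are hypothetical and not taken from GROMACS; the real MASTER macro and rank layout may differ.

```python
"""Toy model, not GROMACS source: how a collective on a 'masters'
communicator can be reached on a rank that holds MPI_COMM_NULL."""

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rank:
    sim: int                     # index of the simulation (one per -multidir entry)
    rank_in_sim: int             # rank within its own simulation
    pme_only: bool               # True for a separate PME rank
    rank_in_group: int           # rank within its PP group or its PME group
    masters_comm: Optional[str]  # None stands in for MPI_COMM_NULL


def build_ranks(n_sims: int, ranks_per_sim: int, npme: int) -> List[Rank]:
    """Hypothetical layout: the last `npme` ranks of each simulation do PME only."""
    n_pp = ranks_per_sim - npme
    ranks = []
    for sim in range(n_sims):
        for r in range(ranks_per_sim):
            pme_only = r >= n_pp
            rank_in_group = r - n_pp if pme_only else r
            # Only the PP master of each simulation joins the masters
            # communicator; every other rank is left with the null communicator.
            comm = "masters" if (not pme_only and r == 0) else None
            ranks.append(Rank(sim, r, pme_only, rank_in_group, comm))
    return ranks


def buggy_is_master(rank: Rank) -> bool:
    # Looks only at the rank within the group, so the first PME-only rank
    # of each simulation also passes this check.
    return rank.rank_in_group == 0


def fixed_is_master(rank: Rank) -> bool:
    # Uses the same condition that decided communicator membership.
    return rank.rank_in_group == 0 and not rank.pme_only


def null_barrier_callers(ranks: List[Rank],
                         is_master: Callable[[Rank], bool]) -> List[Rank]:
    """Ranks that would call a barrier on the masters communicator
    while holding the null communicator."""
    return [r for r in ranks if is_master(r) and r.masters_comm is None]


if __name__ == "__main__":
    # Two simulations with 32 ranks each; 8 PME-only ranks is an arbitrary choice.
    ranks = build_ranks(n_sims=2, ranks_per_sim=32, npme=8)
    print(len(null_barrier_callers(ranks, buggy_is_master)))  # 2
    print(len(null_barrier_callers(ranks, fixed_is_master)))  # 0
    # Without separate PME ranks (compare -npme 0) the buggy check is harmless.
    print(len(null_barrier_callers(build_ranks(2, 32, 0), buggy_is_master)))  # 0
```

In this model the buggy check selects exactly one PME-only rank per simulation, and removing PME-only ranks makes it harmless, which is consistent with the reporter's observation that -npme 0 avoids the error.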

History

#1 Updated by Berk Hess almost 2 years ago

  • Status changed from New to Resolved
  • Target version set to 2018.1
  • Affected version changed from 2018.1 to 2018

Duplicate of #2403

#2 Updated by Viveca Lindahl almost 2 years ago

Berk Hess wrote:

Duplicate of #2403

Note that in these tests I am using a binary (GROMACS version: 2018.1-dev-20180227-8223d564e) that includes the related fix...?

#3 Updated by Berk Hess almost 2 years ago

  • Subject changed from mdrun multidir fails when using 2 instead of 1 nodes to mdrun multidir fails with separate PME ranks
  • Status changed from Resolved to Accepted

I didn't realize you were using the latest version of release-2018.
I think this issue is triggered with separate PME ranks. Our MASTER macro is really bugprone.
Refs #2403

#4 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #2432.
Uploader: Berk Hess
Change-Id: gromacs~release-2018~I7fe96a15b5030dea3e093d12850f89f00ccc9f48
Gerrit URL: https://gerrit.gromacs.org/7632

#5 Updated by Berk Hess almost 2 years ago

  • Status changed from Accepted to Fix uploaded
  • Assignee set to Berk Hess

#6 Updated by Viveca Lindahl almost 2 years ago

Berk Hess wrote:

I didn't realize you were using the latest version of release-2018.
I think this issue is triggered with separate PME ranks. Our MASTER macro is really bugprone.
Refs #2403

I just verified that setting -npme 0 avoids the error.

#7 Updated by Berk Hess almost 2 years ago

  • Status changed from Fix uploaded to Resolved

#8 Updated by Mark Abraham almost 2 years ago

  • Status changed from Resolved to Closed
