mdrun multidir fails with separate PME ranks
I am running gmx mdrun -multidir dir1 dir2 on the supercomputer Beskow at PDC. On one node (sbatch -N 1 -n 32) this works. If I double the number of nodes (sbatch -N 2 -n 64) I get an error:
Rank 3 [Wed Feb 28 20:43:27 2018] [c4-1c0s0n0] Fatal error in PMPI_Barrier: Invalid communicator, error stack:
PMPI_Barrier(439): MPI_Barrier(MPI_COMM_NULL) failed
PMPI_Barrier(400): Null communicator
The slurm output and the mdrun log files are in the attached archive.
Fix mdrun multisim with separate PME ranks
When running multiple simulations using separate PME ranks,
mdrun would call MPI_Barrier with MPI_COMM_NULL on some ranks.
Note: no new release note needed, since this fixes a fix.
#3 Updated by Berk Hess over 2 years ago
- Subject changed from mdrun multidir fails when using 2 instead of 1 nodes to mdrun multidir fails with separate PME ranks
- Status changed from Resolved to Accepted
I didn't realize you were using the latest version of release-2018.
I think this issue is triggered by using separate PME ranks. Our MASTER macro is really bug-prone.