Bug #2540

Thread-MPI error in GROMACS-2018

Added by Siva Dasetty 11 months ago. Updated 10 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Hello,

I have come across an error that causes GROMACS (2018/2018.1) to crash. The message is:

"tMPI error: Receive buffer size too small for transmission (in valid comm)
Aborted"

The error seems to occur only immediately after a LINCS or SETTLE warning, and it is reproducible across different systems. A simple example is an energy minimization of a box of 1000 rigid TIP4P/Ice water molecules generated with gmx solvate. When SETTLE is used as the constraint algorithm, several SETTLE warnings appear in the early steps of the minimization, and GROMACS then crashes with the above error message. If I replace SETTLE with LINCS, GROMACS crashes with the same error message following a LINCS warning. Other systems that have produced this error are -OH-terminated self-assembled monolayer surfaces and mica surfaces (in both, h-bonds are constrained by LINCS). Naturally, reducing -ntmpi to 1 (a single thread-MPI rank, so no inter-rank communication) eliminates the error in all cases.

The problem appears to be hardware dependent. Specifically, the tested cluster node(s) contain K20/K40 GPUs and Intel Xeon E5-2680 v3 processors (20/24 cores). I used GCC 5.4.0 and CUDA 8.0.44 to build GROMACS. An installation on my desktop machine with very similar options does not show the thread-MPI error.
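Since the failure appears on the cluster node but not on the desktop, it can help to capture the same build and hardware details on both machines and compare them. The following is a hypothetical helper, not part of the original report: the function names `section` and `env_report` are illustrative, and it only calls standard tools (`gmx --version`, `lscpu`, `nvidia-smi`), skipping any that are not installed. The mdrun log header (em.log) already records much of this information as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print GROMACS build, CPU and GPU details so the
# cluster node and the desktop can be compared side by side.

# Print a header, then run the given command; note when the tool is
# missing or exits with an error instead of aborting.
section() {
  local title=$1; shift
  echo "== $title =="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || echo "($1 exited with status $?)"
  else
    echo "($1 not available)"
  fi
}

env_report() {
  section "GROMACS build" gmx --version
  section "CPU" lscpu
  section "GPUs" nvidia-smi --query-gpu=name,driver_version --format=csv
}

# Run only when executed directly, so the functions can also be sourced.
if [[ "${BASH_SOURCE[0]:-}" == "$0" ]]; then
  env_report
fi
```

Running it on each machine and diffing the two outputs highlights differences in compiler, CUDA version, CPU model and GPU driver.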

Example procedure that reproduces the error:
  1. On a node with 24 cores and 2 K40 GPUs:
    gmx solvate -cs tip4p -o box.gro -box 3.2 3.2 3.2 -maxsol 1000
    gmx grompp -f em.mdp -c box.gro -p tip4pice.top -o em
    export OMP_NUM_THREADS=6
    gmx mdrun -v -deffnm em -ntmpi 4 -ntomp 6 -pin on
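The reproduction steps above, together with the single-rank case reported to avoid the crash, can be scripted roughly as follows. This is a sketch, not part of the original report: it assumes the attached tip4p.gro, em.mdp and tip4pice.top are in the working directory and that `gmx` is a thread-MPI build on PATH; the helper names (`has_tmpi_error`, `run_case`) and the `em_ntmpi*` output file names are hypothetical. It reruns the minimization with 4 ranks and with 1 rank and greps the captured output for the tMPI message.

```shell
#!/usr/bin/env bash
# Sketch: rerun the EM reproduction with 4 thread-MPI ranks and with 1 rank,
# then check the captured output for the tMPI error from this issue.
# Assumes tip4p.gro, em.mdp and tip4pice.top are in the current directory
# and that `gmx` is on PATH. Helper and file names are hypothetical.
set -u

# Succeeds if the given file contains the tMPI error quoted in this issue.
has_tmpi_error() {
  grep -q "tMPI error: Receive buffer size too small" "$1"
}

# Run EM on a copy of em.tpr with the given rank/thread counts, capture
# stdout+stderr in a per-case file, then report exit status and error.
run_case() {
  local ntmpi=$1 ntomp=$2
  local name="em_ntmpi${ntmpi}"
  local status=0
  cp em.tpr "${name}.tpr"
  # OMP_NUM_THREADS is set per run so it always matches -ntomp.
  OMP_NUM_THREADS=$ntomp gmx mdrun -v -deffnm "$name" \
    -ntmpi "$ntmpi" -ntomp "$ntomp" -pin on > "${name}.out" 2>&1 || status=$?
  if has_tmpi_error "${name}.out"; then
    echo "ntmpi=$ntmpi: exit $status, tMPI error reproduced"
  else
    echo "ntmpi=$ntmpi: exit $status, no tMPI error"
  fi
}

main() {
  gmx solvate -cs tip4p -o box.gro -box 3.2 3.2 3.2 -maxsol 1000
  gmx grompp -f em.mdp -c box.gro -p tip4pice.top -o em
  run_case 4 6    # 4 ranks x 6 OpenMP threads = 24 cores, as reported
  run_case 1 24   # single rank: reported to avoid the error
}

# Run only when executed directly and gmx is available.
if [[ "${BASH_SOURCE[0]:-}" == "$0" ]]; then
  if command -v gmx >/dev/null 2>&1; then
    main
  else
    echo "gmx not found on PATH; nothing to run" >&2
  fi
fi
```

Setting OMP_NUM_THREADS per invocation rather than exporting it once keeps it consistent with -ntomp in both runs.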

Attached are the relevant topology (tip4pice.top), mdp (em.mdp), tpr (em.tpr), and log (em.log) files. In addition, the tip4p.gro and box.gro files are included.

Thanks in advance for any ideas as to what might be causing this problem,
Siva Dasetty

tip4p.gro (58.3 KB) tip4p.gro input .gro file used in gmx solvate. Siva Dasetty, 06/01/2018 05:36 PM
box.gro (176 KB) box.gro .gro file obtained with gmx solvate. Siva Dasetty, 06/01/2018 05:36 PM
em.log (19.7 KB) em.log .log file obtained during energy minimization. Siva Dasetty, 06/01/2018 05:36 PM
tip4pice.top (1.28 KB) tip4pice.top TIP4P/Ice topology file. Siva Dasetty, 06/01/2018 05:36 PM
em.mdp (481 Bytes) em.mdp energy minimization parameter file. Siva Dasetty, 06/01/2018 05:36 PM
em.tpr (96.3 KB) em.tpr .tpr file (energy minimization of TIP4P/Ice water) Siva Dasetty, 06/01/2018 05:36 PM

Associated revisions

Revision dce23f77 (diff)
Added by Berk Hess 11 months ago

Fix MPI inconsistency in EM after constraint failure

Fixes issue #2540

Change-Id: Id18c17af82f80917388c11fc776b79bf4966a4ac

History

#1 Updated by Gerrit Code Review Bot 11 months ago

Gerrit received a related patchset '1' for Issue #2540.
Uploader: Berk Hess
Change-Id: gromacs~release-2018~Id18c17af82f80917388c11fc776b79bf4966a4ac
Gerrit URL: https://gerrit.gromacs.org/7979

#2 Updated by Berk Hess 11 months ago

  • Category set to mdrun
  • Status changed from New to Fix uploaded
  • Assignee set to Berk Hess
  • Target version set to 2018.2

#3 Updated by Berk Hess 10 months ago

  • Status changed from Fix uploaded to Resolved

#4 Updated by Mark Abraham 10 months ago

  • Status changed from Resolved to Closed
