Thread-MPI error in GROMACS-2018
I have come across an error that causes GROMACS (2018/2018.1) to crash. The message is:
"tMPI error: Receive buffer size too small for transmission (in valid comm)"
The error seems to occur only immediately following a LINCS or SETTLE warning, and it is reproducible across different systems. A simple example is an energy minimization of a box of 1000 rigid TIP4P/Ice water molecules generated with gmx solvate. When SETTLE is the constraint algorithm, there are several SETTLE warnings in the early steps of the minimization, and GROMACS crashes with the above error message. If I replace SETTLE with LINCS, GROMACS crashes with the same error message following a LINCS warning. Other systems that have produced this error are -OH terminated self-assembled monolayer surfaces (h-bonds constrained by LINCS) and mica surfaces (h-bonds constrained by LINCS). Naturally, reducing -ntmpi to 1 eliminates the error in all cases.
The problem does appear to be hardware dependent. Specifically, the tested nodes on the cluster contain K20/K40 GPUs with Intel Xeon E5-2680v3 processors (20/24 cores). I built GROMACS with GCC/5.4.0 and CUDA/8.0.44. An installation on my desktop machine with very similar options does not produce the thread-MPI error.

Example of a procedure that causes the error:
- Node contains 24 cores and 2 K40 GPUs
gmx solvate -cs tip4p -o box.gro -box 3.2 3.2 3.2 -maxsol 1000
gmx grompp -f em.mdp -c box.gro -p tip4pice.top -o em
gmx mdrun -v -deffnm em -ntmpi 4 -ntomp 6 -pin on
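For reference, the workaround mentioned above (dropping to a single thread-MPI rank so all cores are used as OpenMP threads) would look like this on the same 24-core node; the -ntomp value of 24 is my assumption for filling the node, not something tested in the report:

```shell
# Workaround: a single thread-MPI rank avoids the tMPI receive-buffer error.
# -ntomp 24 is an assumed value to use all cores on this node.
gmx mdrun -v -deffnm em -ntmpi 1 -ntomp 24 -pin on
```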
Attached are the relevant topology (tip4pice.top), mdp (em.mdp), tpr (em.tpr), and log (em.log) files. In addition, the tip4p.gro and box.gro files are included.
Thanks in advance for any ideas as to what might be causing this problem,