Bug #1581

Small systems crash with -nb cpu on tcbs28

Added by Erik Lindahl about 5 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Another bug found when using Michael Shirts' ethanol system from the Virginia tutorial. The system has 2694 atoms and works fine when setting the number of openMP threads automatically, or when using GPUs. However, if we force CPU execution with "-nb cpu", mdrun crashes with an error about atoms that have moved too far.

For this system and machine, Gromacs picked 16 MPI processes each with two OpenMP threads.

redmine_1581.tgz (112 KB) Erik Lindahl, 09/02/2014 12:17 PM

History

#1 Updated by Roland Schulz about 5 years ago

Could you attach a TPR? You say "works fine when setting the number of openMP threads automatically" and "Gromacs picked 16 MPI processes each with two OpenMP threads". If 2 is the value picked automatically, and the automatic choice works, which number of OpenMP threads causes the problem?

#2 Updated by Mark Abraham about 5 years ago

  • Target version changed from 5.0.1 to 5.0.2

#3 Updated by Erik Lindahl about 5 years ago

Sorry, didn't have much time in Bangalore last week.

The attached system crashes after one step when run (on tcbs28), with the output:

Program mdrun, VERSION 5.0.1-dev-20140820-bad36c3
Source code file: /nethome/lindahl/Code/gmx/git/50/gromacs/src/gromacs/mdlib/pme.c, line: 754

Fatal error:
33 particles communicated to PME rank 11 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension y.

When run with the command:

"mdrun -v -deffnm test -nb cpu -nt 32 -ntomp 2"

When using "mdrun -v -deffnm test -nb cpu -nt 32 -ntomp 1" it works fine.

#4 Updated by Erik Lindahl about 5 years ago

Update: Sorry, the working command should be "mdrun -v -deffnm test -nb cpu -nt 16 -ntomp 1", so the same decomposition without OpenMP crashes.

#5 Updated by Erik Lindahl about 5 years ago

... and that should be that the decomposition combined with OpenMP crashes, but without OpenMP it seems to work fine ;-) I'll stay quiet now and hope I finally got everything right...
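
For readers following the corrections in #3 to #5, the rank and thread layout implied by each command can be checked with a little arithmetic. The sketch below assumes a thread-MPI build of mdrun, where -nt is the total number of threads and -ntomp is the number of OpenMP threads per rank, so the number of ranks is nt / ntomp. The helper name rank_layout is hypothetical and is not part of GROMACS.

```python
# Sketch only: rank_layout is a hypothetical helper, not a GROMACS function.
# Assumption: thread-MPI mdrun, where -nt is the total thread count and
# -ntomp is the number of OpenMP threads per rank, so ranks = nt // ntomp.

def rank_layout(nt, ntomp):
    """Return (ranks, omp_threads_per_rank) for a given -nt / -ntomp pair."""
    if nt <= 0 or ntomp <= 0:
        raise ValueError("thread counts must be positive")
    if nt % ntomp != 0:
        # This sketch simply rejects layouts that do not divide evenly.
        raise ValueError("nt must be a multiple of ntomp")
    return nt // ntomp, ntomp


# The three commands quoted in comments #3 and #4 of this report.
commands = {
    "-nt 32 -ntomp 2 (crashes)": (32, 2),
    "-nt 32 -ntomp 1 (as first posted in #3)": (32, 1),
    "-nt 16 -ntomp 1 (corrected in #4, works)": (16, 1),
}

for label, (nt, ntomp) in commands.items():
    ranks, threads = rank_layout(nt, ntomp)
    print(f"{label}: {ranks} ranks x {threads} OpenMP thread(s)")
```

Under that assumption, the crashing command and the corrected working command both use 16 ranks and differ only in the number of OpenMP threads per rank, which matches the statement in #5 that the decomposition fails only when combined with OpenMP.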

#6 Updated by Erik Lindahl about 5 years ago

  • Status changed from New to Closed
  • Target version changed from 5.0.2 to 5.0.1

Fixed by https://gerrit.gromacs.org/#/c/3896/ . See redmine #1578 for details.
