Bug #1581
Small systems crash with -nb cpu on tcbs28
Description
Another bug found when using Michael Shirts' ethanol system from the Virginia tutorial. The system has 2694 atoms and works fine when the number of OpenMP threads is set automatically, or when using GPUs. However, if we force CPU execution with "-nb cpu", mdrun crashes with an error about atoms that have moved too far.
For this system and machine, GROMACS picked 16 MPI processes, each with two OpenMP threads.
History
#1 Updated by Roland Schulz over 6 years ago
Could you attach a TPR? You say "works fine when setting the number of openMP threads automatically" and "Gromacs picked 16 MPI processes each with two OpenMP threads". If 2 is picked automatically and the automatic choice works, what number of OpenMP threads causes the problem?
#2 Updated by Mark Abraham over 6 years ago
- Target version changed from 5.0.1 to 5.0.2
#3 Updated by Erik Lindahl over 6 years ago
- File redmine_1581.tgz added
Sorry, didn't have much time in Bangalore last week.
The attached system crashes after one step when run (on tcbs28), with the following output:
Program mdrun, VERSION 5.0.1-dev-20140820-bad36c3
Source code file: /nethome/lindahl/Code/gmx/git/50/gromacs/src/gromacs/mdlib/pme.c, line: 754
Fatal error:
33 particles communicated to PME rank 11 are more than 2/3 times the cut-off out of the domain decomposition cell of their charge group in dimension y.
When run with the command:
"mdrun -v -deffnm test -nb cpu -nt 32 -ntomp 2"
When using "mdrun -v -deffnm test -nb cpu -nt 32 -ntomp 1" it works fine.
#4 Updated by Erik Lindahl over 6 years ago
Update: Sorry, the working command should be "mdrun -v -deffnm test -nb cpu -nt 16 -ntomp 1", so the same decomposition without OpenMP crashes.
#5 Updated by Erik Lindahl over 6 years ago
... and that should be that the decomposition combined with OpenMP crashes, but without OpenMP it seems to work fine ;-) I'll stay quiet now and hope I finally got everything right...
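For reference, the "same decomposition" remark can be checked by working out the rank count each command implies. This is a minimal sketch, assuming that with thread-MPI mdrun the number of ranks is the total thread count (-nt) divided by the OpenMP threads per rank (-ntomp). The helper `rank_layout` is hypothetical and not part of GROMACS.

```python
def rank_layout(nt: int, ntomp: int) -> tuple[int, int]:
    """Return (ranks, OpenMP threads per rank) for given -nt and -ntomp values.

    Assumption: ranks = nt / ntomp, which requires nt to be a multiple of ntomp.
    """
    if nt <= 0 or ntomp <= 0:
        raise ValueError("nt and ntomp must be positive")
    if nt % ntomp != 0:
        raise ValueError("nt must be a multiple of ntomp")
    return nt // ntomp, ntomp


# Crashing command from comment #3: -nt 32 -ntomp 2
crashing = rank_layout(32, 2)
# Working command from comment #4: -nt 16 -ntomp 1
working = rank_layout(16, 1)

print("crashing:", crashing)
print("working: ", working)
print("same rank count:", crashing[0] == working[0])
```

Under that assumption both commands use 16 ranks, so the only difference between the crashing and the working run is the two OpenMP threads per rank.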
#6 Updated by Erik Lindahl over 6 years ago
- Status changed from New to Closed
- Target version changed from 5.0.2 to 5.0.1
Fixed by https://gerrit.gromacs.org/#/c/3896/ . See redmine #1578 for details.