mdrun crashes with segmentation fault if started with more than 32 OpenMP threads.
I built GROMACS 2016.1 with CUDA 8 and GCC 5.4.0 using the following configure command:
CC=gcc CXX=g++ cmake ../gromacs-2016.1 -DGMX_GPU=ON -DGMX_BUILD_OWN_FFTW=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=...
With more than 32 OpenMP threads mdrun crashes on water/0000.96:
[0000.96]$ gmx mdrun -pin on -ntmpi 1 -ntomp 33 -nb gpu -noconfout -resethway -v -maxh 0.08333 -nsteps 100000
:-) GROMACS - gmx mdrun, 2016.1 (-:
[...] snip
starting mdrun 'Water'
100000 steps, 200.0 ps.
Segmentation fault
With 32 threads it runs:
[0000.96]$ gmx mdrun -pin on -ntmpi 1 -ntomp 32 -nb gpu -noconfout -resethway -v -maxh 0.08333 -nsteps 100000
:-) GROMACS - gmx mdrun, 2016.1 (-:
[...] snip
               Core t (s)   Wall t (s)        (%)
       Time:      437.159       13.661     3200.0
                 (ns/day)    (hour/ns)
Performance:      632.459        0.038
GROMACS reminds you: "Hangout In the Suburbs If You've Got the Guts" (Urban Dance Squad)
[0000.96]$
Even if it does not make sense to use that many OpenMP threads, mdrun should not crash.
Add bonded #thread runtime check
Replaced a debug assertion on the number of OpenMP threads not being
larger than GMX_OPENMP_MAX_THREADS by a fatal error.
But since the listed forces reduction is actually not required without
listed forces, it is now conditional and mdrun can run with more
than GMX_OPENMP_MAX_THREADS threads in that case.
#1 Updated by Szilárd Páll over 3 years ago
Indeed, it should not crash. By design, GMX_OPENMP_MAX_THREADS needs to be set at compile time to be able to use >32 threads/rank (this sets the number of bits used by the masked sparse reduction in the bondeds). I'm not sure if the check broke recently or something else is wrong. (IIRC this worked last time I tested on Power 8 a few weeks ago.)
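To illustrate why a per-thread bit mask ties the usable thread count to a compile-time constant, here is a minimal sketch. It is not the GROMACS source: `c_maxThreads`, `ThreadMask`, `markThread` and `threadContributed` are hypothetical names, and the 32-bit mask width is assumed only to match the default limit discussed in this issue.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the compile-time limit; not the GROMACS source.
constexpr int c_maxThreads = 32;

// One bit per thread: bit t is set when thread t wrote forces into a block.
using ThreadMask = std::uint32_t;
static_assert(std::numeric_limits<ThreadMask>::digits >= c_maxThreads,
              "mask type is too narrow for the configured thread limit");

// Records that `thread` contributed to a block. Returns false instead of
// shifting out of range (undefined behaviour) when the thread index cannot
// be represented in the mask.
bool markThread(ThreadMask& mask, int thread)
{
    if (thread < 0 || thread >= c_maxThreads)
    {
        return false;
    }
    mask |= (ThreadMask{1} << thread);
    return true;
}

// Queries whether `thread` contributed; the caller must pass a valid index.
bool threadContributed(ThreadMask mask, int thread)
{
    assert(thread >= 0 && thread < c_maxThreads);
    return (mask & (ThreadMask{1} << thread)) != 0;
}
```

With such a layout, thread index 32 (the 33rd thread) has no bit to set, so code that skips the range check can read or write the wrong data.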
#3 Updated by Jiri Kraus over 3 years ago
Berk Hess wrote:
I can't reproduce this with GMX_OPENMP_MAX_THREADS=64 on x86.
Could you run a debugger to see where it segfaults?
As Szilard said in comment #1, I would say it's expected that the error does not reproduce with GMX_OPENMP_MAX_THREADS=64. I did not file this issue because I expected mdrun to work with more than 32 threads if GMX_OPENMP_MAX_THREADS is not changed. But if the number of OpenMP threads requested on the command line exceeds GMX_OPENMP_MAX_THREADS, mdrun should either exit with an understandable error message or cap the number of OpenMP threads to GMX_OPENMP_MAX_THREADS and print a warning. Does that make sense?
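The two behaviours proposed here (fail with a clear message, or cap and warn) could be sketched as follows. This is only an illustration under assumptions: `OverLimitPolicy` and `resolveThreadCount` are hypothetical names and the message texts are invented, not taken from mdrun.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical policy switch for the two options proposed in this comment.
enum class OverLimitPolicy
{
    Fail, // stop with an understandable error message
    Cap   // reduce the thread count to the limit and warn
};

// Returns the number of OpenMP threads to actually use.
int resolveThreadCount(int requested, int compiledMax, OverLimitPolicy policy)
{
    assert(requested > 0 && compiledMax > 0);
    if (requested <= compiledMax)
    {
        return requested;
    }
    const std::string detail = "requested " + std::to_string(requested)
                               + " OpenMP threads, but this build supports at most "
                               + std::to_string(compiledMax);
    if (policy == OverLimitPolicy::Fail)
    {
        throw std::runtime_error("Error: " + detail);
    }
    std::cerr << "Warning: " << detail << "; using " << compiledMax << " threads\n";
    return compiledMax;
}
```

Failing makes the misconfiguration obvious, while capping keeps the run going at the cost of silently using fewer threads than requested unless the warning is read.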
#6 Updated by Berk Hess over 3 years ago
- Status changed from Accepted to Fix uploaded
- Target version set to 2016.2
I uploaded a fix that not only adds the check, but also skips the OpenMP bonded thread reduction for cases without bondeds. So now the water system can run on 33 threads with GMX_OPENMP_MAX_THREADS=32.
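The logic described in this comment can be sketched roughly as below. This is an assumption-laden illustration, not the uploaded patch: `c_maxThreads` stands in for GMX_OPENMP_MAX_THREADS, and `planListedForcesReduction` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the compile-time limit GMX_OPENMP_MAX_THREADS.
constexpr int c_maxThreads = 32;

// Decides whether the per-thread reduction for listed (bonded) forces is
// needed. Throws only when the reduction is needed and the thread count
// exceeds what the reduction supports.
bool planListedForcesReduction(int numThreads, int numListedInteractions)
{
    assert(numThreads > 0 && numListedInteractions >= 0);
    if (numListedInteractions == 0)
    {
        // Nothing to reduce, so the thread limit does not apply.
        return false;
    }
    if (numThreads > c_maxThreads)
    {
        throw std::runtime_error("Listed-forces reduction supports at most "
                                 + std::to_string(c_maxThreads) + " OpenMP threads, got "
                                 + std::to_string(numThreads));
    }
    return true;
}
```

Under these assumptions, a system without bondeds on 33 threads skips the reduction and runs, while a system with bondeds on 33 threads stops with an error instead of a segmentation fault.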