confusing error message when OMP_NUM_THREADS is used with GPUs
The r2018 code does not allow setting only the OpenMP thread count in a GPU run (in tMPI builds). However, the OpenMP thread count handling was changed, and part of the reporting seems to be short-circuited: the reporting related to the environment variable from the omp_nthreads module does not happen. As a result, the error messages can be confusing and lack context.
$ OMP_NUM_THREADS=2 gmx mdrun -nsteps 0
GROMACS: gmx mdrun, version 2018
Executable: /opt/tcbsys/gromacs/2018/AVX2_256/bin/gmx
Data prefix: /opt/tcbsys/gromacs/2018/AVX2_256
Working dir: /home/pszilard/projects/gromacs/testing/water-048k
Command line:
  gmx mdrun -nsteps 0

Back Off! I just backed up md.log to ./#md.log.95#
Reading file topol.tpr, VERSION 4.6-beta3-dev-20121222-492378e (single precision)
Note: file tpx version 82, software tpx version 112
The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 2

-------------------------------------------------------
Program: gmx mdrun, version 2018
Source file: src/gromacs/taskassignment/resourcedivision.cpp (line 224)

Fatal error:
When using GPUs, setting the number of OpenMP threads without specifying the
number of ranks can lead to conflicting demands. Please specify the number of
thread-MPI ranks as well (option -ntmpi).

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
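The following is a minimal sketch of the kind of check that produces this fatal error. It also adds the missing context, namely where the OpenMP thread count came from. `ThreadOptions` and `checkGpuThreadSettings` are hypothetical names, not the GROMACS `hw_opt` code, and the sketch assumes that 0 means "not set":

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical, simplified view of the relevant mdrun settings (0 = not set).
// This is NOT the GROMACS hw_opt structure.
struct ThreadOptions
{
    int  numThreadMpiRanks = 0;     // -ntmpi
    int  numOmpThreads     = 0;     // -ntomp, or OMP_NUM_THREADS if read from the env
    bool ompFromEnv        = false; // true if numOmpThreads came from OMP_NUM_THREADS
    bool useGpus           = false;
};

// With GPUs, fixing only the OpenMP thread count while leaving the rank count
// unset can conflict with how ranks are chosen, so the run is rejected.
// The message names the origin of the value so an inherited env var is visible.
void checkGpuThreadSettings(const ThreadOptions& opt)
{
    assert(opt.numOmpThreads >= 0 && opt.numThreadMpiRanks >= 0);
    if (opt.useGpus && opt.numOmpThreads > 0 && opt.numThreadMpiRanks == 0)
    {
        const std::string source =
                opt.ompFromEnv
                        ? "environment variable OMP_NUM_THREADS (" + std::to_string(opt.numOmpThreads) + ")"
                        : "option -ntomp (" + std::to_string(opt.numOmpThreads) + ")";
        throw std::runtime_error(
                "When using GPUs, setting the number of OpenMP threads without specifying the "
                "number of ranks can lead to conflicting demands. The OpenMP thread count was set by "
                + source + ". Please specify the number of thread-MPI ranks as well (option -ntmpi).");
    }
}
```

If `OMP_NUM_THREADS=2` is inherited and `-ntmpi` is not given, the message would point at the environment variable rather than leaving the user to guess.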
In contrast, r2016 reports no error, and its output makes it clear that the environment variable's value is used. That value may not have been set by the user, or at least not at the time of the mdrun invocation:
GROMACS: gmx mdrun, version 2016
Executable: /opt/tcbsys/gromacs/2016/AVX2_256/bin/gmx
Data prefix: /opt/tcbsys/gromacs/2016/AVX2_256
Working dir: /home/pszilard/projects/gromacs/testing/water-048k
Command line:
  gmx mdrun -nsteps 0

Running on 1 node with total 4 cores, 8 logical cores, 2 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256
  Hardware topology: Basic
  GPU info:
    Number of GPUs detected: 2
    #0: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC: no, stat: compatible
    #1: NVIDIA GeForce GTX 960, compute cap.: 5.2, ECC: no, stat: compatible

Reading file topol.tpr, VERSION 4.6-beta3-dev-20121222-492378e (single precision)
Note: file tpx version 82, software tpx version 110
Changing nstlist from 10 to 40, rlist from 1 to 1.101

The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 2
Overriding nsteps with value passed on the command line: 0 steps, 0 ps
Using 2 MPI threads
Using 2 OpenMP threads per tMPI thread

2 compatible GPUs are present, with IDs 0,1
2 GPUs auto-selected for this run.
Mapping of GPU IDs to the 2 PP ranks in this node: 0,1
Also issue OMP_NUM_THREADS reading note to the log
The note that was meant to inform users that OMP_NUM_THREADS was setting
the number of threads in their run (this value can be inherited from
the environment) was not written to the log. It was also printed right
after the tpx reading status, which made it hard to notice. Removed the
stderr output now that this is no longer required.
This change makes the note easier to notice by prepending a newline,
and also writes it to the log file.
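The following is a minimal sketch of the behaviour the change describes. It formats the OMP_NUM_THREADS note with a leading newline so that it stands apart from the tpx reading status. It then writes the note to the log and, optionally, to a console stream. `formatOmpEnvNote` and `reportOmpEnvNote` are hypothetical names. Plain `std::ostream` stands in for whatever log and console handling GROMACS actually uses.

```cpp
#include <cassert>
#include <ostream>
#include <string>

// Hypothetical helper: build the note text. The leading newline separates it
// from the preceding "Reading file topol.tpr ..." status lines.
std::string formatOmpEnvNote(int numThreads)
{
    assert(numThreads > 0);
    return "\nThe number of OpenMP threads was set by environment variable OMP_NUM_THREADS to "
           + std::to_string(numThreads) + "\n";
}

// Always write the note to the log; also write it to the console if a
// console stream is provided (pass nullptr to log only).
void reportOmpEnvNote(int numThreads, std::ostream& log, std::ostream* console)
{
    const std::string note = formatOmpEnvNote(numThreads);
    log << note;
    if (console != nullptr)
    {
        *console << note;
    }
}
```

Passing the console stream as an optional pointer keeps the log output unconditional. Whether the note also reaches the terminal is then a separate decision for the caller.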
#2 Updated by Szilárd Páll over 2 years ago
- Description updated
- Status changed from New to In Progress
Note that hw_opt reads the environment variable in
check_and_update_hw_opt_1() by calling
gmx_omp_nthreads_read_env(). The report that is no longer issued with 2018 should therefore in fact be printed, so something else must be wrong.
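For illustration, here is a hypothetical standalone function that reads OMP_NUM_THREADS. It is not the implementation or the signature of gmx_omp_nthreads_read_env(). It assumes the variable holds a single positive integer. OpenMP also allows comma-separated lists for nested levels, and this sketch simply rejects them.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>

// Hypothetical reader for OMP_NUM_THREADS. Returns the thread count if the
// variable is set to a plain positive integer, otherwise std::nullopt.
std::optional<int> readOmpNumThreadsEnv()
{
    const char* value = std::getenv("OMP_NUM_THREADS");
    if (value == nullptr || *value == '\0')
    {
        return std::nullopt; // not set: the caller keeps its own default
    }
    char* end = nullptr;
    errno     = 0;
    const long n = std::strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0' || n <= 0 || n > INT_MAX)
    {
        return std::nullopt; // not a single positive integer, e.g. "abc" or "4,2"
    }
    assert(n > 0 && n <= INT_MAX); // invariant established by the checks above
    return static_cast<int>(n);
}
```

A caller that gets a value back knows that it came from the environment, and can report it as such. That report is the one the comment above expects to see.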
#4 Updated by Szilárd Páll over 2 years ago
Will try to add the message in question to the log.