Bug #2472

confusing error message when OMP_NUM_THREADS is used with GPUs

Added by Szilárd Páll about 1 year ago. Updated 10 months ago.

Status: Closed
Priority: Normal
Category: mdrun
Target version:
Affected version - extra info: 2018.x
Affected version:
Difficulty: uncategorized

Description

The r2018 code does not allow setting only the OpenMP thread count in a GPU run (in tMPI builds). However, because the OpenMP thread count handling was changed and part of the reporting seems to be short-circuited (the env var-related reporting from the omp_nthreads module does not happen), the resulting error message lacks context and can be confusing.

$ OMP_NUM_THREADS=2 gmx mdrun -nsteps 0
[...]

GROMACS:      gmx mdrun, version 2018
Executable:   /opt/tcbsys/gromacs/2018/AVX2_256/bin/gmx
Data prefix:  /opt/tcbsys/gromacs/2018/AVX2_256
Working dir:  /home/pszilard/projects/gromacs/testing/water-048k
Command line:
  gmx mdrun -nsteps 0

Back Off! I just backed up md.log to ./#md.log.95#
Reading file topol.tpr, VERSION 4.6-beta3-dev-20121222-492378e (single precision)
Note: file tpx version 82, software tpx version 112
The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 2

-------------------------------------------------------
Program:     gmx mdrun, version 2018
Source file: src/gromacs/taskassignment/resourcedivision.cpp (line 224)

Fatal error:
When using GPUs, setting the number of OpenMP threads without specifying the
number of ranks can lead to conflicting demands. Please specify the number of
thread-MPI ranks as well (option -ntmpi).

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

In contrast, in r2016, besides there being no error, it is pretty clear that the environment variable's value is used (a value that may not have been set by the user, or not at the time of mdrun invocation):

GROMACS:      gmx mdrun, version 2016
Executable:   /opt/tcbsys/gromacs/2016/AVX2_256/bin/gmx
Data prefix:  /opt/tcbsys/gromacs/2016/AVX2_256
Working dir:  /home/pszilard/projects/gromacs/testing/water-048k
Command line:
  gmx mdrun -nsteps 0

Running on 1 node with total 4 cores, 8 logical cores, 2 compatible GPUs
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
    SIMD instructions most likely to fit this hardware: AVX2_256
    SIMD instructions selected at GROMACS compile time: AVX2_256

  Hardware topology: Basic
  GPU info:
    Number of GPUs detected: 2
    #0: NVIDIA GeForce GTX 1080, compute cap.: 6.1, ECC:  no, stat: compatible
    #1: NVIDIA GeForce GTX 960, compute cap.: 5.2, ECC:  no, stat: compatible

Reading file topol.tpr, VERSION 4.6-beta3-dev-20121222-492378e (single precision)
Note: file tpx version 82, software tpx version 110
Changing nstlist from 10 to 40, rlist from 1 to 1.101

The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 2

Overriding nsteps with value passed on the command line: 0 steps, 0 ps

Using 2 MPI threads
Using 2 OpenMP threads per tMPI thread

2 compatible GPUs are present, with IDs 0,1
2 GPUs auto-selected for this run.
Mapping of GPU IDs to the 2 PP ranks in this node: 0,1

Associated revisions

Revision f0c98f46 (diff)
Added by Szilárd Páll 10 months ago

Also issue OMP_NUM_THREADS reading note to the log

The note that was meant to inform users that OMP_NUM_THREADS was setting
the number of threads in their run (as this value can be inherited from
the env) has not been logged. It was also printed right after the tpx
reading status, making it hard to notice. Removed stderr output now
that this is no longer required.

This change makes the note easier to notice by prepending a newline and
issues it to the log file too.

Refs #2472

Change-Id: I73fc9de5e9d747f9d7a094c6678ffc1547481b94

History

#1 Updated by Szilárd Páll about 1 year ago

  • Description updated (diff)

#2 Updated by Szilárd Páll about 1 year ago

  • Description updated (diff)
  • Status changed from New to In Progress

Note that hw_opt reads the env var in check_and_update_hw_opt_1() by calling gmx_omp_nthreads_read_env(), so the report that is no longer issued with 2018 should in fact be printed; something else must be wrong.
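
As a rough illustration of the flow described here (read the environment variable, then report where the value came from), a minimal Python sketch follows. It is not the GROMACS implementation: `read_omp_threads_from_env` and `report_note` are hypothetical names, the note text is copied from the mdrun output in the description, and accepting only a single positive integer is an assumption of this sketch.

```python
from typing import Mapping, Optional, TextIO, Tuple


def read_omp_threads_from_env(
    env: Mapping[str, str],
) -> Tuple[Optional[int], Optional[str]]:
    """Return (thread_count, note); both are None when the variable is unset.

    Hypothetical helper: only a single positive integer is accepted here.
    """
    raw = env.get("OMP_NUM_THREADS")
    if raw is None:
        return None, None
    try:
        count = int(raw)
    except ValueError:
        raise ValueError(f"OMP_NUM_THREADS must be an integer, got {raw!r}")
    if count < 1:
        raise ValueError(f"OMP_NUM_THREADS must be positive, got {count}")
    # The leading newline separates the note from whatever was printed before
    # it, so it does not get lost next to the tpx reading lines.
    note = (
        "\nThe number of OpenMP threads was set by environment variable "
        f"OMP_NUM_THREADS to {count}\n"
    )
    return count, note


def report_note(note: Optional[str], *sinks: TextIO) -> None:
    """Write the note to every sink (for example the terminal and the log)."""
    if note is None:
        return
    for sink in sinks:
        sink.write(note)
```

Returning the note instead of printing it inside the reader lets the caller decide where it goes, so the same text can reach both the terminal and the log file.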

#3 Updated by Szilárd Páll about 1 year ago

  • Status changed from In Progress to Rejected

Wow, I'm blind it seems. Only the newline went missing -- and we'd be better off logging this as well as printing a note.

#4 Updated by Szilárd Páll about 1 year ago

Ref user report:
https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2018-March/119506.html

Will try to add the message in question to the log.

#5 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '1' for Issue #2472.
Uploader: Szilárd Páll
Change-Id: gromacs~release-2018~I73fc9de5e9d747f9d7a094c6678ffc1547481b94
Gerrit URL: https://gerrit.gromacs.org/7741

#6 Updated by Mark Abraham about 1 year ago

Putting the full context in the error message requires that hw_opt keep track of the reason why nthreads_omp has the value that it does. That would be wise to do in general, but probably not in the release branch.
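
The idea of keeping track of why nthreads_omp has its value could look roughly like the Python sketch below. It is only an illustration under stated assumptions, not GROMACS code: `ThreadCountSource`, `HardwareOptions`, `nthreads_omp_source`, `nranks_tmpi`, and `check_gpu_run` are hypothetical names, and the second half of the message reuses the fatal error text quoted in the description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ThreadCountSource(Enum):
    """Where the OpenMP thread count came from (hypothetical enumeration)."""

    UNSET = "default"
    COMMAND_LINE = "the -ntomp command-line option"
    ENVIRONMENT = "the OMP_NUM_THREADS environment variable"


@dataclass
class HardwareOptions:
    """Hypothetical stand-in for hw_opt; 0 means 'let the program choose'."""

    nthreads_omp: int = 0
    nthreads_omp_source: ThreadCountSource = ThreadCountSource.UNSET
    nranks_tmpi: int = 0


def check_gpu_run(opts: HardwareOptions) -> Optional[str]:
    """Return an error message with full context, or None if the setup is fine."""
    if opts.nthreads_omp > 0 and opts.nranks_tmpi == 0:
        return (
            f"The number of OpenMP threads was set to {opts.nthreads_omp} by "
            f"{opts.nthreads_omp_source.value}. "
            "When using GPUs, setting the number of OpenMP threads without "
            "specifying the number of ranks can lead to conflicting demands. "
            "Please specify the number of thread-MPI ranks as well "
            "(option -ntmpi)."
        )
    return None
```

With the source stored next to the value, the error can name the inherited environment variable directly, so it stays understandable even when the separate note is overlooked.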

#7 Updated by Mark Abraham 10 months ago

  • Status changed from Rejected to Fix uploaded
  • Assignee set to Szilárd Páll

#8 Updated by Mark Abraham 10 months ago

  • Status changed from Fix uploaded to Resolved

#9 Updated by Mark Abraham 10 months ago

  • Status changed from Resolved to Closed
