Bug #3178

Fatal Error when launching mdrun on host with busy/unavailable GPU(s)

Added by Artem Shekhovtsov about 1 month ago. Updated about 1 month ago.

Status: Accepted
Priority: Normal
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

An mdrun launch that does not require any GPU exits with a fatal error if at least one GPU on the host is busy at that time.

gmx grompp -f test.mdp -c spc216.gro -p topol.top -o test.tpr
gmx mdrun -deffnm test -ntmpi 1 -ntomp 1 -nb cpu -bonded cpu

Result:
-------------------------------------------------------
Program: gmx mdrun, version 2019.2
Source file: src/gromacs/gpu_utils/gpu_utils.cu (line 100)

Fatal error:
cudaFuncGetAttributes failed: all CUDA-capable devices are busy or
unavailable

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

I see this error in 2019.2, 2019.3, and 2020.beta.
Version 2018.6 is not affected.
All versions were built with the same flags:
cmake .. -DGMX_BUILD_OWN_FFTW=ON -DREGRESSIONTEST_DOWNLOAD=ON -DGMX_GPU=on -DCMAKE_INSTALL_PREFIX=/data/user/shehovtsov/SOFTWARE/GROMACS/2019.2_test

test.mdp (2.37 KB) Artem Shekhovtsov, 10/25/2019 04:48 PM
spc216.gro (28.6 KB) Artem Shekhovtsov, 10/25/2019 04:48 PM
topol.top (368 Bytes) Artem Shekhovtsov, 10/25/2019 04:48 PM
test.log (3.65 KB) Artem Shekhovtsov, 10/25/2019 04:49 PM
check_log (333 KB) Artem Shekhovtsov, 10/25/2019 04:52 PM
cmake_log (12.2 KB) Artem Shekhovtsov, 10/25/2019 04:52 PM
make_log (873 KB) Artem Shekhovtsov, 10/25/2019 04:52 PM
printenv (5.34 KB) Artem Shekhovtsov, 10/25/2019 04:55 PM

History

#1 Updated by Szilárd Páll about 1 month ago

  • Subject changed from Fatal Error when launching gromacs 2019.2 on host with GPU. to Fatal Error when launching mdrun on host with busy/unavailable GPU(s)

This is caused by the sanity check expecting every CUDA API call to return cudaSuccess, whereas some of the sanity-check steps may be prevented from running when devices are in exclusive mode.
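
To illustrate the distinction described above, here is a minimal sketch of a sanity check that treats a "busy or unavailable" status as a property of the device rather than as a fatal error. It is not the GROMACS code and not the eventual fix. The names ApiStatus, DeviceState, classifyDevice and canStartRun are hypothetical, and the enum only stands in for CUDA runtime status codes such as cudaSuccess and cudaErrorDevicesUnavailable, so the sketch compiles without CUDA.

```cpp
#include <vector>

// Hypothetical stand-in for CUDA runtime status codes (assumption: a real
// check would compare cudaError_t values such as cudaSuccess and
// cudaErrorDevicesUnavailable instead).
enum class ApiStatus { Success, DevicesUnavailable, OtherError };

// Hypothetical per-device outcome of the sanity check.
enum class DeviceState { Compatible, Unavailable, Failed };

// Classify one device from the statuses returned by its sanity-check steps.
// A busy/unavailable status marks the device as unusable for this process
// instead of aborting the whole program.
DeviceState classifyDevice(const std::vector<ApiStatus>& stepResults)
{
    for (ApiStatus status : stepResults)
    {
        if (status == ApiStatus::DevicesUnavailable)
        {
            return DeviceState::Unavailable;
        }
        if (status != ApiStatus::Success)
        {
            return DeviceState::Failed;
        }
    }
    return DeviceState::Compatible;
}

// Decide whether the run can start. A CPU-only run (as with
// "-nb cpu -bonded cpu") does not depend on any device state; a GPU run
// needs at least one compatible device.
bool canStartRun(bool runNeedsGpu, const std::vector<DeviceState>& devices)
{
    if (!runNeedsGpu)
    {
        return true;
    }
    for (DeviceState state : devices)
    {
        if (state == DeviceState::Compatible)
        {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, the CPU-only command from the description would start even when every device reports busy, while a run that requests GPU offload would still be refused if no compatible device is left.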

#2 Updated by Szilárd Páll about 1 month ago

  • Category set to mdrun
  • Status changed from New to Accepted
  • Assignee set to Szilárd Páll
  • Target version set to 2019.5
