Bug #2374

GPU detection claims fallback despite this clashing with command line user request

Added by Szilárd Páll 10 months ago. Updated 4 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

$gmx mdrun $opts -nsteps 0 -nb gpu 
        :-) GROMACS - gmx mdrun, 2018-rc1-dev-20180103-fa7f480-dirty (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov  Herman J.C. Berendsen    Par Bjelkmar   
 Aldert van Buuren   Rudi van Drunen     Anton Feenstra    Gerrit Groenhof  
 Christoph Junghans   Anca Hamuraru    Vincent Hindriksen Dimitrios Karkoulis
    Peter Kasson        Jiri Kraus      Carsten Kutzner      Per Larsson    
  Justin A. Lemkul    Viveca Lindahl    Magnus Lundborg   Pieter Meulenhoff 
   Erik Marklund      Teemu Murtola       Szilard Pall       Sander Pronk   
   Roland Schulz     Alexey Shvetsov     Michael Shirts     Alfons Sijbers  
   Peter Tieleman    Teemu Virolainen  Christian Wennberg    Maarten Wolf   
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, version 2018-rc1-dev-20180103-fa7f480-dirty
Executable:   /nethome/pszilard-projects/gromacs/gromacs-18/build_hsw_gcc71_cuda9.0/bin/gmx
Data prefix:  /nethome/pszilard/projects/gromacs/gromacs-18 (source tree)
Working dir:  /nethome/pszilard-projects/gromacs/testing/water-048k
Command line:
  gmx mdrun -v -resethway -noconfout -pin on -nsteps 0 -nb gpu

Back Off! I just backed up md.log to ./#md.log.38#
NOTE: GPUs cannot be detected:
      CUDA driver version is insufficient for CUDA runtime version
      Can not use GPU acceleration, will fall back to CPU kernels.
Reading file topol.tpr, VERSION 5.0.2-dev-20140905-f878c88 (single precision)
Note: file tpx version 100, software tpx version 112

Overriding nsteps with value passed on the command line: 0 steps, 0 ps
Changing nstlist from 20 to 80, rlist from 1.031 to 1.147

Using 32 MPI threads
Using 1 OpenMP thread per tMPI thread

-------------------------------------------------------
Program:     gmx mdrun, version 2018-rc1-dev-20180103-fa7f480-dirty
Source file: src/programs/mdrun/runner.cpp (line 997)
MPI rank:    6 (out of 32)

Fatal error:
Cannot run short-ranged nonbonded interactions on a GPU because there is none
detected.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

-------------------------------------------------------
Program:     gmx mdrun, version 2018-rc1-dev-20180103-fa7f480-dirty
Source file: src/programs/mdrun/runner.cpp (line 997)
MPI rank:    22 (out of 32)

Fatal error:
Cannot run short-ranged nonbonded interactions on a GPU because there is none
detected.

Associated revisions

Revision 8daff31c (diff)
Added by Mark Abraham 4 months ago

Fix premature reporting of action when GPUs are not detected

The decision about what to do when GPUs could not be detected is a
responsibility taken at a higher level, so the detection handling code
should just report the facts.

Clarified some of the intended usage in the documentation.

Fixes #2374

Change-Id: I63b569e053dd88d66351640019efc758e91bbdff

History

#1 Updated by Aleksei Iupinov 10 months ago

CUDA_VISIBLE_DEVICES="" ../../master/build-debug-cuda-8.0/bin/gmx mdrun -ntmpi 1 -pme gpu -nsteps 4

Back Off! I just backed up md.log to ./#md.log.44#
NOTE: GPUs cannot be detected:
      no CUDA-capable device is detected
      Can not use GPU acceleration, will fall back to CPU kernels.
Reading file topol.tpr, VERSION 5.1.2 (single precision)
Note: file tpx version 103, software tpx version 112

-------------------------------------------------------
Program:     gmx mdrun, version 2018-rc1-dev-20180105-82afac4aa-dirty
Source file: src/gromacs/taskassignment/decidegpuusage.cpp (line 341)
Function:    bool gmx::decideWhetherToUseGpusForPme(bool, gmx::TaskTarget, const std::vector<int>&, bool, int, int, bool)

Feature not implemented:
The PME on the GPU is only supported when nonbonded interactions run on GPUs
also.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Relevant code from gmx_detect_gpus():

        gpusCanBeDetected = canDetectGpus(&errorMessage);
        if (!gpusCanBeDetected)
        {
            GMX_LOG(mdlog.warning).asParagraph().appendTextFormatted(
                    "NOTE: GPUs cannot be detected:\n" 
                    "      %s\n" 
                    "      Can not use GPU acceleration, will fall back to CPU kernels.",
                    errorMessage.c_str());
        }

Clearly gmx_detect_gpus() should have no final say in what task will fall back where.
That's the least of the problems though :-)
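To illustrate the point, here is a minimal sketch of a detection note that states only the facts. The helper name `formatGpuDetectionNote` and its signature are hypothetical and are not taken from the GROMACS source; the only parts that come from the snippet above are the first two lines of the message. The fallback sentence is left out on purpose, since that decision belongs to the caller.

```cpp
#include <string>

// Hypothetical helper, not the GROMACS implementation: builds the detection
// note from the observed facts alone. It says nothing about falling back to
// CPU kernels, because the caller decides what happens next.
std::string formatGpuDetectionNote(bool gpusCanBeDetected, const std::string& errorMessage)
{
    if (gpusCanBeDetected)
    {
        // Nothing to report when detection itself worked.
        return std::string();
    }
    return "NOTE: GPUs cannot be detected:\n      " + errorMessage;
}
```

With this shape, a caller that was asked for -nb gpu can print the note and then raise its own error, while a caller in automatic mode can print the same note and continue on the CPU.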

#2 Updated by Mark Abraham 6 months ago

  • Assignee set to Mark Abraham
  • Target version set to 2018.2

Targeting 2018.2, assuming this is still an issue we should fix.

#3 Updated by Gerrit Code Review Bot 4 months ago

Gerrit received a related patchset '1' for Issue #2374.
Uploader: Mark Abraham
Change-Id: gromacs~release-2018~I73de0e43ffa534424d60468e7bb656a5f9db9e2f
Gerrit URL: https://gerrit.gromacs.org/7997

#4 Updated by Mark Abraham 4 months ago

Note that the old message was accurate in the 2016 and earlier release series: there, gmx mdrun -nb gpu would erroneously fall back to the CPU kernels when GPUs were not detected, despite the explicit request from the user (tested in at least 5.1.3 and 2016.5).

Note that this message is also written to stderr in release-2018 when the user just runs gmx mdrun or even gmx mdrun -nb cpu. That is not critical information worth writing to stderr, but fixing it is a separate problem.
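The difference between an explicit -nb gpu request and the default can be sketched as below. This is an illustration under stated assumptions, not the code in runner.cpp: the enum `NonbondedTarget` and the function `useGpuForNonbonded` are hypothetical names, and only the error text is copied from the fatal error in the description.

```cpp
#include <stdexcept>

// Hypothetical mirror of the user's -nb choice; not a GROMACS type.
enum class NonbondedTarget
{
    Auto,
    Cpu,
    Gpu
};

// Hypothetical decision helper: returns true when the short-ranged nonbonded
// work should run on a GPU. An explicit GPU request is never silently
// downgraded to the CPU; it fails instead.
bool useGpuForNonbonded(NonbondedTarget target, bool gpusDetected)
{
    switch (target)
    {
        case NonbondedTarget::Cpu:
            return false;
        case NonbondedTarget::Gpu:
            if (!gpusDetected)
            {
                throw std::runtime_error(
                        "Cannot run short-ranged nonbonded interactions on a GPU "
                        "because there is none detected.");
            }
            return true;
        case NonbondedTarget::Auto:
            // Only the automatic mode may fall back to the CPU.
            return gpusDetected;
    }
    return false;
}
```

In this sketch only the automatic mode can end up on the CPU after a failed detection, which is the one case where a "will fall back" message would be true.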

#5 Updated by Mark Abraham 4 months ago

  • Status changed from New to Fix uploaded

#6 Updated by Gerrit Code Review Bot 4 months ago

Gerrit received a related patchset '1' for Issue #2374.
Uploader: Mark Abraham
Change-Id: gromacs~release-2018~I63b569e053dd88d66351640019efc758e91bbdff
Gerrit URL: https://gerrit.gromacs.org/7999

#7 Updated by Mark Abraham 4 months ago

  • Status changed from Fix uploaded to Resolved

#8 Updated by Mark Abraham 4 months ago

  • Status changed from Resolved to Closed
