Bug #1985

CUDA build system refactoring awaiting review

Added by Mark Abraham over 3 years ago. Updated about 1 year ago.

Status:
Fix uploaded
Priority:
Low
Assignee:
-
Category:
build system
Target version:
Affected version - extra info:
all CUDA versions
Affected version:
Difficulty:
uncategorized

Description

While working on an unrelated issue on a non-GPU machine, I saw:

[mic1 r2016 ((f9ecd80...))] $ (cd build-cmake-icc-debug-mic1; cmake .. --debug-trycompile -UGMX_DETECTSIMD_RUN -UGMX_DETECTSIMD_COMPILED)
debug trycompile on
CUDA_TOOLKIT_ROOT_DIR not found or specified
-- Could NOT find CUDA (missing:  CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY) (Required is at least version "5.0")
-- No compatible CUDA toolkit found (v5.0+), disabling native GPU acceleration
-- Detecting best SIMD instructions for this CPU

Clearly FindCUDA is simply noisy, but cmake .. -DGMX_GPU=off did not fix the issue and cmake .. -DGMX_GPU_AUTO=off did, so we have something to fix.


Related issues

Related to GROMACS - Bug #2357: GMX_GPU=no doesn't work if initially set to auto (Closed)

Associated revisions

Revision 6f7d2e9a (diff)
Added by Aleksei Iupinov almost 2 years ago

Set GMX_GPU_AUTO to FALSE with GMX_GPU defined

Refs #1985, #2357

Change-Id: I5cada97015ee94717ea6eb988b3a84a351f11293

History

#1 Updated by Szilárd Páll over 3 years ago

The idea is that detection is in "auto" mode by default; in this case "CUDA_FIND_QUIETLY" is not set, so FindCUDA emits the first line, which I assume is the annoying message you're referring to. I don't think we can do much without either making FindCUDA silent or modifying the FindCUDA module.

#2 Updated by Mark Abraham over 3 years ago

We can run the detection on the first cmake call, and not keep running it every time (or at least run it in quiet mode thereafter).

And GMX_GPU=off must override GMX_GPU_AUTO=on, when the latter was set in a previous run of cmake.
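
A minimal sketch of the "detect once, then stay quiet" idea, assuming the standard FindCUDA module and a hypothetical internal cache variable GMX_GPU_DETECTION_DONE (this name is invented for illustration and is not taken from the GROMACS sources):

```cmake
# Sketch only: GMX_GPU_DETECTION_DONE is a hypothetical cache variable,
# not one that exists in the GROMACS build system.
if(GMX_GPU_DETECTION_DONE)
    # A previous configure run already reported the result; stay quiet now.
    set(_find_cuda_quiet QUIET)
else()
    set(_find_cuda_quiet "")
endif()

# QUIET makes find_package set CUDA_FIND_QUIETLY, which silences FindCUDA.
find_package(CUDA 5.0 ${_find_cuda_quiet})

# Remember across cmake invocations that detection has been reported once.
set(GMX_GPU_DETECTION_DONE TRUE CACHE INTERNAL "GPU detection has already run")
```

With something like this, only the first cmake run in a build tree would print the FindCUDA messages; a fresh build tree would print them again.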

#3 Updated by Szilárd Páll over 3 years ago

Something is indeed broken because this should be handled correctly as the quiet detection is intended to kick in:
http://redmine.gromacs.org/projects/gromacs/repository/revisions/master/entry/cmake/gmxManageGPU.cmake#L75

Looks like GMX_GPU_AUTO is stateful: once set, it remains set. Turning it off if detection fails solves the issue, but that's just a hack. Not sure what the proper solution is, but will try to come up with something.

#4 Updated by Teemu Murtola over 3 years ago

Respecting GMX_GPU=off like Mark is asking is impossible in the current design in all cases, since there is no way to tell whether the user set it if it was already set earlier automatically to the same value...

https://gerrit.gromacs.org/#/c/5586/ can solve the issue (if it doesn't already), but it has other issues as identified by code review comments nearly half a year ago...
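
For illustration, one common way to avoid this ambiguity is a single tri-state cache variable, so that the detection result is never written back over the user's choice. The following is a sketch under that assumption, not the design proposed in the linked change; the AUTO/ON/OFF values and the GMX_USE_GPU variable are hypothetical:

```cmake
# Sketch only: a single tri-state cache variable replaces the
# GMX_GPU / GMX_GPU_AUTO pair. Values and GMX_USE_GPU are illustrative.
set(GMX_GPU "AUTO" CACHE STRING "GPU acceleration: AUTO, ON or OFF")
set_property(CACHE GMX_GPU PROPERTY STRINGS AUTO ON OFF)

string(TOUPPER "${GMX_GPU}" _gmx_gpu_mode)
if(_gmx_gpu_mode STREQUAL "AUTO")
    # The detection result lives in a normal (non-cache) variable, so it
    # never overwrites what the user asked for.
    find_package(CUDA 5.0 QUIET)
    set(GMX_USE_GPU ${CUDA_FOUND})
elseif(GMX_GPU)
    # The user explicitly asked for GPU support: failing to find CUDA is an error.
    find_package(CUDA 5.0 REQUIRED)
    set(GMX_USE_GPU TRUE)
else()
    # The user explicitly disabled GPU support: skip detection entirely.
    set(GMX_USE_GPU FALSE)
endif()
```

Because the cache only ever holds what the user typed (or the AUTO default), a later -DGMX_GPU=off is unambiguous regardless of what earlier runs detected.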

#5 Updated by Szilárd Páll over 3 years ago

Thanks for the note Teemu, I did not notice the change as it does not ref this issue.

#6 Updated by Mark Abraham over 3 years ago

@Teemu Indeed, I'd like to get back to that
@Szilard Sure, that patch pre-dates this issue by about 5 months ;-)

#7 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '2' for Issue #1985.
Uploader: Mark Abraham ()
Change-Id: I7df81dea738da3ec9cd3971ad3507298a9f97dff
Gerrit URL: https://gerrit.gromacs.org/5586

#8 Updated by Gerrit Code Review Bot almost 3 years ago

Gerrit received a related patchset '6' for Issue #1985.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I7df81dea738da3ec9cd3971ad3507298a9f97dff
Gerrit URL: https://gerrit.gromacs.org/5586

#9 Updated by Erik Lindahl almost 2 years ago

This too seems to work fine in the present code. If anybody wants to keep it open, please specify exactly why, and change the subject to something more concrete.

#10 Updated by Mark Abraham almost 2 years ago

  • Subject changed from CUDA build system is annoying to CUDA build system refactoring awaiting review
  • Status changed from New to Fix uploaded
  • Target version set to 2019

#11 Updated by Mark Abraham almost 2 years ago

There are issues remaining, e.g. run cmake on a machine that finds gcc and cuda, then run it again and see

-- Found CUDA: /usr/local/cuda (found suitable version "9.0", minimum required is "6.5") 
-- Found OpenMP_C: -fopenmp  
-- Found OpenMP_CXX: -fopenmp  
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mabraham/git/r2018/build-cmake-gcc-gpu-debug

But if you start from a fresh build tree and do the first cmake with -DGMX_GPU=on, then the second cmake only does

-- Found OpenMP_C: -fopenmp  
-- Found OpenMP_CXX: -fopenmp  
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mabraham/git/r2018/build-cmake-gcc-gpu-debug

That's minor, of course, but neither of those cases follows our general expectation that repeat invocations of cmake are quiet. They indicate that the way we use the cache in the current implementation should be improved, which my https://gerrit.gromacs.org/5586 proposes. I strongly suggest we review that (after releasing 2018) before attempting any further work on the GPU support in our build system.

#12 Updated by Mark Abraham almost 2 years ago

More examples, this time using clang, where distro versions are often still without openmp support (e.g. clang-4.0.1 in ubuntu 17.10):

$ cmake .. -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DGMX_GPU=on -DGMX_USE_OPENCL=on

...

-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
-- Found OPENCL: /usr/lib/x86_64-linux-gnu/libOpenCL.so (found version "2.0") 
-- Could NOT find OpenMP_C (missing: OpenMP_C_FLAGS OpenMP_C_LIB_NAMES) (found version "1.0")
-- Could NOT find OpenMP_CXX (missing: OpenMP_CXX_FLAGS OpenMP_CXX_LIB_NAMES) (found version "1.0")
CMake Warning at cmake/gmxManageOpenMP.cmake:65 (message):
  The compiler you are using does not support OpenMP parallelism.  This might
  hurt your performance a lot, in particular with GPUs.  Try using a more
  recent version, or a different compiler.  For now, we are proceeding by
  turning off OpenMP.
Call Stack (most recent call first):
  CMakeLists.txt:328 (include)

...

-- Performing Test PTHREAD_SETAFFINITY
-- Performing Test PTHREAD_SETAFFINITY - Success
CMake Warning at cmake/gmxManageOpenCL.cmake:72 (message):
  To use GPU acceleration efficiently, mdrun requires OpenMP multi-threading.
  Without OpenMP a single CPU core can be used with a GPU which is not
  optimal.  Note that with MPI multiple processes can be forced to use a
  single GPU, but this is typically inefficient.  You need to set both C and
  C++ compilers that support OpenMP (CC and CXX environment variables,
  respectively) when using GPUs.
Call Stack (most recent call first):
  CMakeLists.txt:582 (gmx_gpu_setup)

...

-- Configuring done
-- Generating done

then when you immediately do

$ cmake ..
CMake Warning at cmake/gmxManageOpenCL.cmake:72 (message):
  To use GPU acceleration efficiently, mdrun requires OpenMP multi-threading.
  Without OpenMP a single CPU core can be used with a GPU which is not
  optimal.  Note that with MPI multiple processes can be forced to use a
  single GPU, but this is typically inefficient.  You need to set both C and
  C++ compilers that support OpenMP (CC and CXX environment variables,
  respectively) when using GPUs.
Call Stack (most recent call first):
  CMakeLists.txt:582 (gmx_gpu_setup)

-- Configuring done
-- Generating done

which you cannot even suppress with -DGMX_OPENMP=off. Such warnings made more sense when people might have had ancient gcc that didn't have openmp, but that's no longer a relevant consideration. IMO the right time to make observations about performance is during the run (to a log file). We could consider writing some CMake code to make suggestions at the end of the first run of cmake, but that needs buy in from multiple people prepared to write, review, and test the cmake code.
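
As a rough sketch of how such a warning could be limited to the first cmake run, assuming a hypothetical internal cache variable GMX_OPENMP_GPU_WARNING_ISSUED (an invented name, not part of the GROMACS build system) and a shortened message text:

```cmake
# Sketch only: GMX_OPENMP_GPU_WARNING_ISSUED is a hypothetical cache variable,
# not part of the GROMACS build system.
if(GMX_GPU AND NOT GMX_OPENMP AND NOT GMX_OPENMP_GPU_WARNING_ISSUED)
    message(WARNING
        "To use GPU acceleration efficiently, mdrun requires OpenMP multi-threading.")
    # INTERNAL cache entries persist in CMakeCache.txt, so later cmake runs
    # in the same build tree skip the warning.
    set(GMX_OPENMP_GPU_WARNING_ISSUED TRUE CACHE INTERNAL
        "The GPU-without-OpenMP warning has already been shown")
endif()
```

Removing the cache entry (for example with cmake -UGMX_OPENMP_GPU_WARNING_ISSUED ..) or starting from a fresh build tree would show the warning again.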

#13 Updated by Teemu Murtola almost 2 years ago

  • Related to Bug #2357: GMX_GPU=no doesn't work if initially set to auto added

#14 Updated by Mark Abraham almost 2 years ago

Another example: Run cmake, detecting CUDA. Run ccmake to turn GMX_GPU=off. Run ccmake to turn GMX_DOUBLE=on, and get warned that GPU support is not available in double precision.

#15 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #1985.
Uploader: Aleksei Iupinov ()
Change-Id: gromacs~release-2018~I5cada97015ee94717ea6eb988b3a84a351f11293
Gerrit URL: https://gerrit.gromacs.org/7377

#16 Updated by Szilárd Páll almost 2 years ago

Mark Abraham wrote:

More examples, this time using clang, where distro versions are often still without openmp support (e.g. clang-4.0.1 in ubuntu 17.10):

[...]

then when you immediately do

[...]

which you cannot even suppress with -DGMX_OPENMP=off. Such warnings made more sense when people might have had ancient gcc that didn't have openmp, but that's no longer a relevant consideration. IMO the right time to make observations about performance is during the run (to a log file). We could consider writing some CMake code to make suggestions at the end of the first run of cmake, but that needs buy in from multiple people prepared to write, review, and test the cmake code.

I think the latter is mostly a developer concern: it would be nice to have, but for users it is not that important. Most users configure and build once, so the message won't keep reappearing (and it's still relevant, especially as these days it's rare that a compiler does not support OpenMP at all).

#17 Updated by Mark Abraham about 1 year ago

  • Target version changed from 2019 to 2020

We will probably need to rework the GPU build system some time, but strategy is unclear and resources to review my proposed changes are not available.
