Bug #2390

older GROMACS can't build with CUDA 9.0

Added by Mark Abraham 4 months ago. Updated 4 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
build system
Target version:
-
Affected version - extra info:
5.x master
Affected version:
Difficulty:
uncategorized

Description

We specify many cross-compilation targets in our default build, including for CC 2.0, for which NVIDIA removed support in CUDA 9.0. Thus our users might see

"nvcc fatal   : Unsupported gpu architecture 'compute_20'
nvcc fatal   : Unsupported gpu architecture 'compute_20'
CMake Error at
libgromacs_generated_nbnxn_cuda_data_mgmt.cu.o.Release.cmake:218 (message):" 

which I have seen reported for 5.1.5 and 2016.3.

The minimal fix (present in 2016.4) is to remove sm_20 (and compute_20?) from our nvcc invocations for CUDA 9.0.
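
As a rough illustration of that kind of version gate (a sketch only, not the change that went into 2016.4): `CUDA_VERSION` is the variable that FindCUDA.cmake sets, while `EXAMPLE_GENCODE_FLAGS` and the particular list of targets are hypothetical names and values chosen for the example.

```cmake
# Hypothetical sketch: request CC 2.0 targets only when the toolkit
# still supports them, i.e. for CUDA versions older than 9.0.
# Assumes find_package(CUDA) has already run and set CUDA_VERSION.
set(EXAMPLE_GENCODE_FLAGS "")
if(CUDA_VERSION VERSION_LESS "9.0")
    list(APPEND EXAMPLE_GENCODE_FLAGS -gencode "arch=compute_20,code=sm_20")
endif()
# Targets that both old and new toolkits accept (illustrative choice).
list(APPEND EXAMPLE_GENCODE_FLAGS -gencode "arch=compute_30,code=sm_30")
list(APPEND EXAMPLE_GENCODE_FLAGS -gencode "arch=compute_35,code=sm_35")
message(STATUS "nvcc target flags: ${EXAMPLE_GENCODE_FLAGS}")
```

This keeps working only as long as somebody updates the version table for each new CUDA release.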

However, we should probably invest in better CUDA support in our CMake. Rather than checking CUDA version numbers, we should check for feature support, as we do for other compiler flags, so that we will not require the use of compiler flags that we have not checked will work.
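
For comparison, this is the style of check that stock CMake provides for host-compiler flags. It is a minimal sketch: the flag and the variable name `EXAMPLE_CXX_HAS_MARCH_NATIVE` are illustrative, and it is not meant to show how the GROMACS build system organizes its own checks.

```cmake
# Sketch of a feature check for a host-compiler flag, using the
# CheckCXXCompilerFlag module that ships with CMake.
include(CheckCXXCompilerFlag)

# The result is cached, so the test compile runs once per build tree.
check_cxx_compiler_flag("-march=native" EXAMPLE_CXX_HAS_MARCH_NATIVE)

if(EXAMPLE_CXX_HAS_MARCH_NATIVE)
    # Only use the flag after the compiler has been seen to accept it.
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
endif()
```

The point of the proposal is that nvcc flags such as `-gencode arch=compute_20,code=sm_20` would get the same treatment instead of a version lookup.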

For a further example, the CUDA 9.1.85 release notes mention that CC 7.0 support still does not work on MacOS, so there is no version of GROMACS that works with CUDA 9.0 on that platform - in all released versions, either we require sm_20 or require sm_70, and compilation fails for different reasons.

Fortunately, the recent addition to 2018 that checks that nvcc works will be a useful template for fixing it in that branch. Maybe we could consider back-porting that to 2016 branch.


Related issues

Related to GROMACS - Feature #2126: implement native CUDA support in CMake (New)
Related to GROMACS - Task #2505: consider bumping cmake reqirement for GROMACS 2019 (New)

History

#1 Updated by Szilárd Páll 4 months ago

Mark Abraham wrote:

We specify many cross-compilation targets in our default build, including for CC 2.0, for which NVIDIA removed support in CUDA 9.0. Thus our users might see

[...]

which I have seen reported for 5.1.5 and 2016.3.

The minimal fix (present in 2016.4) is to remove sm_20 (and compute_20?) from our nvcc invocations for CUDA 9.0.

It should work, it was fixed by Jiri in the initial CUDA 9 support change in 97f9f399.

However, we should probably invest in better CUDA support in our CMake. Rather than checking CUDA version numbers, we should check for feature support, as we do for other compiler flags, so that we will not require the use of compiler flags that we have not checked will work.

For a further example, the CUDA 9.1.85 release notes mention that CC 7.0 support still does not work on MacOS, so there is no version of GROMACS that works with CUDA 9.0 on that platform - in all released versions, either we require sm_20 or require sm_70, and compilation fails for different reasons.

OK, that's annoying but hardly an important target (unless I'm mistaken, Apple hasn't shipped any NVIDIA GPU in the last few years).

Fortunately, the recent addition to 2018 that checks that nvcc works will be a useful template for fixing it in that branch. Maybe we could consider back-porting that to 2016 branch.

Sure, but it's ugly and fairly high LOC for something that we'd ideally have better CMake support for via try_compile (and the native CUDA support might actually be able to do that).

#2 Updated by Mark Abraham 4 months ago

Szilárd Páll wrote:

Mark Abraham wrote:

We specify many cross-compilation targets in our default build, including for CC 2.0, for which NVIDIA removed support in CUDA 9.0. Thus our users might see

[...]

which I have seen reported for 5.1.5 and 2016.3.

The minimal fix (present in 2016.4) is to remove sm_20 (and compute_20?) from our nvcc invocations for CUDA 9.0.

It should work, it was fixed by Jiri in the initial CUDA 9 support change in 97f9f399.

However, we should probably invest in better CUDA support in our CMake. Rather than checking CUDA version numbers, we should check for feature support, as we do for other compiler flags, so that we will not require the use of compiler flags that we have not checked will work.

For a further example, the CUDA 9.1.85 release notes mention that CC 7.0 support still does not work on MacOS, so there is no version of GROMACS that works with CUDA 9.0 on that platform - in all released versions, either we require sm_20 or require sm_70, and compilation fails for different reasons.

OK, that's annoying but hardly an important target (unless I'm mistaken, Apple hasn't shipped any NVIDIA GPU in the last few years).

Sure, but it shows that our approach risks being broken anywhere that we don't have Jenkins or devs testing - and we don't have CUDA 9 in Jenkins at all, yet.

Fortunately, the recent addition to 2018 that checks that nvcc works will be a useful template for fixing it in that branch. Maybe we could consider back-porting that to 2016 branch.

Sure, but it's ugly and fairly high LOC for something that we'd ideally have better CMake support for via try_compile (and the native CUDA support might actually be able to do that).

https://redmine.gromacs.org/projects/gromacs/repository/revisions/master/entry/cmake/gmxManageGPU.cmake#L275 already does most of the work of calling execute_process(). All we need is to wrap that into gmx_nvcc_try_compile(RESULT_VAR SRC_FILE | SRC_STRING) that picks up the variables that FindCUDA.cmake manages.
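
A minimal sketch of what such a wrapper could look like, under stated assumptions: the function name `example_nvcc_try_compile`, its argument order, and the test file are all hypothetical, and this is not the code in gmxManageGPU.cmake. `CUDA_NVCC_EXECUTABLE` is the variable set by FindCUDA.cmake. A real version would probably also need to pass the host compiler and the user's other nvcc flags.

```cmake
# Hypothetical helper, not the GROMACS implementation.
# Sets RESULT_VAR to TRUE in the caller's scope if nvcc compiles SRC_FILE
# when given the extra flags passed after the two named arguments.
function(example_nvcc_try_compile RESULT_VAR SRC_FILE)
    execute_process(
        COMMAND "${CUDA_NVCC_EXECUTABLE}" ${ARGN}
                -c "${SRC_FILE}"
                -o "${CMAKE_CURRENT_BINARY_DIR}/example_nvcc_try_compile.o"
        RESULT_VARIABLE _exit_code
        OUTPUT_QUIET
        ERROR_QUIET)
    # A non-zero exit code (or an error string if nvcc could not be run)
    # means the flags are not usable with this toolkit.
    if(_exit_code EQUAL 0)
        set(${RESULT_VAR} TRUE PARENT_SCOPE)
    else()
        set(${RESULT_VAR} FALSE PARENT_SCOPE)
    endif()
endfunction()

# Example use: ask nvcc whether it still accepts CC 2.0 targets.
set(_example_src "${CMAKE_CURRENT_BINARY_DIR}/example_empty.cu")
file(WRITE "${_example_src}" "__global__ void example_kernel() {}\n")
example_nvcc_try_compile(EXAMPLE_HAVE_SM_20 "${_example_src}"
                         -gencode "arch=compute_20,code=sm_20")
message(STATUS "nvcc accepts sm_20: ${EXAMPLE_HAVE_SM_20}")
```

With CUDA 9.0 the check above would be expected to report FALSE, given the "Unsupported gpu architecture" error quoted in the description.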

(and the native CUDA support might actually be able to do that).

Likely it does. But that isn't effective unless we'd choose to require cmake 3.8 for (at least) CUDA support in GROMACS 2019. We could do that, but we'd want to identify a few useful things, and check that at least some LTS/stable distro releases have that cmake version. (Note that FindCUDA.cmake and the native support are totally different things.) It would be extra work for us to support both in the same branch, but if we did support native compilation via cmake 3.8 in GROMACS 2019, then we could say to a hypothetical user of 2019 who's also got e.g. CUDA from 2020 that doesn't support CC 3.0 that they now have options: use the latest GROMACS, use a not-latest CUDA, or update cmake for better automation.
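
For reference, a minimal sketch of the native route, assuming cmake 3.8 or later. The project name is a placeholder, and this only shows language detection, not how GROMACS would structure its build.

```cmake
# Sketch of native CUDA language support, available from CMake 3.8.
# This is separate from FindCUDA.cmake and does not use its variables.
cmake_minimum_required(VERSION 3.8)
project(example_native_cuda LANGUAGES CXX)

include(CheckLanguage)
# Sets CMAKE_CUDA_COMPILER if a working CUDA compiler is found.
check_language(CUDA)

if(CMAKE_CUDA_COMPILER)
    enable_language(CUDA)
    message(STATUS "CUDA compiler: ${CMAKE_CUDA_COMPILER}")
else()
    message(STATUS "No usable CUDA compiler found; building without CUDA")
endif()
```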

Meanwhile, 2018 and earlier versions will rot the moment NVIDIA de-supports CC 3.0, for example. (We should ask NVIDIA what their current thoughts are for that - if they have already decided that they are dropping CC 3.0 for CUDA 10, then we have more reason to attempt a proper fix in 2018 branch.)

Also, questions such as whether CC 5.3 is available only on Tegra, or whether it should be a compilation target at all, get a bit easier to handle: ask the compiler and, if the user left us a choice, compile for whatever is possible.

#3 Updated by Mark Abraham 3 months ago

  • Related to Feature #2126: implement native CUDA support in CMake added

#4 Updated by Mark Abraham 4 days ago

  • Related to Task #2505: consider bumping cmake reqirement for GROMACS 2019 added
