Task #1937

stop supporting changing CUDA host compiler

Added by Mark Abraham over 3 years ago. Updated 10 months ago.

Status: New
Priority: Low
Assignee:
Category: build system
Target version:
Difficulty: uncategorized

Description

cmake/gmxManageNvccConfig.cmake supports changing the nvcc host compiler. This is problematic because the main compilers are fixed for the lifetime of a CMake build tree, and the only way we can be sure of binary compatibility of code compiled with nvcc is to use the same host compiler.

We should use the main compiler as the host compiler, either by checking that FindCUDA agrees on which compiler to use, or setting it ourselves.
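
A minimal sketch of the two options above, not the actual contents of cmake/gmxManageNvccConfig.cmake. It assumes the FindCUDA module and a Unix-like generator, where CUDA_HOST_COMPILER holds the path of a compiler executable; the `_gmx_*` variable names are made up for illustration.

```cmake
# Sketch only. Assumes find_package(CUDA) (the FindCUDA module) is in use.

# Option 1: set the host compiler ourselves, before FindCUDA picks a default.
# Without FORCE, a value the user already put in the cache is left alone.
set(CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE FILEPATH
    "Host compiler used by nvcc (kept equal to the main C++ compiler)")
find_package(CUDA REQUIRED)

# Option 2: check that the cached value agrees with the main compiler.
# REALPATH resolves symlinks, so /usr/bin/c++ and the g++ it points at compare equal.
get_filename_component(_gmx_host_cxx "${CUDA_HOST_COMPILER}" REALPATH)
get_filename_component(_gmx_main_cxx "${CMAKE_CXX_COMPILER}" REALPATH)
if(NOT _gmx_host_cxx STREQUAL _gmx_main_cxx)
    message(FATAL_ERROR
        "CUDA_HOST_COMPILER (${CUDA_HOST_COMPILER}) differs from "
        "CMAKE_CXX_COMPILER (${CMAKE_CXX_COMPILER}).")
endif()
```

The path comparison is deliberately naive: compiler wrappers such as ccache would need extra handling.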

History

#1 Updated by Szilárd Páll over 3 years ago

I'm definitely no expert on this, but there must be alternatives that ensure binary compatibility without teleporting us into C++ hell.

The reason I find it potentially very limiting not to be able to pick a host compiler reasonably freely is that I don't trust NVIDIA to keep up with compilers. They have gotten better, but consider this: if we introduced this restriction now, with CUDA we could not use anything newer than gcc 4.8 (IIRC). Plus, if CUDA 8.0 happens to introduce some stupid regression (as was the case with e.g. 7.0), one would be stuck with gcc 4.8 instead of gcc 5 or 6 for another year or two.

#2 Updated by Alexey Shvetsov over 3 years ago

It's a bad idea. For example, on the cluster at SpbSTU we have gcc-5.3, gcc-4.8 and icc-2016.1. There are also a number of GPU nodes with K40s. The CUDA version is 7.5.

The only way to compile CUDA code there is to use gcc-4.8 as the backend for nvcc (and disable propagating host flags to nvcc), since gcc-5.3 and icc-2016.1 cannot be used with CUDA =\
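
For illustration, the workaround described above could be captured in a CMake initial-cache script along these lines. The file name and the compiler path are placeholders, not values taken from the SpbSTU cluster; CUDA_HOST_COMPILER and CUDA_PROPAGATE_HOST_FLAGS are the FindCUDA cache variables.

```cmake
# cuda-host-gcc48.cmake (hypothetical file name), used as:
#   cmake -C cuda-host-gcc48.cmake <path-to-source>

# Placeholder path: point this at wherever gcc-4.8's C++ driver is installed.
set(CUDA_HOST_COMPILER "/usr/bin/g++-4.8" CACHE FILEPATH
    "Host compiler that nvcc invokes")

# Do not forward the main compiler's flags to the (different) host compiler.
set(CUDA_PROPAGATE_HOST_FLAGS OFF CACHE BOOL
    "Forward host compiler flags to nvcc")
```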

#3 Updated by Mark Abraham over 3 years ago

For this to ever be a useful user-space feature, we need a plan for how we're going to manage
  • testing a reasonable matrix of ABI and standard library potential incompatibility (and why this is a priority for our time)
  • documenting what a user would do, and
  • how an appreciable fraction of users will know that they should bother to do so.
If not, then it's just a toy for cognoscenti, and surely we have better things to do.

Note that after three years we have barely even documented what CUDA_HOST_COMPILER does.

#4 Updated by Alexey Shvetsov over 3 years ago

We can document this feature =)

List of supported compilers, taken from cuda/host_config.h in cuda-7.5:

=icc-15
=<gcc-4.9
=pgc++-15.4
=xlC-13.1
MSVC-{2010,2012,2013}

So if support for defining CUDA_HOST_COMPILER is dropped, this list will be the only supported compilers =/

#5 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #1937.
Uploader: Alexey Shvetsov ()
Change-Id: Iceb62ed67718ed7796f593f9939b451dd0e4dc48
Gerrit URL: https://gerrit.gromacs.org/5787

#6 Updated by Szilárd Páll over 3 years ago

Mark Abraham wrote:

If not, then it's just a toy for cognoscenti, and surely we have better things to do.

Note that after three years we have barely even documented what CUDA_HOST_COMPILER does.

I agree that some basic documentation is useful. At the same time, your comment above seems to assume that the GROMACS project is responsible for documenting this CUDA/CMake feature, which sounds strange.

The CUDA host compiler is a CUDA feature exposed directly by the CMake module through the CUDA_HOST_COMPILER variable. Interested users can always read up in the CUDA documentation if they run into compiler hell. We provide no documentation for plenty of other CMake cache variables, and I see no reason to provide more than a very basic description (similar to what Alexey uploaded) and a pointer to the NVIDIA docs (and my original intent was to automate things in CMake so that the user mostly does not have to care). That will of course only tell users how to possibly create more trouble while trying to solve an issue, but hey, there's no free lunch.

I believe decently complete documentation (self-sufficient, with good coverage of use cases) that is also up to date requires more or less the same amount of maintenance as the code I just removed from CMake. If you prefer to help users more by pointing to external resources, we may as well have consistency checks and hints directly in the CMake script, along with very basic documentation.

BTW, one real issue is the flag propagation, so some minimal support for filtering/overriding the C++ flags passed would be useful.
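
A rough sketch of what such minimal filtering could look like, assuming FindCUDA and CMake 3.6 or later (for `list(FILTER)`). The regular expression and the `_gmx_host_flags` name are purely illustrative, not a vetted list of problematic flags.

```cmake
# Sketch only: turn off FindCUDA's automatic forwarding of host flags,
# then forward a filtered subset explicitly.
set(CUDA_PROPAGATE_HOST_FLAGS OFF)

# Split the space-separated flag string into a CMake list.
separate_arguments(_gmx_host_flags UNIX_COMMAND "${CMAKE_CXX_FLAGS}")

# Drop flags the nvcc host compiler may not accept (illustrative pattern:
# language-standard and warning flags).
list(FILTER _gmx_host_flags EXCLUDE REGEX "^-(std=|W)")

# Hand each remaining flag to the host compiler via nvcc's -Xcompiler.
foreach(_flag IN LISTS _gmx_host_flags)
    list(APPEND CUDA_NVCC_FLAGS "-Xcompiler" "${_flag}")
endforeach()
```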

#7 Updated by Alexey Shvetsov over 3 years ago

The problem is that the CMake-related CUDA options are poorly documented. So a basic how-to for fixing the build (or for when it is broken entirely) would be helpful for users.

#8 Updated by Mark Abraham over 3 years ago

Documenting CUDA_HOST_COMPILER is the smallest part of this issue (and yeah, I'd much prefer we contribute something useful upstream and link to it).

This issue is about removing support for changing the host compiler, which should never be a thing because we don't have the resources to test anything relevant (even if it was clear what is relevant).

#9 Updated by Szilárd Páll over 3 years ago

Alexey Shvetsov wrote:

The problem is that the CMake-related CUDA options are poorly documented. So a basic how-to for fixing the build (or for when it is broken entirely) would be helpful for users.

Some documentation is useful and I'll make sure to consider what more is relevant to add. Which options do you think users should care about? To be honest, off the top of my head I can't think of any except the host compiler that would be of interest to users.

However, it is not in the scope of GROMACS to document CMake options, since users can go and check the CMake documentation. We should probably link to it in our docs.

Mark Abraham wrote:

This issue is about removing support for changing the host compiler, which should never be a thing because we don't have the resources to test anything relevant (even if it was clear what is relevant).

We don't "remove support" e.g. for using different versions (or even vendors) of C and C++ compilers and preprocessor, nor do we block (or warn about) non-standard assemblers, and there are probably plenty of other ways the users can shoot themselves in the foot if they tried hard enough.

Hence, I don't think it is in the scope of our build scripts to prevent users from changing the CUDA host compiler (unless and until this proves to be a real issue). Whether/how we endorse and document this workaround for the known nvcc + C++ compiler incompatibilities is the topic that I would prefer focusing on.

And BTW, I did not document this exactly because I was warned a few years ago that technically it is not sound to mix C++ compilers.

#10 Updated by Mark Abraham about 3 years ago

Szilárd Páll wrote:

Alexey Shvetsov wrote:

The problem is that the CMake-related CUDA options are poorly documented. So a basic how-to for fixing the build (or for when it is broken entirely) would be helpful for users.

Some documentation is useful and I'll make sure to consider what more is relevant to add. Which options do you think users should care about? To be honest, off the top of my head I can't think of any except the host compiler that would be of interest to users.

Agree, and obviously I think even that should not be an option for users.

However, it is not in the scope of GROMACS to document CMake options, since users can go and check the CMake documentation. We should probably link to it in our docs.

Yes someone could link to that.

Mark Abraham wrote:

This issue is about removing support for changing the host compiler, which should never be a thing because we don't have the resources to test anything relevant (even if it was clear what is relevant).

We don't "remove support" e.g. for using different versions (or even vendors) of C and C++ compilers and preprocessor, nor do we block (or warn about) non-standard assemblers, and there are probably plenty of other ways the users can shoot themselves in the foot if they tried hard enough.

True, we haven't warned about mixing compiler vendors or versions, but that problem is going away as we migrate everything to a C++ compiler (TNG hasn't been done yet, but that's the main TODO item left).

Hence, I don't think it is in the scope of our build scripts to prevent users from changing the CUDA host compiler (unless and until this proves to be a real issue). Whether/how we endorse and document this workaround for the known nvcc + C++ compiler incompatibilities is the topic that I would prefer focusing on.

It is a real issue. There is no ABI compatibility of C++ compilers anywhere, and until there is, we will have no ability to test well enough to recommend it to users. So however the NVCC host compiler gets set, it should never change, and we should not have CMake code that reacts to whether it has changed.

And BTW, I did not document this exactly because I was warned a few years ago that technically it is not sound to mix C++ compilers.

Sure.

#11 Updated by Mark Abraham over 1 year ago

So, I conclude that we will
  • stop having code that facilitates changing the host compiler in a given build tree - devs should either clean it out or make another one
  • not support building binaries with a mix of different compilers, as no compiler or GPU vendor supports this use case in terms of ABI compatibility

#12 Updated by Mark Abraham over 1 year ago

  • Assignee set to Mark Abraham
  • Target version set to 2019

#13 Updated by Mark Abraham 10 months ago

  • Target version changed from 2019 to future

GPU build system is clearly not a priority.

I still think this is functionality that I would vote to reject if someone proposed adding it.
