nvcc host compiler sanity checks break with cmake >=2.8.10
The FindCUDA module included in CMake >=2.8.10 sets the CUDA_HOST_COMPILER variable by default, without any sanity checks to make sure that the host compiler is actually compatible with nvcc. With CUDA_HOST_COMPILER set, our own checks in source:cmake/gmxManageNvccConfig.cmake get disabled. Therefore, with CMake >=2.8.10 the user no longer gets a warning about an incompatible host compiler.
Remove CUDA host compiler consistency checks
Since CMake 2.8.10 the host compiler is set by CMake which effectively
broke our consistency checks. However, these checks are hard to
maintain, and even though CMake does not do any checks we are better off
without this code.
This commit removes the checks and unconditionally sets the
CUDA_HOST_COMPILER variable for CMake 2.8.9 and earlier - code that
should be removed when CMake 2.8.10 is required.
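For illustration, the fallback for older CMake versions described in the commit message could look roughly like the sketch below. It is not the committed code: it assumes that the detected C compiler (CMAKE_C_COMPILER) is the value to fall back on, and it does not show how the variable is later passed on to nvcc.

```cmake
# Sketch only, not the actual commit. Assumes CMAKE_C_COMPILER is the
# compiler we want nvcc to use as its host compiler.
if(CMAKE_VERSION VERSION_LESS "2.8.10")
    # Per the commit message, CMake only sets CUDA_HOST_COMPILER itself
    # from 2.8.10 on, so provide a default for older versions.
    # No compatibility check is done. An existing cache entry
    # (e.g. from -DCUDA_HOST_COMPILER=...) is left untouched by set(... CACHE).
    set(CUDA_HOST_COMPILER "${CMAKE_C_COMPILER}"
        CACHE FILEPATH "Host compiler to be used by nvcc")
endif()
```

Once CMake 2.8.10 is the minimum required version, the whole block can be deleted.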
#1 Updated by Szilárd Páll over 7 years ago
The current checks do issue a warning along with a lengthy explanation of why we are not auto-setting the nvcc host compiler (see gmxManageNvccConfig.cmake).
However, after some annoyingly lengthy debugging, I realized that the FindCUDA module included in CMake 2.8.10 and later automatically sets CUDA_HOST_COMPILER during CUDA detection. This is essentially equivalent to the user passing the variable manually, and it disables the checks. I don't have a good solution for this except checking whether the user passed CUDA_HOST_COMPILER before calling FindCUDA and handling the case of CMake >=2.8.10 separately. Ugly, but it should work.
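A minimal sketch of that workaround, assuming the check is done once on the first configure run. The variable name GMX_CUDA_HOST_COMPILER_SET_BY_USER is made up for this example and does not exist in gmxManageNvccConfig.cmake:

```cmake
# Sketch only; GMX_CUDA_HOST_COMPILER_SET_BY_USER is a hypothetical name.
# Record, before FindCUDA runs, whether CUDA_HOST_COMPILER was already
# defined (e.g. via -DCUDA_HOST_COMPILER=...). The result is cached because
# on later CMake runs the variable is always defined, no matter who set it.
if(NOT DEFINED GMX_CUDA_HOST_COMPILER_SET_BY_USER)
    if(DEFINED CUDA_HOST_COMPILER)
        set(_set_by_user TRUE)
    else()
        set(_set_by_user FALSE)
    endif()
    set(GMX_CUDA_HOST_COMPILER_SET_BY_USER ${_set_by_user}
        CACHE INTERNAL "Whether the user passed CUDA_HOST_COMPILER")
endif()

find_package(CUDA)

if(GMX_CUDA_HOST_COMPILER_SET_BY_USER)
    message(STATUS "nvcc host compiler set by the user, skipping our checks")
else()
    # Value absent or filled in by FindCUDA (CMake >=2.8.10):
    # the existing host compiler checks would go here.
endif()
```

One known limitation of this approach: if the user changes CUDA_HOST_COMPILER later in an existing build directory (e.g. with ccmake), the cached flag does not notice it.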
#3 Updated by Szilárd Páll about 7 years ago
The problem is that FindCUDA doesn't do any of the sanity checks on the compiler that we do, and it will happily accept compilers that surely don't work with nvcc (e.g. clang) as well as MPI compiler wrappers, which can be problematic in some cases. If we want to keep the verbose user feedback, we need to decide whether we keep the FindCUDA >=2.8.10 behavior or discard the CUDA_HOST_COMPILER value and leave it unset in the cases where we do this ATM. Not sure what's best, but I don't think we should pull in CMake source code (again) just for the sake of this rather minor detail.
Note to self: we could potentially look for OMPI_CC and similar environment variables which can indicate what is the C compiler behind the MPI wrapper.
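A rough sketch of what such a lookup could look like. OMPI_CC is the variable mentioned above; MPICH_CC is my guess at one of the "similar" variables, and _mpi_underlying_cc is a made-up name. Note that these variables are optional overrides, so finding them unset tells us nothing about the wrapper's default compiler:

```cmake
# Sketch only; _mpi_underlying_cc is a hypothetical variable name.
# Try to find out which C compiler hides behind an MPI wrapper by
# looking at the environment variables used to override it.
set(_mpi_underlying_cc "")
if(NOT "$ENV{OMPI_CC}" STREQUAL "")
    # Open MPI wrapper override
    set(_mpi_underlying_cc "$ENV{OMPI_CC}")
elseif(NOT "$ENV{MPICH_CC}" STREQUAL "")
    # MPICH wrapper override (assumed to be one of the "similar" variables)
    set(_mpi_underlying_cc "$ENV{MPICH_CC}")
endif()

if(_mpi_underlying_cc)
    message(STATUS "C compiler behind the MPI wrapper: ${_mpi_underlying_cc}")
else()
    message(STATUS "Could not determine the compiler behind the MPI wrapper")
endif()
```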
#10 Updated by Szilárd Páll about 7 years ago
- Priority changed from Normal to Low
No, it did not. It made things work with the new CMake, but as a side effect, with CMake >=2.8.10 we don't check the host compiler (as without additional changes we can't distinguish between CUDA_HOST_COMPILER set by the user or by FindCUDA).
Not a big issue IMHO, so I'm dropping this to low priority.
#16 Updated by Erik Lindahl over 4 years ago
- Status changed from Fix uploaded to Accepted
This has now been open for three years, and nobody appears to have looked into it after we decided my quick-and-dirty fix didn't work (and had worse side effects).
At some point I guess we need to decide where we draw the line between the responsibility of GROMACS, CMake, the operating system and the user.
I fear we will end up with a huge mess of CMake configurations that we then struggle to keep up to date if we try to write our own detection of the entire
CUDA compiler configuration, so my preference would be that we drop this and simply tell the user it is their responsibility to pick a compatible CUDA host compiler.
Looking at OS X, NVIDIA has deprecated gcc and only supports the default system clang compiler, so I suspect their future answer to this question is that they
only support the default compiler on each system.
Thus, I would vote for dropping this issue and leaving it to the CMake module.
#17 Updated by Szilárd Páll over 4 years ago
I agree, we need to make our life easier by offloading more of the burden onto the user. I put a lot of work into checks, detection, warnings etc. to help users as much as possible in the initial GPU release, but indeed, this part of the code could use simplification.
However, CMake sets those variables without checking, so the issue won't just go away unless we change some code to remove the inconsistent way we handle things. I considered it too much effort/code complexity to implement the existing checks for CMake >=2.8.10, and now I would prefer not to have to maintain these checks any further - and they are getting outdated anyway.
The only solution I see is to remove the checks and just set the host compiler to whatever is detected. I'll keep the icc compatibility mode flag, although it leads to warnings with v15, but I don't have a better option right now (or am I missing something?).