GMX_GPU=no doesn't work if initially set to auto
On WSL, the build fails with:
CMake Error at cmake/FindNVML.cmake:103 (if): if given arguments: "VERSION_LESS" "8.0"
It might be a good idea to check CUDA_VERSION_STRING before assuming CUDA is available (even if CUDA_FOUND is true).
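A guard along those lines might look like the sketch below. It is illustrative only: the thread does not show line 103 of FindNVML.cmake, so the version comparison here is an assumed stand-in for it. The error text suggests an if() whose left-hand operand expanded to nothing, which is what an empty CUDA_VERSION_STRING would produce.

```cmake
# Sketch only, not the actual FindNVML.cmake.
# CUDA_FOUND and CUDA_VERSION_STRING are set by CMake's FindCUDA module.
if(CUDA_FOUND AND NOT "${CUDA_VERSION_STRING}" STREQUAL "")
    # Quoting the operand keeps if() well-formed even if it were empty.
    if("${CUDA_VERSION_STRING}" VERSION_LESS "8.0")
        message(STATUS "CUDA ${CUDA_VERSION_STRING} is older than 8.0")
    endif()
else()
    # Hypothetical fallback: treat a missing version string as "no CUDA".
    message(STATUS "No usable CUDA version detected; skipping the NVML check")
endif()
```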
More importantly, if CUDA_FOUND is true, then setting GMX_GPU=off after it was initially auto doesn't disable it. One is then stuck until the cache is removed.
#1 Updated by Aleksei Iupinov 10 months ago
But CUDA_FOUND is only a result of CUDA toolkit detection, why should it be touched with GMX_GPU=OFF? CUDA's not getting used at all.
I don't know much about CMake, but I don't understand how it ends up in the FindNVML script if the precondition for that call in gmxManageGPU.cmake is "if (GMX_GPU)".
The only interesting thing I can see after passing GMX_GPU=OFF is that GMX_GPU_AUTO is still ON - maybe that's the reason, and it should always be set appropriately to the value of (NOT DEFINED GMX_GPU) at the beginning of gmxManageGPU. Can you try that?
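Roughly, that suggestion could be sketched as follows. This is not the actual gmxManageGPU.cmake; it assumes GMX_GPU_AUTO is an internal cache variable, which the thread implies but does not show.

```cmake
# Sketch only: recompute the auto flag on every CMake run, instead of
# leaving a stale value from an earlier run in the cache.
if(DEFINED GMX_GPU)
    # A value exists (from the command line or the cache): not auto mode.
    set(GMX_GPU_AUTO OFF CACHE INTERNAL "Select GPU support automatically")
else()
    # Nothing was provided: fall back to automatic selection.
    set(GMX_GPU_AUTO ON CACHE INTERNAL "Select GPU support automatically")
endif()
```

One caveat with this approach: if(DEFINED GMX_GPU) is also true for a value that an earlier run stored in the cache, so auto mode would only ever apply to the first run in a fresh build directory.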
#2 Updated by Mark Abraham 10 months ago
gmx_option_trivalue in cmake/gmxOptionUtilities.cmake is the relevant piece of gear. It does look like if(GMX_GPU) should work correctly, but if a trivalue gets changed after being set, I don't think any of the supporting logic runs. Yet I don't see how that leads to the wrong behaviour.
#4 Updated by Teemu Murtola 10 months ago
This code is not using gmx_option_trivalue(), so that is unrelated. It just happens to use similar variable names for its internal state...
What seems to be the problem is that there is complex state management involving multiple cached variables (at least GMX_GPU, GMX_GPU_AUTO, and GMX_GPU_DETECTION_DONE). The exact problem here seems to be that if the first CMake run sets GMX_GPU_AUTO, but then fails later, GMX_GPU_DETECTION_DONE does not get set. And in this case, the auto mode remains in effect even if the user sets GMX_GPU=OFF explicitly. And the auto mode overrides the user-provided value if the detection variable is not set...
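The sequence described above can be sketched like this. It is a reconstruction from the description, not the actual gmxManageGPU.cmake: only the three GMX_GPU* variable names come from the thread, and the control flow around them is assumed.

```cmake
# Reconstruction of the described pattern, NOT the real GROMACS code.

# Run 1: GMX_GPU is not given, so auto mode is recorded in the cache.
if(NOT DEFINED GMX_GPU)
    set(GMX_GPU_AUTO ON CACHE INTERNAL "Select GPU support automatically")
endif()
option(GMX_GPU "Enable GPU acceleration" OFF)

# Run 2 (after -DGMX_GPU=OFF): GMX_GPU_AUTO is still ON in the cache and
# GMX_GPU_DETECTION_DONE was never stored, so detection runs again.
if(GMX_GPU_AUTO AND NOT GMX_GPU_DETECTION_DONE)
    find_package(CUDA QUIET)
    if(CUDA_FOUND)
        # A normal variable shadows the cached value for this run,
        # so the user's GMX_GPU=OFF is ignored.
        set(GMX_GPU ON)
    endif()
endif()

if(GMX_GPU)
    # Placeholder for the step that aborted the first run
    # (FindNVML.cmake in the report); assumes the module is on
    # CMAKE_MODULE_PATH.
    find_package(NVML)
endif()

# Only reached when everything above succeeded.
set(GMX_GPU_DETECTION_DONE ON CACHE INTERNAL "GPU detection has run")
```

In this sketch the "detection done" flag is written last, so any failure before it leaves GMX_GPU_AUTO set with nothing to cancel it, which matches the stuck state reported above.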
The fix for #1985 may fix this as well, or at least take things towards a more sane behavior where this would be easier to fix.