work around broken NVIDIA JIT caching
The NVIDIA JIT compiler's binary caching is simply broken: kernels do not get recompiled even though their source has changed. As a result, stale kernel binaries can be used when recompilation would be required, which can cause weird errors or incorrect results that are hard to explain or debug.
This mostly affects devs, but users can be affected too, e.g. if they pull a bugfix and the kernels don't get recompiled; mismatched GROMACS binary/kernel combinations may even occur.
We have no better idea than forcing JIT caching off by setting the CUDA_CACHE_DISABLE environment variable. Ugly, but it seems warranted.
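A minimal C++ sketch of the workaround, assuming a POSIX system (`setenv`/`getenv`) and assuming the variable has to be set in the process environment before the first OpenCL call, i.e. before the driver reads its JIT-cache settings. The helper name `disableNvidiaJitCache` is hypothetical and is not the actual GROMACS code.

```cpp
#include <stdlib.h> // POSIX setenv, getenv
#include <cstring>  // std::strcmp

// Hypothetical helper (illustrative name, not the GROMACS API).
// Assumed to be called before any OpenCL program is built, so the
// NVIDIA driver sees the variable when it initializes.
// Returns true if JIT caching is disabled after the call.
bool disableNvidiaJitCache()
{
    // overwrite = 0: keep any value the user set explicitly, so an
    // explicit CUDA_CACHE_DISABLE=0 still re-enables the cache.
    if (setenv("CUDA_CACHE_DISABLE", "1", 0) != 0)
    {
        return false; // setenv failed (e.g. out of memory)
    }
    const char* value = getenv("CUDA_CACHE_DISABLE");
    return value != nullptr && std::strcmp(value, "1") == 0;
}
```

Not overwriting an existing value is a design choice under these assumptions: it keeps the workaround on by default while leaving users a way to opt back into caching.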
Disable NVIDIA JIT cache with OpenCL
The NVIDIA JIT caching is known to be broken for OpenCL compilation
when the kernel source changes but its path does not
(e.g. kernels get overwritten). Therefore we disable the JIT caching on
#2 Updated by Szilárd Páll over 3 years ago
Mark Abraham wrote:
Does this still need action?
Not sure; I have not tested NVIDIA OpenCL much lately, but I doubt much has changed. I'll try to see if I can still reproduce it.
All we could do is document it, and perhaps export it from GMXRC.
We could use
#3 Updated by Szilárd Páll over 3 years ago
- Target version changed from 5.1.3 to 2016
Just tested: if the kernel file changes, I can reproduce the incorrect cache reuse with a 364.19 driver. However, it seems that if the path changes, e.g. when a new installation is used, the cache is not reused. Hence, AFAICT it will affect only devs, so documenting this and avoiding other hacks might be enough. So perhaps it's not even worth targeting 5.1.
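To make the observed behavior concrete, here is a toy C++ model. It is hypothetical and does not describe how the NVIDIA driver actually keys its cache; it only reproduces what was observed above: a cache keyed on the kernel path alone returns a stale binary when the file is edited in place, while a new installation path triggers recompilation. Keying on a hash of the contents as well avoids the stale hit. The struct, method, and path names are illustrative.

```cpp
#include <functional>
#include <map>
#include <string>

// Toy model (hypothetical, NOT the NVIDIA driver's implementation) of
// the behavior observed in this issue.
struct ToyJitCache
{
    std::map<std::string, std::string> binaries; // lookup key -> "binary"
    int compilations = 0;

    // Stand-in for the real compiler: the "binary" records its source.
    std::string compile(const std::string& source)
    {
        ++compilations;
        return "bin(" + source + ")";
    }

    // Key ignores the contents: an in-place edit hits a stale entry.
    std::string buildKeyedOnPath(const std::string& path, const std::string& source)
    {
        auto it = binaries.find(path);
        if (it != binaries.end())
        {
            return it->second; // stale if the file at `path` was edited
        }
        return binaries[path] = compile(source);
    }

    // Key also includes a hash of the contents, so edits recompile.
    std::string buildKeyedOnContent(const std::string& path, const std::string& source)
    {
        const std::string key =
                path + "#" + std::to_string(std::hash<std::string>{}(source));
        auto it = binaries.find(key);
        if (it != binaries.end())
        {
            return it->second;
        }
        return binaries[key] = compile(source);
    }
};
```

In this model, reusing one install location (same path, new contents) is exactly the case that goes wrong, while a fresh install path behaves correctly.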
#6 Updated by Szilárd Páll over 3 years ago
Sorry for the late feedback. There is a slight danger that if one installs into the same location, e.g. the default /usr/local or a location without the patch version such as /opt/gromacs-16, things can go wrong.
Not highly important, but I have a fix; if wanted, it can be merged into either rel-2016 or master.