Bug #2810
running on Fermi throws cryptic error
Description
The consistency / support checks were not updated when we removed Fermi support, so no remaining check can detect that an architecture is unsupported. As a result, mdrun neither falls back to the CPU nor emits a message that users can understand.
$ gmx mdrun -quiet -nb gpu -gpu_id 0
-------------------------------------------------------
Program: gmx mdrun, version 2019-rc1-dev-20181217-eeda455
Source file: src/gromacs/gpu_utils/cudautils.cuh (line 347)
Function: void launchGpuKernel(void (*)(Args ...), const KernelLaunchConfig&, CommandEvent*, const char*, const std::array<void*, sizeof... (Args)>&) [with Args = {}; CommandEvent = void]
Internal error (bug):
GPU kernel (Dummy kernel) failed to launch: invalid device function
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------
Related issues
Associated revisions
History
#1 Updated by Szilárd Páll about 2 years ago
- Related to Task #2665: remove fermi support added
#2 Updated by Szilárd Páll about 2 years ago
- Description updated (diff)
- Priority changed from Normal to Low
#3 Updated by Szilárd Páll about 2 years ago
PS: the original check was also misplaced. It should have been called in is_gmx_supported_gpu_id() rather than at initialization; otherwise it runs too late to catch the error, which already occurs when the dummy kernel is first executed.
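To illustrate the ordering issue described in this comment, here is a minimal sketch in plain C++ (no CUDA calls). The names DeviceStatus, DeviceInfo, c_minMajor and checkDevice are hypothetical and are not taken from the GROMACS sources, and the minimum compute capability of 3.0 is an assumption based on Fermi being compute capability 2.x. The point is only that the capability test has to run before the dummy kernel is launched.

```cpp
// Hypothetical, simplified stand-ins for illustration; not the GROMACS types.
enum class DeviceStatus
{
    Compatible,
    Incompatible,
    NonFunctional
};

struct DeviceInfo
{
    int major; // compute capability major version, e.g. 2 for Fermi
    int minor; // compute capability minor version
};

// Assumed minimum supported compute capability major version (3.x or newer).
constexpr int c_minMajor = 3;

// Classifies a device. The capability test runs first, so an unsupported
// architecture is flagged without ever reaching the dummy-kernel launch.
// runDummyKernel stands in for the sanity-check kernel launch and returns
// true when the launch succeeds.
template<typename SanityCheck>
DeviceStatus checkDevice(const DeviceInfo& info, SanityCheck&& runDummyKernel)
{
    if (info.major < c_minMajor)
    {
        return DeviceStatus::Incompatible;
    }
    return runDummyKernel() ? DeviceStatus::Compatible : DeviceStatus::NonFunctional;
}
```

With this ordering, a call such as checkDevice(DeviceInfo{2, 0}, launch) returns DeviceStatus::Incompatible and never invokes launch, which is the behavior the comment asks for.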
#4 Updated by Szilárd Páll about 2 years ago
- Related to Bug #2811: CUDA binary target support check can't work added
#5 Updated by Gerrit Code Review Bot about 2 years ago
Gerrit received a related patchset '1' for Issue #2810.
Uploader: Szilárd Páll (pall.szilard@gmail.com)
Change-Id: gromacs~release-2019~I81bf12cca43a2a5e16d48d9faf4b9fc9627e4452
Gerrit URL: https://gerrit.gromacs.org/8844
#6 Updated by Szilárd Páll about 2 years ago
- Status changed from New to Fix uploaded
#7 Updated by Szilárd Páll about 2 years ago
- Status changed from Fix uploaded to Resolved
Applied in changeset ba3ab1703239c208f78ec98ea817e0a737ff264e.
#8 Updated by Paul Bauer about 2 years ago
- Status changed from Resolved to Closed
Correct CUDA compatibility check
The CUDA compatibility check became ineffective after the deprecation of
Fermi as it could not detect and flag these GPUs correctly as
"incompatible" but instead was throwing a kernel execution error when
trying to launch the sanity checker kernel.
This change makes the minimal corrections necessary for the now
deprecated Fermi arch to be correctly detected. As expected, CPU
fallback can now be automatically selected (unless the user requests
a GPU). Minor refactoring was also necessary, but was kept to a minimum.
Fixes #2810
Change-Id: I81bf12cca43a2a5e16d48d9faf4b9fc9627e4452
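
The fallback behavior described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the changeset: selectDevice is a hypothetical helper, the return value -1 is an arbitrary convention for "run on the CPU", and throwing when a GPU was explicitly requested is an assumed way of reporting the problem to the user.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper, not a GROMACS function. Returns the index of the
// first compatible GPU, or -1 to signal that the run should use the CPU.
// Throws only when the user explicitly asked for a GPU and none is usable.
int selectDevice(const std::vector<bool>& isCompatible, bool userRequestedGpu)
{
    for (int i = 0; i < static_cast<int>(isCompatible.size()); ++i)
    {
        if (isCompatible[i])
        {
            return i;
        }
    }
    if (userRequestedGpu)
    {
        // Assumed behavior: fail with a readable message instead of a
        // kernel launch error.
        throw std::runtime_error("A GPU was requested, but no compatible GPU was detected");
    }
    return -1; // no compatible GPU and none requested: fall back to the CPU
}
```

In this sketch, a machine whose only GPU was flagged as incompatible yields -1 when no GPU was requested, and a readable exception when one was.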