Bug #2420

OpenCL implementation not doing device sanity checks

Added by Aleksei Iupinov over 1 year ago. Updated 6 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: core library
Target version:
Affected version - extra info: probably all versions before that as well
Affected version:
Difficulty: uncategorized

Description

In the CUDA implementation, is_gmx_supported_gpu_id() calls do_sanity_checks(), which even launches a dummy GPU kernel. The OpenCL implementation does nothing of the kind: it only looks at the device vendor and the OS version.

As a result, when devices are disabled with "sudo nvidia-smi -c 2", OpenCL mdrun fails with CL_DEVICE_NOT_AVAILABLE on context creation instead of rejecting the device during detection.
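
To make the gap concrete, here is a minimal C++ sketch of a two-stage device check: a static vendor test followed by a functional sanity check. It is not the GROMACS code. DeviceStatus, DeviceInfo, isSupportedVendor, checkDevice, selectUsableDevices and stubSanityCheck are all hypothetical names, the vendor list is illustrative, and the sanity check is a stub so the example runs without an OpenCL runtime. A real check would try to create a context and a command queue on the device and run a dummy kernel, treating any OpenCL error (such as CL_DEVICE_NOT_AVAILABLE) as a failure.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical status values; illustrative names, not GROMACS identifiers.
enum class DeviceStatus
{
    Compatible,
    UnsupportedVendor,
    FailedSanityCheck
};

// Hypothetical stand-in for what would be queried from an OpenCL device.
struct DeviceInfo
{
    std::string vendor;
    bool        acceptsWork; // false models a device that refuses new contexts
};

// A sanity check returns true only if the device can actually do work.
using SanityCheck = std::function<bool(const DeviceInfo&)>;

// Static check only; the vendor list here is an assumption for illustration.
bool isSupportedVendor(const std::string& vendor)
{
    return vendor == "NVIDIA" || vendor == "AMD";
}

// Two-stage check: static properties first, then the functional test.
DeviceStatus checkDevice(const DeviceInfo& device, const SanityCheck& sanityCheck)
{
    if (!isSupportedVendor(device.vendor))
    {
        return DeviceStatus::UnsupportedVendor;
    }
    if (!sanityCheck(device))
    {
        return DeviceStatus::FailedSanityCheck;
    }
    return DeviceStatus::Compatible;
}

// Keeps only the devices that pass both stages.
std::vector<DeviceInfo> selectUsableDevices(const std::vector<DeviceInfo>& devices,
                                            const SanityCheck&             sanityCheck)
{
    std::vector<DeviceInfo> usable;
    for (const auto& device : devices)
    {
        if (checkDevice(device, sanityCheck) == DeviceStatus::Compatible)
        {
            usable.push_back(device);
        }
    }
    return usable;
}

// Stub: a real implementation would create a context and run a dummy kernel.
bool stubSanityCheck(const DeviceInfo& device)
{
    return device.acceptsWork;
}
```

With only the vendor stage, a disabled device would still be reported as compatible and the failure would surface later, at context creation. With both stages, it is filtered out during detection.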


Related issues

Related to GROMACS - Bug #2405: improve gpu_utils-test (Closed)
Related to GROMACS - Task #2515: clFFT RocM compatibility problem (Closed)

Associated revisions

Revision 383240b1
Added by Aleksei Iupinov 6 months ago

Add a sanity check for OpenCL devices

Introduced some type traits to support RAII types for OpenCL handles,
so that we can safely free resources when responding to OpenCL API
error codes, and do so with usefully descriptive error messages.

The new infrastructure is used to implement a check that an OpenCL GPU
can execute a dummy kernel.

Fixed some broken docs, and updated some function naming for new
style.

Fixes #2420

Change-Id: Id99786d24f77b4b56669b5cfcd3a39aa0116cfca
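
The commit message above mentions type traits that support RAII types for OpenCL handles. A rough C++ sketch of that general technique follows; it is not the code from the revision. All names (StubContext, StubQueue, HandleTraits, ScopedHandle, runStubSanityCheck) are hypothetical, and stub handle types replace the real OpenCL ones so that the example compiles without OpenCL headers. In real code the trait specializations would be written for handle types such as cl_context and cl_command_queue and would call clReleaseContext and clReleaseCommandQueue.

```cpp
#include <string>

// Stub handle storage; "released" records whether the release function ran.
struct StubContextImpl { bool released = false; };
struct StubQueueImpl   { bool released = false; };

// Like OpenCL handles, the stub handles are plain pointers.
using StubContext = StubContextImpl*;
using StubQueue   = StubQueueImpl*;

// Stub release functions, standing in for the OpenCL clRelease* calls.
inline int releaseStubContext(StubContext context) { context->released = true; return 0; }
inline int releaseStubQueue(StubQueue queue) { queue->released = true; return 0; }

// Trait mapping each handle type to the function that releases it.
template<typename Handle>
struct HandleTraits;

template<>
struct HandleTraits<StubContext>
{
    static void release(StubContext handle) { releaseStubContext(handle); }
};

template<>
struct HandleTraits<StubQueue>
{
    static void release(StubQueue handle) { releaseStubQueue(handle); }
};

// Move-only RAII owner that releases its handle via the trait.
template<typename Handle>
class ScopedHandle
{
public:
    explicit ScopedHandle(Handle handle = nullptr) : handle_(handle) {}
    ~ScopedHandle() { reset(); }

    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

    ScopedHandle(ScopedHandle&& other) noexcept : handle_(other.handle_)
    {
        other.handle_ = nullptr;
    }
    ScopedHandle& operator=(ScopedHandle&& other) noexcept
    {
        if (this != &other)
        {
            reset();
            handle_       = other.handle_;
            other.handle_ = nullptr;
        }
        return *this;
    }

    Handle get() const { return handle_; }

    // Releases the owned handle, if any, exactly once.
    void reset()
    {
        if (handle_ != nullptr)
        {
            HandleTraits<Handle>::release(handle_);
            handle_ = nullptr;
        }
    }

private:
    Handle handle_;
};

// Stubbed check: returns an empty string on success, else an error message.
// Both handles are released on every return path, including the early one.
std::string runStubSanityCheck(StubContextImpl& contextStorage,
                               StubQueueImpl&   queueStorage,
                               bool             kernelLaunchFails)
{
    ScopedHandle<StubContext> context(&contextStorage);
    ScopedHandle<StubQueue>   queue(&queueStorage);
    if (kernelLaunchFails)
    {
        return "dummy kernel launch failed";
    }
    return "";
}
```

The point of the trait is that one wrapper template serves every handle type, so an early return on an API error cannot leak a context or queue and the function body can concentrate on producing a descriptive message.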

History

#1 Updated by Aleksei Iupinov over 1 year ago

  • Related to Bug #2405: improve gpu_utils-test added

#2 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #2420.
Uploader: Aleksei Iupinov
Change-Id: gromacs~master~I7f234e67787bc1815973027621abdc162501d6fe
Gerrit URL: https://gerrit.gromacs.org/7626

#3 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '1' for Issue #2420.
Uploader: Mark Abraham
Change-Id: gromacs~master~Id99786d24f77b4b56669b5cfcd3a39aa0116cfca
Gerrit URL: https://gerrit.gromacs.org/7786

#4 Updated by Aleksei Iupinov about 1 year ago

  • Related to Task #2515: clFFT RocM compatibility problem added

#5 Updated by Mark Abraham about 1 year ago

  • Target version changed from 2018.2 to 2018.3

The two fixes waiting in gerrit for follow up are on master branch, so I don't know offhand where we might fix this. It might not be important enough for release-2018.

#6 Updated by Szilárd Páll 11 months ago

  • Target version changed from 2018.3 to 2019

Mark Abraham wrote:

The two fixes waiting in gerrit for follow up are on master branch, so I don't know offhand where we might fix this. It might not be important enough for release-2018.

Agreed. If we find an easy-to-backport solution we can consider it, but otherwise this is only an improvement in mdrun robustness that doesn't greatly affect its functionality.

#7 Updated by Gerrit Code Review Bot 6 months ago

Gerrit received a related patchset '1' for Issue #2420.
Uploader: Szilárd Páll
Change-Id: gromacs~release-2019~Id99786d24f77b4b56669b5cfcd3a39aa0116cfca
Gerrit URL: https://gerrit.gromacs.org/8839

#8 Updated by Szilárd Páll 6 months ago

  • Status changed from New to In Progress

#9 Updated by Aleksei Iupinov 6 months ago

  • Status changed from In Progress to Resolved

#10 Updated by Paul Bauer 6 months ago

  • Status changed from Resolved to Closed
