Bug #1990

LJ-PME unstable with OpenCL

Added by Berk Hess over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Jenkins tests sometimes, but not always, fail with OpenCL for complex.nbnxn-ljpme-LB-geometric:
gmx: /home/jenkins/workspace/Gromacs_Gerrit_2016_presubmit/fdad3ebf/gromacs/src/gromacs/mdlib/nbnxn_ocl/nbnxn_ocl_data_mgmt.cpp:842: void nbnxn_ocl_clear_f(gmx_nbnxn_ocl_t*, int): Assertion `cl_error == 0' failed.
Aborted (core dumped)


Related issues

Related to GROMACS - Bug #1693: Jenkins Tests seldomly failing (Closed)
Related to GROMACS - Bug #1871: segfaults in three regressiontests with NVIDIA OpenCL multi-GPU runs (Closed)
Related to GROMACS - Bug #2502: nonbonded interactions go missing with GPU when an empty domain goes non-empty (Closed)

Associated revisions

Revision c71bc506 (diff)
Added by Berk Hess over 3 years ago

Fix OpenCL error with empty domains

We no longer call the force clearing when there are zero elements
to clear, as can happen with an empty domain with DD.
Also simplified the clearing thread count calculation.

Fixes #1990.

Change-Id: Idc3e42140ac73714475af0918febbf4cac8e43f7
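The guard described in this commit message can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the actual GROMACS code (which is C++ OpenCL host code in nbnxn_ocl_data_mgmt.cpp). The names CLEAR_BLOCK_SIZE, FakeQueue, clear_global_work_size and clear_forces, the block size of 64, and the layout of three force components per atom are all hypothetical. FakeQueue stands in for an OpenCL command queue and, like the runtime described in this report, refuses a launch with zero total threads.

```python
CLEAR_BLOCK_SIZE = 64  # hypothetical work-group (local) size


def clear_global_work_size(n_elements: int, block_size: int = CLEAR_BLOCK_SIZE) -> int:
    """Round n_elements up to a multiple of block_size (total work-items).

    Note that zero elements still rounds to zero threads, so rounding alone
    cannot prevent an empty launch.
    """
    return -(-n_elements // block_size) * block_size


class FakeQueue:
    """Hypothetical stand-in for an OpenCL command queue."""

    def __init__(self):
        self.launches = []

    def enqueue_clear(self, global_size: int, local_size: int) -> None:
        # Mimic the failure in this report: zero total threads is rejected.
        if global_size == 0:
            raise RuntimeError("zero global work size is not allowed")
        self.launches.append((global_size, local_size))


def clear_forces(queue: FakeQueue, n_atoms: int, block_size: int = CLEAR_BLOCK_SIZE) -> None:
    """Clear the force buffer, skipping the launch for an empty DD domain."""
    n_elements = 3 * n_atoms  # assumption: x, y, z components per atom
    if n_elements == 0:
        return  # empty domain: nothing to clear, so do not launch at all
    global_size = clear_global_work_size(n_elements, block_size)
    queue.enqueue_clear(global_size, block_size)
```

With this guard, `clear_forces(q, 0)` enqueues nothing, while `clear_forces(q, 100)` clears 300 elements with 320 work-items (five groups of 64). Checking the element count before computing the launch size keeps the thread-count calculation a plain round-up.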

History

#1 Updated by Mark Abraham over 3 years ago

  • Related to Bug #1693: Jenkins Tests seldomly failing added

#2 Updated by Berk Hess over 3 years ago

Note that there is only one LJ-PME test using OpenCL; the other two use the group scheme. So I suspect all LJ-PME runs are affected by this issue.

The code looks fine to me. The only thing I can't verify is whether the force buffer reallocation is guaranteed to happen after the force clearing has finished. If not, this could explain the failure. In that case, all assertion failures should occur at DD (=nstlist) steps.

#3 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #1990.
Uploader: Berk Hess ()
Change-Id: Ia632eda2376c4c376df7cb2eb7a4fc95d77216af
Gerrit URL: https://gerrit.gromacs.org/5952

#4 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related DRAFT patchset '1' for Issue #1990.
Uploader: Berk Hess ()
Change-Id: Iebd6a363e86f9b2ccfbec9ed349499a2e8ea9b49
Gerrit URL: https://gerrit.gromacs.org/5953

#5 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #1990.
Uploader: Berk Hess ()
Change-Id: Idc3e42140ac73714475af0918febbf4cac8e43f7
Gerrit URL: https://gerrit.gromacs.org/5955

#6 Updated by Berk Hess over 3 years ago

  • Status changed from New to Fix uploaded
  • Target version changed from 2016 to 5.1.3
  • Affected version changed from 2016 to 5.1.2

This was a simple issue: the force clearing kernel was called with 0 total threads, which is not allowed.
Uploaded a fix to release-5-1.

#7 Updated by Berk Hess over 3 years ago

  • Status changed from Fix uploaded to Resolved

#8 Updated by Mark Abraham over 3 years ago

  • Status changed from Resolved to Closed

#9 Updated by Mark Abraham about 2 years ago

  • Related to Bug #1871: segfaults in three regressiontests with NVIDIA OpenCL multi-GPU runs added

#10 Updated by Mark Abraham over 1 year ago

  • Related to Bug #2502: nonbonded interactions go missing with GPU when an empty domain goes non-empty added
