Bug #2409

PME gather CUDA kernel failing on Fermi

Added by Szilárd Páll almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: Normal
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

For inputs larger than ~350k atoms, mdrun fails with:

Error while launching kernel pme_gather_kernel: invalid configuration argument

Reproduced on C2070 and GTX 580.

Associated revisions

Revision fa92dbed (diff)
Added by Aleksei Iupinov almost 2 years ago

Fix PME for large systems with Fermi GPUs

PME spread/gather CUDA kernel scheduling did not account for
compute capability limitations. Realistically this has only
caused it to fail on CC 2.x with input systems larger than
2^18 ~= 262k atoms. This is now fixed for all CUDA architectures.

Fixes #2409

Change-Id: I59295b5d53a341d08a221aebb52e1db9f1e80107
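
For readers unfamiliar with the limitation: on compute capability 2.x the x dimension of a CUDA grid is limited to 65535 blocks, and a launch that exceeds it fails with "invalid configuration argument". The following is a minimal sketch of one common workaround, splitting a 1D block count into a 2D grid. It is not taken from the actual change; the function name and parameters are hypothetical, and in real code the limit would be read from cudaDeviceProp::maxGridSize[0].

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not the GROMACS implementation): split a 1D count of
// thread blocks into a 2D grid (x, y) so that x never exceeds the device limit.
// maxGridX is assumed to come from cudaDeviceProp::maxGridSize[0],
// which is 65535 on compute capability 2.x.
std::pair<std::int64_t, std::int64_t> splitBlocksInto2DGrid(std::int64_t blockCount,
                                                            std::int64_t maxGridX)
{
    assert(blockCount > 0 && maxGridX > 0);
    // Number of grid rows needed so that a single row fits within maxGridX.
    const std::int64_t rows = (blockCount + maxGridX - 1) / maxGridX;
    // Distribute the blocks evenly over the rows to keep surplus blocks few.
    const std::int64_t cols = (blockCount + rows - 1) / rows;
    // The grid may contain a few surplus blocks, never too few.
    assert(cols <= maxGridX && cols * rows >= blockCount);
    return { cols, rows };
}
```

With such a split, 65536 blocks become a 32768 x 2 grid instead of an invalid 65536 x 1 launch. A kernel launched this way would have to recover its linear block index as blockIdx.y * gridDim.x + blockIdx.x and return early when that index runs past the last atom, since the grid can hold surplus blocks.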

History

#1 Updated by Aleksei Iupinov almost 2 years ago

  • Assignee set to Aleksei Iupinov
  • Target version set to 2018.1

Thanks, reproduced on C2075 on gromacs3!

#2 Updated by Mark Abraham almost 2 years ago

  • Subject changed from PME gathee CUDA kernel failing on Fermi to PME gather CUDA kernel failing on Fermi

#3 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '3' for Issue #2409.
Uploader: Aleksei Iupinov
Change-Id: gromacs~release-2018~I59295b5d53a341d08a221aebb52e1db9f1e80107
Gerrit URL: https://gerrit.gromacs.org/7584

#4 Updated by Aleksei Iupinov almost 2 years ago

  • Status changed from New to Resolved

#5 Updated by Mark Abraham almost 2 years ago

  • Status changed from Resolved to Closed

#6 Updated by Aleksei Iupinov almost 2 years ago

One thing to do here would be to add a sanity test with a huge input system, but Szilárd brought up a good point: such a test would fail for many users due to memory usage, so it should perhaps be an internal project instead.
Not important anyway.

#7 Updated by Mark Abraham almost 2 years ago

NB we have the GMX_DEVELOPER_BUILD cmake configuration that could be used to enable such things. And of course we'd generate the contents of such a test system rather than store the coordinates.

#8 Updated by Szilárd Páll almost 2 years ago

Mark Abraham wrote:

NB we have the GMX_DEVELOPER_BUILD cmake configuration that could be used to enable such things. And of course we'd generate the contents of such a test system rather than store the coordinates.

That could indeed be useful.

On a side note, it seems to me that it would be quite appropriate for Google Test (and/or CTest) to support marking subsets of tests as non-compulsory, so that by default only a warning/note is issued when such a test fails; in our controlled CI environment these tests can be made compulsory and the warnings reported as failures. This way a test that fails due to an out-of-memory error (because the user's browser chews up 1.65 of the 2 GB of GPU memory) won't mark a unit test as failed in the users' hands, but if it terminates with some weird error we may still learn about it. Can this be done with the available features?
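
One way such a policy could look, purely as a sketch: the test executable maps the failure of a non-compulsory test to a dedicated exit code unless a strict mode is active. The function name, the flags, and the value 77 are all hypothetical choices made for illustration; CTest does offer a SKIP_RETURN_CODE test property that can report a chosen exit code as "skipped" rather than "failed", but whether that matches the behaviour wanted here is left open.

```cpp
// Hypothetical exit code reserved for "optional test did not pass".
// A test driver would have to be configured to treat it as a warning.
constexpr int c_optionalFailureExitCode = 77;

// Hypothetical policy helper: decide the process exit code for one test.
//   passed     - whether the test body succeeded
//   isOptional - whether the test belongs to the non-compulsory subset
//   strictMode - true in a controlled CI environment, where every failure counts
int exitCodeFor(bool passed, bool isOptional, bool strictMode)
{
    if (passed)
    {
        return 0;
    }
    if (isOptional && !strictMode)
    {
        // Reported as a warning/skip on a user's machine.
        return c_optionalFailureExitCode;
    }
    // Compulsory test, or strict mode: a real failure.
    return 1;
}
```

In this sketch a failing optional test yields 77 on a user's machine and 1 in CI, while a failing compulsory test yields 1 everywhere.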