Bug #2779

Error when using a large PME grid on a GPU

Added by Grégoire Gschwend about 1 year ago. Updated 12 months ago.

Status: Closed
Priority: Normal
Assignee: -
Category: mdrun
Target version: 2018.5
Affected version - extra info: 2018.2
Affected version:
Difficulty: uncategorized

Description

GROMACS 2018.2 displays the following error message

"Error while launching kernel pme_solve_kernel: invalid argument"

when launching the simulation with the command:

gmx mdrun -v -deffnm nvt -ntmpi 16 -ntomp 2 -npme 1 -pme gpu -nb gpu

However, the simulation works fine with:

gmx mdrun -v -deffnm nvt -ntmpi 16 -ntomp 2

It seems that the error appears with relatively large systems (box of 10 nm x 10 nm x 40 nm). I could not reproduce the error with a system of 3 nm x 3 nm x 9 nm.

nvt_wall.mdp (745 Bytes) - Parameter file - Grégoire Gschwend, 11/27/2018 04:13 PM
nvt.tpr (12.1 MB) - File generating the error - Grégoire Gschwend, 11/27/2018 04:13 PM
em.gro (22.6 MB) - Input configuration for grompp after energy minimisation - Grégoire Gschwend, 11/27/2018 04:13 PM

Associated revisions

Revision aa2305c1 (diff)
Added by Berk Hess about 1 year ago

Make large PME grids work on GPU

With PME grids whose z size is larger than 511, blocks that are too
large could be launched, causing a cryptic CUDA error.

Fixes #2779

Change-Id: I0833609f64ad2e0ad6b7a799cf2b693f2dec3939

Revision c15057c7 (diff)
Added by Berk Hess 12 months ago

Make large PME grids work on GPU

With PME grids whose z size is larger than 511, blocks that are too
large could be launched, causing a cryptic CUDA or OpenCL error.

Fixes #2779

Change-Id: Ib2376ae0e9d5a338084df8f3a2cf46ca1b711a6a
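
The failure mode described in these commits can be reproduced with a few lines of standalone CUDA (a sketch, not GROMACS code): a launch whose block size exceeds the device's maxThreadsPerBlock limit (1024 on current NVIDIA hardware) fails with cudaErrorInvalidValue, which the runtime reports as the cryptic "invalid argument" seen in this issue.

// Standalone sketch (not GROMACS code) reproducing the failure mode:
// a block larger than maxThreadsPerBlock makes the launch fail with
// cudaErrorInvalidValue, printed as "invalid argument".
#include <cuda_runtime.h>
#include <cstdio>

__global__ void dummyKernel() {}

int main()
{
    // 512 threads (16 warps of 32): within the limit, launch succeeds.
    dummyKernel<<<1, 512>>>();
    printf("512-thread block:  %s\n", cudaGetErrorString(cudaGetLastError()));

    // 2048 threads: over the limit, launch fails with "invalid argument".
    dummyKernel<<<1, 2048>>>();
    printf("2048-thread block: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}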

History

#1 Updated by Paul Bauer about 1 year ago

I can reproduce this on my machine with v2018.2 and with the current HEAD of release-2018.

#2 Updated by Paul Bauer about 1 year ago

  • Status changed from New to Accepted

#3 Updated by Berk Hess about 1 year ago

  • Status changed from Accepted to In Progress

I suppose this is the threads-per-block limit issue that https://gerrit.gromacs.org/#/c/8709/ attempts to fix.

#4 Updated by Paul Bauer about 1 year ago

It still dies with https://gerrit.gromacs.org/#/c/8709/ applied, with this error:

Program:     gmx mdrun, version 2019-beta3-dev-20181116-2bdca7b34
Source file: src/gromacs/gpu_utils/cudautils.cuh (line 347)
Function:    void launchGpuKernel(void (*)(Args ...), const KernelLaunchConfig&, CommandEvent*, const char*, const std::array<void*, sizeof... (Args)>&) [with Args = {PmeGpuCudaKernelParams}; CommandEvent = void]
MPI rank:    15 (out of 16)

Internal error (bug):
GPU kernel (PME solve) failed to launch: invalid argument

#5 Updated by Berk Hess about 1 year ago

Try changing to 32 instead of 16 :)

#6 Updated by Paul Bauer about 1 year ago

gmx mdrun -v -deffnm nvt -ntmpi 32 -ntomp 2 -npme 1 -pme gpu -nb gpu
Program:     gmx mdrun, version 2019-beta3-dev-20181116-2bdca7b34
Source file: src/gromacs/domdec/domdec_setup.cpp (line 764)
MPI rank:    0 (out of 32)

Fatal error:
The number of ranks you selected (31) contains a large prime factor 31. In
most cases this will lead to bad performance. Choose a number with smaller
prime factors or set the decomposition (option -dd) manually.

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

#7 Updated by Paul Bauer about 1 year ago

So this also happens with -ntmpi 33, -ntmpi 17, -ntmpi 5, and -ntmpi 2.
It also gives the error:

Internal error (bug):
GPU kernel (PME solve) failed to launch: invalid argument

#8 Updated by Berk Hess about 1 year ago

I meant 32 warps per block instead of the 16 in the change set. But I meant it half jokingly, since this is obviously not a solution: one can always come up with a larger system. Instead we could e.g. launch the kernel multiple times (sketched below).
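
The multiple-launch idea could look roughly like this. This is a hypothetical illustration only: the kernel, its arguments, and the per-point work are invented, and it ignores the real kernel's shared-memory coupling within a grid line.

// Hypothetical sketch of "launch the kernel multiple times": process the
// minor (z) dimension in chunks of at most one block's worth of threads,
// so no single launch exceeds the device block-size limit.
__global__ void solveChunkKernel(float* grid, int gridSizeZ, int zOffset)
{
    int z = zOffset + threadIdx.x;
    if (z < gridSizeZ)
    {
        grid[z] *= 0.5f; // stand-in for the real per-grid-point solve work
    }
}

void solveInChunks(float* d_grid, int gridSizeZ)
{
    const int threadsPerBlock = 512; // 16 warps of 32, as in the discussion
    for (int zOffset = 0; zOffset < gridSizeZ; zOffset += threadsPerBlock)
    {
        solveChunkKernel<<<1, threadsPerBlock>>>(d_grid, gridSizeZ, zOffset);
    }
}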

#9 Updated by Berk Hess about 1 year ago

  • Subject changed from Error when using 3dc Ewald with dedicated pme rank on GPU to Error when using a large PME grid on a GPU

I changed the subject to what I think the actual issue is.

#10 Updated by Paul Bauer about 1 year ago

Sorry that I misunderstood you there!

#11 Updated by Berk Hess about 1 year ago

We are not at all at a block-count limit. The issue is that we assign one (or more) grid lines to a block, so we can't have more grid points than 32*#warps_per_block along the minor dimension (currently always z, I think); see the sketch below.

Is it easy to change this?
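
Spelling out the arithmetic in comment #11 (constant names are invented for this sketch, not taken from the GROMACS sources):

// One grid line per block, 16 warps of 32 threads: the minor dimension is
// capped at 512 grid points, matching the z > 511 failures reported here.
constexpr int c_warpSize         = 32;
constexpr int c_warpsPerBlock    = 16;
constexpr int c_maxMinorGridSize = c_warpSize * c_warpsPerBlock; // = 512
static_assert(c_maxMinorGridSize == 512, "16 warps of 32 threads cover 512 grid points");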

#12 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '1' for Issue #2779.
Uploader: Berk Hess
Change-Id: gromacs~release-2019~If8c126f0a18fc6291f459d1370c4e834cd46d252
Gerrit URL: https://gerrit.gromacs.org/8778

#13 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '1' for Issue #2779.
Uploader: Berk Hess
Change-Id: gromacs~release-2018~I9908f7742b80552d4ba29dcf707103fb4c5a3efd
Gerrit URL: https://gerrit.gromacs.org/8779

#14 Updated by Berk Hess about 1 year ago

  • Category set to mdrun
  • Status changed from In Progress to Fix uploaded
  • Target version set to 2018.5

I uploaded a fix to release-2018 and release-2019 that adds a check that the grid size along Z is at most 512 (sketched below).

Without changing the complex-kernel indexing, we could increase this limit to 2048, which would cover all practical cases. But optimal performance for small and medium setups with CUDA does indeed seem to be at 512 threads per block, so ideally we would only use more threads when needed. That is a bit cumbersome in the current code, though.
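
Such a check might look roughly like this (a sketch with a hypothetical function name and message; the actual fix is in the Gerrit changes linked in this issue):

// Illustrative pre-launch check: refuse grids whose z size exceeds what one
// block can cover, with a clear message instead of the cryptic launch error.
#include <stdexcept>
#include <string>

void checkPmeGridSizeZForGpu(int gridSizeZ)
{
    const int maxGridSizeZ = 512; // one block of 16 warps * 32 threads
    if (gridSizeZ > maxGridSizeZ)
    {
        throw std::runtime_error(
                "PME grid size along Z (" + std::to_string(gridSizeZ)
                + ") exceeds the GPU limit of " + std::to_string(maxGridSizeZ)
                + "; choose a smaller grid or run PME on the CPU");
    }
}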

#15 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '1' for Issue #2779.
Uploader: Berk Hess
Change-Id: gromacs~release-2019~Ic8f7e08a934db58e47a3eccc52e6a8eec9be3870
Gerrit URL: https://gerrit.gromacs.org/8780

#16 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '1' for Issue #2779.
Uploader: Berk Hess
Change-Id: gromacs~release-2018~I0833609f64ad2e0ad6b7a799cf2b693f2dec3939
Gerrit URL: https://gerrit.gromacs.org/8781

#17 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '1' for Issue #2779.
Uploader: Berk Hess
Change-Id: gromacs~release-2019~If5ed8534f518c4e553ec4abdb47d28b86736815e
Gerrit URL: https://gerrit.gromacs.org/8782

#18 Updated by Gerrit Code Review Bot about 1 year ago

Gerrit received a related patchset '1' for Issue #2779.
Uploader: Berk Hess
Change-Id: gromacs~release-2019~Ib2376ae0e9d5a338084df8f3a2cf46ca1b711a6a
Gerrit URL: https://gerrit.gromacs.org/8783

#19 Updated by Berk Hess 12 months ago

  • Status changed from Fix uploaded to Resolved

#20 Updated by Berk Hess 12 months ago

#21 Updated by Paul Bauer 12 months ago

  • Status changed from Resolved to Closed
