Bug #3211

Lincs on GPU can fail due to incorrect assumptions on ordering

Added by Alan Gray 15 days ago. Updated 11 days ago.

Status: Closed
Priority: High
Assignee: Artem Zhmurov
Category: mdrun
Target version: 2020-beta3
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

In the lincs GPU kernel we have code like this (L201 in lincs_cuda.cu)

int c1    = gm_coupledConstraintsIdxes[index];
float3 rc1        = sm_r[c1 - blockIdx.x * blockDim.x];

but in some cases (c1 - blockIdx.x * blockDim.x) is negative or exceeds the CUDA block size, so an out-of-bounds access to shared memory occurs. (In particular, this fails for a large ~1M-atom STMV case.) In other cases it works as expected.
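
For context, here is a minimal, self-contained sketch of the indexing pattern and the assumption it relies on. This is not the actual GROMACS kernel; everything except gm_coupledConstraintsIdxes and sm_r is made up for illustration.

#include <cassert>

// Sketch only: sm_r caches the vectors of the constraints handled by this
// block, indexed by (global constraint index - block offset).
__global__ void coupledAccessSketch(const int*    gm_coupledConstraintsIdxes,
                                    const float3* gm_r,
                                    float3*       gm_out,
                                    int           numConstraints)
{
    extern __shared__ float3 sm_r[];
    const int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c < numConstraints)
    {
        // Each thread stages the vector of its own constraint.
        sm_r[threadIdx.x] = gm_r[c];
    }
    __syncthreads();
    if (c < numConstraints)
    {
        const int c1       = gm_coupledConstraintsIdxes[c];
        const int localIdx = c1 - blockIdx.x * blockDim.x;
        // The read below is only valid if 0 <= localIdx < blockDim.x, i.e. if
        // the coupled constraint is handled by the same block. With DD (and
        // sometimes without it) that is not guaranteed, which is the
        // out-of-bounds shared-memory access described above.
        assert(localIdx >= 0 && localIdx < static_cast<int>(blockDim.x));
        gm_out[c] = sm_r[localIdx];
    }
}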

To reproduce:

Private change https://gerrit.gromacs.org/c/gromacs/+/14384 is based on the change that enables "-update gpu" with DD, and adds an assertion in the CPU setup code that fails when the index array contains an element that would cause an out-of-bounds shared-memory access in the kernel (a sketch of such a check is given after the command below).

stmv.tpr file (134M) available at https://drive.google.com/open?id=1xzM1VWpiaVR8D1o-owSLl1FEerDctnpS

export GMX_USE_GPU_BUFFER_OPS=1
export GMX_GPU_DD_COMMS=1
export GMX_GPU_PME_PP_COMMS=1

gmx mdrun -s stmv.tpr -ntomp 10 -pme gpu -nb gpu -ntmpi 4 -npme 1 -nsteps 100 -v -notunepme -pin on -bonded gpu -update gpu
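
For reference, a sketch of the kind of host-side check the Gerrit change above adds. The function name, container, and threadsPerBlock parameter are assumptions, not the actual code in that change.

#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the kernel access sm_r[c1 - blockIdx.x * blockDim.x]: if the local
// index falls outside [0, threadsPerBlock), the kernel would read shared
// memory out of bounds.
void checkCoupledConstraintIndices(const std::vector<int>& coupledConstraintsIndices,
                                   int                     threadsPerBlock)
{
    for (std::size_t i = 0; i < coupledConstraintsIndices.size(); i++)
    {
        const int c1    = coupledConstraintsIndices[i];
        const int block = static_cast<int>(i) / threadsPerBlock;
        const int local = c1 - block * threadsPerBlock;
        assert(local >= 0 && local < threadsPerBlock);
    }
}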

Associated revisions

Revision 6e2b1f18 (diff)
Added by Artem Zhmurov 11 days ago

Fix index mapping issue with LINCS on GPU

In some cases (e.g. when DD is used), coupled constraints blocks
can have constraint indices that are interleaved with other blocks.
This was not taken into account when setting up the LINCS GPU data
structures. This patch fixes that by ensuring that all coupled
constraints are added to the mapping array one after the other.
Also add a note in decideWhetherToUseGpuForUpdate() that a check for this
will be needed once update auto can choose the GPU in some cases.

Also renamed spaceNeeded to numCoupledConstraints to avoid confusion.

Fixes #3211.

Change-Id: Ib402c5cc9fc20116496034cdc11be921586bbd15
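
The idea of the fix, as described in the commit message above, is to write all members of a coupled-constraints cluster into the mapping array back to back. A hedged sketch of that ordering (all names are made up; this is not the GROMACS implementation):

#include <vector>

// Constraints coupled to each other get consecutive entries in the mapping
// array, regardless of how their global indices are interleaved with other
// clusters (as happens with DD).
std::vector<int> buildConsecutiveMapping(const std::vector<std::vector<int>>& coupledClusters)
{
    std::vector<int> mapping;
    for (const std::vector<int>& cluster : coupledClusters)
    {
        for (int constraintIndex : cluster)
        {
            mapping.push_back(constraintIndex);
        }
    }
    return mapping;
}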

History

#1 Updated by Szilárd Páll 15 days ago

  • Target version deleted (2020-beta3)

Target TBD later.

#2 Updated by Berk Hess 15 days ago

  • Subject changed from Lincs GPU kernel with DD fails for some cases with out-of-bounds shared-memory access to Lincs on GPU can fail due to incorrect assumptions on ordering
  • Category set to mdrun
  • Assignee set to Artem Zhmurov
  • Target version set to 2020-beta3

The GPU LINCS code assumes that coupled constraints are consecutively numbered. This is not the case with DD, and even without DD it is often, but not always, the case.

Additionally, with a large number of coupled constraints an assertion fails instead of issuing a nice error.
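
On the second point, a hedged illustration of replacing the assertion with a descriptive error when the coupled-constraints count exceeds what the GPU path supports. The names, the limit check, and the error mechanism are assumptions; GROMACS uses its own error-handling machinery.

#include <stdexcept>
#include <string>

void checkNumCoupledConstraints(int numCoupledConstraints, int maxSupported)
{
    if (numCoupledConstraints > maxSupported)
    {
        // A user-facing error instead of an assertion failure.
        throw std::runtime_error("The number of coupled constraints ("
                                 + std::to_string(numCoupledConstraints)
                                 + ") exceeds what GPU LINCS supports ("
                                 + std::to_string(maxSupported) + ").");
    }
}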

#3 Updated by Mark Abraham 11 days ago

Alan Gray wrote:

In the lincs GPU kernel we have code like this (L201 in lincs_cuda.cu)

int c1 = gm_coupledConstraintsIdxes[index];

float3 rc1 = sm_r[c1 - blockIdx.x * blockDim.x];

but in some cases (c1 - blockIdx.x * blockDim.x) is negative or exceeds the CUDA block size, so an out-of-bounds access to shared memory occurs. (In particular, this fails for a large ~1M-atom STMV case.) In other cases it works as expected.

To reproduce:

Private change https://gerrit.gromacs.org/c/gromacs/+/14384 is based on the change that enables "-update gpu" with DD, and includes an assertion in the CPU setup code which fails when the index array contains an element that will cause an out-of-bounds shared memory access in the kernel.

Please make this public, because this fix cannot be tested without it

#4 Updated by Alan Gray 11 days ago

Please make this public, because this fix cannot be tested without it

Done

#5 Updated by Artem Zhmurov 11 days ago

  • Status changed from New to Resolved

#6 Updated by Mark Abraham 11 days ago

  • Status changed from Resolved to Closed
