Task #2882

evaluate different storage layouts for GPU coordinates/charges/forces

Added by Szilárd Páll 9 months ago. Updated 9 months ago.

Status:
New
Priority:
Normal
Category:
mdrun
Target version:
-
Difficulty:
hard

Description

In GPU code we currently use two different AoS layouts:
- for coordinates: xyzq for coordinates & charges in the nonbonded kernels and xyz / separate q in PME
- for forces: xyz everywhere

There are significant drawbacks to using an AoS layout with 3-element short vectors: at least 2x the global memory transactions, and shared/local memory bank conflicts. Padding to 4 elements, which allows 16-byte/thread vectorized gmem loads, needs 33% more bandwidth, and the same overhead applies to the amount of shared/local memory needed, but this will often not be a limitation.
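To make the numbers above concrete, here is a minimal host-side C++ sketch of the two AoS element sizes. The type and function names (`Xyz`, `Xyzq`, `paddingOverheadPercent`, `segmentsTouched`) are hypothetical and are not the GROMACS types. The code only does byte arithmetic on a packed array and is not a model of actual GPU transaction behavior.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration types, not the GROMACS ones.
struct Xyz  { float x, y, z; };     // 3-element AoS entry
struct Xyzq { float x, y, z, q; };  // 4-element (padded or charge-carrying) AoS entry

static_assert(sizeof(Xyz) == 12, "xyz entry is expected to be 12 bytes");
static_assert(sizeof(Xyzq) == 16, "xyzq entry is expected to be 16 bytes");

// Extra bytes per element when going from 3 to 4 floats, in percent of the 3-float size.
inline double paddingOverheadPercent()
{
    return 100.0 * (sizeof(Xyzq) - sizeof(Xyz)) / sizeof(Xyz);
}

// Number of aligned segments of segBytes that element i of a packed array
// with elemBytes per element overlaps (array assumed to start on a segment boundary).
inline std::size_t segmentsTouched(std::size_t i, std::size_t elemBytes, std::size_t segBytes = 16)
{
    assert(elemBytes > 0 && segBytes > 0);
    const std::size_t first = (i * elemBytes) / segBytes;
    const std::size_t last  = (i * elemBytes + elemBytes - 1) / segBytes;
    return last - first + 1;
}
```

With 12-byte elements, entries 1 and 2 of the array straddle two 16-byte segments, whereas every 16-byte element sits in exactly one; the price is the roughly 33% size increase computed by `paddingOverheadPercent()`.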

At the same time, while AoS is convenient, SoA avoids the above AoS drawbacks, but it can waste L1/L2 cache when access patterns are scattered.
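The cache trade-off can be illustrated with a small host-side C++ sketch that counts how many distinct cache lines a set of atom indices touches in each layout. Everything here is an assumption for illustration: the function names are hypothetical, the 128-byte line size is only a default parameter (real GPUs differ), and all arrays are assumed to start on a line boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// AoS xyzq: atom i occupies bytes [16*i, 16*i + 16) of a single array.
inline std::size_t linesTouchedAos(const std::vector<std::size_t>& atoms, std::size_t lineBytes = 128)
{
    assert(lineBytes > 0 && lineBytes % 16 == 0);
    std::set<std::size_t> lines;
    for (std::size_t i : atoms)
    {
        lines.insert(i * 16 / lineBytes);
    }
    return lines.size();
}

// SoA: four separate float arrays (x, y, z, q); atom i occupies
// bytes [4*i, 4*i + 4) in each of them.
inline std::size_t linesTouchedSoa(const std::vector<std::size_t>& atoms, std::size_t lineBytes = 128)
{
    assert(lineBytes > 0 && lineBytes % 4 == 0);
    std::set<std::pair<int, std::size_t>> lines; // (array index, line index)
    for (std::size_t i : atoms)
    {
        for (int a = 0; a < 4; ++a)
        {
            lines.insert({ a, i * 4 / lineBytes });
        }
    }
    return lines.size();
}
```

Under these assumptions, 32 consecutive atoms touch the same number of lines in both layouts, while widely scattered atoms touch one line each in AoS but four lines each in SoA, which is the wasted-cache effect described above.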

We should evaluate the options and decide whether we can live with a single storage layout across GPU kernels.

History

#1 Updated by Szilárd Páll 9 months ago

The first thing to consider is the impact of getting rid of xyzq. WIP in CUDA, but we need to assess OpenCL too.

#2 Updated by Gerrit Code Review Bot 9 months ago

Gerrit received a related DRAFT patchset '4' for Issue #2882.
Uploader: Szilárd Páll
Change-Id: gromacs~master~I3c3cf2123cedaf5d11a67c732169bd2e17aafc91
Gerrit URL: https://gerrit.gromacs.org/9263
