Task #2464

GPU performance goals overview

Added by Aleksei Iupinov about 2 years ago. Updated about 2 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Difficulty:
hard

Description

This is an overview of, and a place to discuss, potential goals for GPU improvements in GROMACS.
We have 3 distinct cases, which we can prioritize differently.

Single GPU simulation:
  • shared inputs to PME/NB (coordinate layout transformation kernel/indexing functions; avoiding a redundant H2D copy);
  • shared outputs to PME/NB (force buffer layout transformation kernel/indexing functions; atomic reduction of output forces into the same device-side buffer);
  • potential incremental PME kernel improvements from #2402 would be most relevant to the single-GPU case.
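
The two options named above for shared inputs (a layout transformation pass versus indexing functions over a single buffer) can be sketched on the CPU as follows. This is a minimal illustration, not GROMACS code: the padded 4-float and packed 3-float layouts, and the names paddedIndex, packedIndex, paddedToPacked and coordinate, are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layouts, chosen only for illustration:
//   "padded": 4 floats per atom (x, y, z, one spare slot)
//   "packed": 3 floats per atom (x, y, z)
constexpr std::size_t paddedStride = 4;
constexpr std::size_t packedStride = 3;

// Indexing functions: map (atom, dimension) to a flat offset in each layout.
inline std::size_t paddedIndex(std::size_t atom, std::size_t dim)
{
    assert(dim < 3);
    return atom * paddedStride + dim;
}

inline std::size_t packedIndex(std::size_t atom, std::size_t dim)
{
    assert(dim < 3);
    return atom * packedStride + dim;
}

// Option A: a transformation pass that rewrites the padded buffer into the
// packed layout, so each consumer keeps reading its native layout.
std::vector<float> paddedToPacked(const std::vector<float>& padded)
{
    assert(padded.size() % paddedStride == 0);
    const std::size_t  numAtoms = padded.size() / paddedStride;
    std::vector<float> packed(numAtoms * packedStride);
    for (std::size_t atom = 0; atom < numAtoms; ++atom)
    {
        for (std::size_t dim = 0; dim < 3; ++dim)
        {
            packed[packedIndex(atom, dim)] = padded[paddedIndex(atom, dim)];
        }
    }
    return packed;
}

// Option B: no second buffer; the consumer reads the padded buffer directly
// through the indexing function.
inline float coordinate(const std::vector<float>& padded, std::size_t atom, std::size_t dim)
{
    return padded[paddedIndex(atom, dim)];
}
```

Option A costs an extra pass and a second buffer but leaves the consumers untouched; option B avoids the copy but requires every read in the consumer to go through the indexing function. Which one pays off on a GPU would have to be measured.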
Multiple GPUs/single PME GPU rank simulation:
  • consider the best way of communicating coordinates from the PP ranks to the PME rank.
    • CUDA-aware MPI is an option, but it needs a clean, viable fallback code path.
    • Can we use multiple contexts/GPUs within a single rank instead? This is trivial to implement for testing purposes. Will there be a benefit?
  • evaluate pipelining the H2D coordinate copy with multiple spread launches on the PME rank. This would require one, hopefully small, change that is also required for the PME GPU decomposition (#2463): teaching spread to work on separate chunks of atom data. Here the chunks would also be processed in multiple streams, while accumulating onto the same grid.
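
The change described in the last item, a spread that works on a chunk of the atom data and accumulates onto a shared grid, can be sketched on the CPU as follows. This is a simplified stand-in, not the PME GPU kernel: it uses a periodic 1D grid with linear weights instead of B-splines, runs sequentially, and the names Atoms, spreadChunk and spreadInChunks are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical atom data: 1D positions in grid units, in [0, gridSize).
struct Atoms
{
    std::vector<float> position;
    std::vector<float> charge;
};

// Spread the charges of atoms [begin, end) onto a periodic 1D grid using
// linear weights. The grid is accumulated into, never reset, so several
// calls on different chunks can target the same grid.
void spreadChunk(const Atoms& atoms, std::size_t begin, std::size_t end, std::vector<float>& grid)
{
    assert(atoms.position.size() == atoms.charge.size());
    assert(begin <= end && end <= atoms.position.size());
    const std::size_t gridSize = grid.size();
    for (std::size_t i = begin; i < end; ++i)
    {
        const float x = atoms.position[i];
        assert(x >= 0.0f && x < static_cast<float>(gridSize));
        const std::size_t cell = static_cast<std::size_t>(x);
        const float       frac = x - static_cast<float>(cell);
        grid[cell % gridSize] += atoms.charge[i] * (1.0f - frac);
        grid[(cell + 1) % gridSize] += atoms.charge[i] * frac;
    }
}

// Process the atoms chunk by chunk. In a pipelined version each iteration
// would correspond to one coordinate copy followed by one spread launch.
void spreadInChunks(const Atoms& atoms, std::size_t chunkSize, std::vector<float>& grid)
{
    assert(chunkSize > 0);
    const std::size_t numAtoms = atoms.position.size();
    for (std::size_t begin = 0; begin < numAtoms; begin += chunkSize)
    {
        const std::size_t end = std::min(begin + chunkSize, numAtoms);
        spreadChunk(atoms, begin, end, grid);
    }
}
```

Because each chunk only adds to the grid, spreading in chunks gives the same grid as a single pass over all atoms. On a GPU with the chunks in separate streams, the accumulation would additionally need atomic adds or some other ordering guarantee, which this sequential sketch does not model.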
Multiple GPUs/multiple PME GPU ranks simulation:
  • the PME GPU decomposition is briefly described at #2463;
  • the short-term goal is mixed mode (spread/gather decomposition only);
  • a possible way to stay on the GPU is to run a redundant cuFFT of the whole grid (after spread) on each GPU, possibly using GPUDirect to gather the grid onto all GPUs.

Subtasks

Task #2402: PME kernels general performance improvements (New)
Task #2463: PME GPU decomposition (New)

History

#1 Updated by Aleksei Iupinov about 2 years ago

  • Subject changed from GPU performance attack angles to GPU performance goals overview

#2 Updated by Aleksei Iupinov about 2 years ago

  • Description updated (diff)

#3 Updated by Aleksei Iupinov about 2 years ago

  • Description updated (diff)

#4 Updated by Szilárd Páll about 2 years ago

  • Description updated (diff)

#5 Updated by Aleksei Iupinov about 2 years ago

  • Private changed from Yes to No
