Task #2965

Task #3370: Further improvements to GPU Buffer Ops and Comms

Performance of GPU direct communications

Added by Alan Gray about 1 year ago. Updated 6 months ago.

Status:
In Progress
Priority:
Normal
Assignee:
-
Category:
-
Target version:
Difficulty:
uncategorized

Description

This issue tracks testing and evaluation of the usability and performance of the CUDA-aware MPI and direct-copy implementations of multi-GPU communication (when each works, when it does not, and when it is faster).

Capture.JPG (82.7 KB) - Alan Gray, 11/22/2019 12:37 PM

History

#1 Updated by Alan Gray about 1 year ago

  • Target version set to 2020

#2 Updated by Alan Gray 9 months ago

Latest performance results for new features on 4-GPU servers:

All results are in ns/day.

Volta NVLink: 4xV100-SXM2+2xBroadwell
Volta PCIe: 4xV100-PCIe+2xHaswell
Pascal PCIe: 4xP100-PCIe+2xHaswell

STMV: 1,066,628 atoms
Cellulose: 408,609 atoms
ADH: 95,561 atoms

Code version: https://gerrit.gromacs.org/c/gromacs/+/14402 (with debug print statement commented out).

[export GMX_USE_GPU_BUFFER_OPS=1] (for all except "Default")
[export GMX_GPU_DD_COMMS=1] (for "Halo")
[export GMX_GPU_PME_PP_COMMS=1] (for "PME-PP")
gmx mdrun -s topol.tpr -ntomp $OMP_NUM_THREADS -pme gpu -nb gpu -ntmpi 4 -npme 1 -nsteps 10000 -resethway -v -notunepme -pin on -bonded gpu -noconfout -gpu_id 0123 -nstlist 200 \
[-update gpu] (for "Update")
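
For reference, below is a minimal shell sketch of how the configurations above could be run back to back. The run_config helper, the configuration names, and the -deffnm/log naming are illustrative assumptions; only the environment variables and mdrun options themselves are taken from the command above.

#!/bin/bash
# Common mdrun options, as used in the runs reported above.
MDRUN_ARGS="-s topol.tpr -ntomp $OMP_NUM_THREADS -pme gpu -nb gpu -ntmpi 4 -npme 1 -nsteps 10000 -resethway -v -notunepme -pin on -bonded gpu -noconfout -gpu_id 0123 -nstlist 200"

run_config () {
    # First argument names the configuration; any further arguments are extra mdrun flags.
    local name=$1; shift
    gmx mdrun $MDRUN_ARGS -deffnm "$name" "$@" 2>&1 | tee "$name.log"
}

# Default: no GPU buffer ops, no direct GPU communication.
unset GMX_USE_GPU_BUFFER_OPS GMX_GPU_DD_COMMS GMX_GPU_PME_PP_COMMS
run_config default

# Buffer ops: GPU buffer operations enabled (and kept on for all remaining configurations).
export GMX_USE_GPU_BUFFER_OPS=1
run_config bufferops

# Halo: additionally enable direct GPU halo exchange between PP ranks.
export GMX_GPU_DD_COMMS=1
run_config halo

# PME-PP: additionally enable direct GPU PME-PP communication.
export GMX_GPU_PME_PP_COMMS=1
run_config pmepp

# Update: additionally run coordinate update and constraints on the GPU.
run_config update -update gpu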

#3 Updated by Paul Bauer 7 months ago

  • Target version changed from 2020 to 2021

#4 Updated by Alan Gray 6 months ago

  • Status changed from New to In Progress
  • Parent task changed from #2915 to #3370
