Feature #2816: GPU offload / optimization for update & constraints, buffer ops and multi-GPU communication

Feature #2915: GPU direct communications

Task #2965: Performance of GPU direct communications

Added by Alan Gray 6 months ago. Updated 16 days ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: 2020
Difficulty: uncategorized

Description

This issue is to track testing / evaluating usability / performance of the CUDA-aware MPI and direct copy implementations of multi-GPU communications (when does it work, when does it not, when is it faster, etc.).

Capture.JPG (82.7 KB) - Alan Gray, 11/22/2019 12:37 PM

History

#1 Updated by Alan Gray 6 months ago

  • Target version set to 2020

#2 Updated by Alan Gray 16 days ago

Latest performance results for new features on 4-GPU servers:

All results in ns/day.

Volta NVLink: 4xV100-SXM2+2xBroadwell
Volta PCIe: 4xV100-PCIe+2xHaswell
Pascal PCIe: 4xP100-PCIe+2xHaswell

STMV: 1,066,628 atoms
Cellulose: 408,609 atoms
ADH: 95,561 atoms

Code version: https://gerrit.gromacs.org/c/gromacs/+/14402 (with debug print statement commented out).

[export GMX_USE_GPU_BUFFER_OPS=1] (for all except "Default")
[export GMX_GPU_DD_COMMS=1] (for "Halo")
[export GMX_GPU_PME_PP_COMMS=1] (for "PME-PP")
gmx mdrun -s topol.tpr -ntomp $OMP_NUM_THREADS -pme gpu -nb gpu -ntmpi 4 -npme 1 -nsteps 10000 -resethway -v -notunepme -pin on -bonded gpu -noconfout -gpu_id 0123 -nstlist 200 \
[-update gpu] (for "Update")
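
For reference, a minimal shell sketch of how the bracketed options above could be combined into the individual configurations (Default, buffer ops, Halo, PME-PP, Update). The per-configuration stacking of the environment variables is an assumption based on the bracketed notes, not taken from the original benchmark scripts; the variables are passed as per-command prefixes so each run is self-contained.

#!/bin/bash
# Common mdrun options, as in the command line above
MDRUN_ARGS="-s topol.tpr -ntomp $OMP_NUM_THREADS -pme gpu -nb gpu -ntmpi 4 -npme 1 \
  -nsteps 10000 -resethway -v -notunepme -pin on -bonded gpu -noconfout \
  -gpu_id 0123 -nstlist 200"

# Default: no GPU buffer ops or direct-communication features enabled
gmx mdrun $MDRUN_ARGS

# Buffer ops: GPU buffer operations only
GMX_USE_GPU_BUFFER_OPS=1 gmx mdrun $MDRUN_ARGS

# Halo: GPU halo exchange between PP ranks (assumed to stack on buffer ops)
GMX_USE_GPU_BUFFER_OPS=1 GMX_GPU_DD_COMMS=1 gmx mdrun $MDRUN_ARGS

# PME-PP: direct GPU PME-PP communication (assumed to stack on the above)
GMX_USE_GPU_BUFFER_OPS=1 GMX_GPU_DD_COMMS=1 GMX_GPU_PME_PP_COMMS=1 gmx mdrun $MDRUN_ARGS

# Update: additionally run the coordinate update and constraints on the GPU
GMX_USE_GPU_BUFFER_OPS=1 GMX_GPU_DD_COMMS=1 GMX_GPU_PME_PP_COMMS=1 gmx mdrun $MDRUN_ARGS -update gpu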
