Task #3159

Feature #2816: GPU offload / optimization for update & constraints, buffer ops and multi-GPU communication

Feature #2891: PME/PP GPU communications

eliminate regression due to moving gmx_pme_send_coordinates()

Added by Szilárd Páll about 1 month ago. Updated 19 days ago.

Status: In Progress
Priority: High
Assignee:
Category: mdrun
Target version: 2020-beta3
Difficulty: uncategorized

Description

gmx_pme_send_coordinates() has been moved past the coordinate H2D copy, which will lead to regressions on the default code path. This needs to be made conditional; we should either:
i) eliminate it in favor of a direct CPU->GPU copy; or
ii) keep the H2D copy followed by a direct GPU->GPU transfer, but only if there is proof that this is beneficial (which may be the case if CPU<->GPU is connected over PCIe but GPU<->GPU over NVLink). Side note: this is why we should perhaps use NCCL.
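
For illustration, a minimal CUDA sketch of the two candidate paths (all function and buffer names are hypothetical, not GROMACS code; it assumes h_x is a pinned host buffer so the async copies can overlap):

#include <cuda_runtime.h>
#include <cstddef>

// Path (i): one direct CPU->GPU copy into the PME task's coordinate
// buffer, skipping the PP-side H2D entirely (a single PCIe transfer).
void sendCoordinatesDirectFromCpu(const float* h_x, float* d_xPme,
                                  size_t bytes, cudaStream_t stream)
{
    cudaMemcpyAsync(d_xPme, h_x, bytes, cudaMemcpyHostToDevice, stream);
}

// Path (ii): H2D to the PP GPU first, then a direct peer copy to the
// PME GPU; attractive when CPU<->GPU is PCIe but GPU<->GPU is NVLink.
void sendCoordinatesViaPpGpu(const float* h_x,
                             float* d_xPp, int ppDevice,
                             float* d_xPme, int pmeDevice,
                             size_t bytes, cudaStream_t stream)
{
    cudaMemcpyAsync(d_xPp, h_x, bytes, cudaMemcpyHostToDevice, stream);
    cudaMemcpyPeerAsync(d_xPme, pmeDevice, d_xPp, ppDevice, bytes, stream);
}

Note that cudaMemcpyPeerAsync falls back to staging through host memory when peer access is not enabled, so path (ii) only pays off where the two GPUs share a direct link.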

Associated revisions

Revision 5b594f3b (diff)
Added by Alan Gray about 1 month ago

GPU Receive for PME/PP GPU Force Communications

This change extends the PME/PP GPU force communication functionality
to allow the force buffer to be received directly into GPU memory on
the PP task.

Implements part of #2817
Refs #3158 #3159

Change-Id: I5b1cff1846c7c3bd966b6bf9c0af72769600ef18

Revision c5595a8e (diff)
Added by Alan Gray 30 days ago

GPU Coordinate PME/PP Communications

Extends the PmePpCommGpu class to provide PP-side support for coordinate
transfers from either GPU or CPU to the PME task, and adds a new
PmeCoordinateReceiverGpu class to receive coordinate data directly into
GPU memory on the PME task.

Implements part of #2817
Refs TODOs #3157 #3158 #3159

Change-Id: Iefa2bdfd9813282ad8b07feeb7691f16880e61a2

History

#1 Updated by Alan Gray 19 days ago

  • Status changed from New to In Progress
  • Target version changed from 2020 to 2020-beta3

A direct CPU->GPU copy already works in the master branch by setting sendCoordinatesFromGpu=true.

With NVLink on DGX, it is definitely better to use the existing mechanism; otherwise there are two H2D PCIe transfers (to the PP and the PME task) on the same bus. Without NVLink, yes, it should be better to go direct CPU->GPU. We can choose the path based on whether peer access is enabled (and possibly later extend this to instead query the architecture using NVML, which would give more detail). Working on this at the moment.
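
For illustration, a minimal sketch of such a path selection in CUDA (a hypothetical helper, not the actual implementation):

#include <cuda_runtime.h>

// Pick the coordinate-send path based on whether direct peer access
// between the PP and PME GPUs is available. Peer access typically
// indicates a fast direct GPU<->GPU link (e.g. NVLink on DGX), in which
// case staging via the PP GPU avoids two competing H2D transfers on the
// same PCIe bus.
bool choosePeerToPeerPath(int ppDevice, int pmeDevice)
{
    int canAccessPeer = 0;
    cudaDeviceCanAccessPeer(&canAccessPeer, ppDevice, pmeDevice);
    return canAccessPeer != 0;
}

// Usage, mirroring the flag mentioned above:
//   sendCoordinatesFromGpu = choosePeerToPeerPath(ppDevice, pmeDevice);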
