gmx_pme_send_coordinates() is moved past the coordinate (x) H2D copy, which will lead to regressions on the default code path.
This needs to be made conditional:
i) eliminate this in favor of a direct CPU->GPU copy;
ii) keep the H2D followed by a direct GPU->GPU copy only if there is proof that it is beneficial (which may be the case with PCIe between CPU<->GPU but NVLink between GPU<->GPU). Side note: that's perhaps why we should use NCCL.
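To make the trade-off between i) and ii) concrete, here is a rough latency model in C++ for the time until the PME GPU holds x. It is a sketch under stated assumptions, not GROMACS code: the function names are hypothetical, bandwidths are caller-supplied guesses rather than measurements, and two copies sharing one host link are assumed to split its bandwidth evenly.

```cpp
#include <cstddef>

// Rough latency model, for illustration only (hypothetical names).
// Bandwidths are assumptions supplied by the caller, not measured values.

// Size in bytes of the coordinate buffer x: natoms * 3 components * sizeof(float).
constexpr std::size_t coordinateBytes(std::size_t natoms)
{
    return natoms * 3 * sizeof(float);
}

// i) Direct CPU -> PME-GPU copy, issued alongside the CPU -> PP-GPU copy.
// Assumption: if both copies use the same host link, they split its bandwidth
// evenly, so the PME copy takes twice as long.
constexpr double directSeconds(std::size_t bytes, double hostLinkBytesPerSec, bool sharedHostLink)
{
    return (sharedHostLink ? 2.0 : 1.0) * static_cast<double>(bytes) / hostLinkBytesPerSec;
}

// ii) H2D copy to the PP GPU, then a GPU -> GPU copy to the PME GPU.
// The PME send waits for the H2D copy to finish, so the two times add up.
constexpr double stagedSeconds(std::size_t bytes, double hostLinkBytesPerSec, double gpuLinkBytesPerSec)
{
    return static_cast<double>(bytes) / hostLinkBytesPerSec
           + static_cast<double>(bytes) / gpuLinkBytesPerSec;
}
```

For example, with a hypothetical 100,000-atom system (1.2 MB of x), an assumed 12 GB/s effective PCIe bandwidth and an assumed 40 GB/s effective NVLink bandwidth, the model gives about 100 us for the direct path on separate host links, 200 us when both copies share one host link, and 130 us for the staged path. Under these assumptions the staged path only wins when the host link is shared, and it costs extra latency otherwise, which is the regression concern for the default code path.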
GPU Coordinate PME/PP Communications
Extends PmePpCommGpu class to provide PP-side support for coordinate
transfers from either GPU or CPU to the PME task, and adds new
PmeCoordinateReceiverGpu class to receive coordinate data directly to
the GPU on the PME task.
- Status changed from New to In Progress
- Target version changed from 2020 to 2020-beta3
Direct CPU->GPU copy already works in the master branch by setting sendCoordinatesFromGpu=true.
With NVLink on a DGX, it is definitely better to use the existing mechanism; otherwise there are two H2D PCIe transfers (to the PP and PME GPUs) on the same bus. For non-NVLink systems, yes, going direct CPU->GPU should be better. We can choose the path based on whether peer access is enabled (and possibly later extend this to query the architecture using NVML, which would give more detail). Working on this at the moment.
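A minimal sketch of the path selection described above, in C++. All names are hypothetical and not the GROMACS API. The peer-access flag stands for what cudaDeviceCanAccessPeer() would report for the PP/PME device pair, and the link kind stands in for the more detailed NVML-based query mentioned as a possible later extension. Peer access alone is only a proxy for NVLink, since peer access can also run over PCIe.

```cpp
// Hypothetical sketch; names are illustrative and not part of GROMACS.

// What is known about the link between the PP GPU and the PME GPU.
// Unknown: only the CUDA peer-access check is available.
// Pcie / NvLink: a more detailed topology query (e.g. via NVML) succeeded.
enum class GpuLinkKind { Unknown, Pcie, NvLink };

enum class PmeCoordinateSendPath
{
    DirectFromCpu,      // copy x from host memory straight to the PME GPU
    StagedThroughPpGpu  // H2D to the PP GPU first, then GPU -> GPU to the PME GPU
};

// peerAccessEnabled: whether peer access between the PP and PME devices is
// enabled (in CUDA code this could be checked with cudaDeviceCanAccessPeer()).
constexpr PmeCoordinateSendPath choosePmeCoordinateSendPath(bool peerAccessEnabled, GpuLinkKind link)
{
    // Without peer access there is no direct GPU -> GPU path to exploit.
    if (!peerAccessEnabled)
    {
        return PmeCoordinateSendPath::DirectFromCpu;
    }
    // Topology known to be PCIe: avoid staging, go direct CPU -> GPU.
    if (link == GpuLinkKind::Pcie)
    {
        return PmeCoordinateSendPath::DirectFromCpu;
    }
    // NVLink, or unknown topology with peer access used as the proxy:
    // keep the existing H2D + GPU -> GPU mechanism.
    return PmeCoordinateSendPath::StagedThroughPpGpu;
}
```

With GpuLinkKind::Unknown the sketch reduces to the peer-access rule, so a later topology query would only refine the decision rather than change its default.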