Task #3370: Further improvements to GPU Buffer Ops and Comms
Rework GPU direct halo-exchange related force reduction complexities
Force reduction is now done in two stages: if there is a halo exchange, the CPU contribution is reduced early, together with the communicated data, while in other cases the force transfer happens later. The current mechanism also relies on position-dependent code, which leads to implicit dependencies, rather than on explicit event-based synchronization, with an event record closely following the producer task and an event wait enqueued just before the consumer task.
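The explicit pattern described above (record right after the producer, wait right before the consumer) can be illustrated with a small host-only model. This is only a sketch and not GROMACS code: Event, Stream, and runForceReductionSketch are hypothetical names, and the "streams" are plain in-order queues stepped by a round-robin loop rather than real GPU streams.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU event: set once the producer has finished.
struct Event
{
    bool recorded = false;
};

// Hypothetical stand-in for an in-order GPU stream.
class Stream
{
public:
    void enqueueTask(std::function<void()> task) { ops_.push_back({ nullptr, std::move(task) }); }
    // Record: marks the event once all earlier operations in this stream have run.
    void enqueueRecord(Event* event)
    {
        ops_.push_back({ nullptr, [event] { event->recorded = true; } });
    }
    // Wait: blocks later operations in this stream until the event is recorded.
    void enqueueWait(const Event* event) { ops_.push_back({ event, [] {} }); }

    // Runs the next operation; returns false if the stream is empty or blocked.
    bool step()
    {
        if (ops_.empty())
        {
            return false;
        }
        if (ops_.front().waitFor != nullptr && !ops_.front().waitFor->recorded)
        {
            return false;
        }
        std::function<void()> work = std::move(ops_.front().work);
        ops_.pop_front();
        work();
        return true;
    }

private:
    struct Op
    {
        const Event*          waitFor;
        std::function<void()> work;
    };
    std::deque<Op> ops_;
};

// The consumer is enqueued first on purpose: ordering is enforced by the
// event alone, not by where the enqueue calls sit in the code.
std::vector<std::string> runForceReductionSketch()
{
    std::vector<std::string> log;
    Event                    haloForcesReady;
    Stream                   local;
    Stream                   nonLocal;

    local.enqueueWait(&haloForcesReady); // wait just before the consumer
    local.enqueueTask([&log] { log.push_back("reduce forces"); });

    nonLocal.enqueueTask([&log] { log.push_back("unpack halo forces"); });
    nonLocal.enqueueRecord(&haloForcesReady); // record right after the producer

    bool progress = true;
    while (progress)
    {
        progress = local.step();
        progress = nonLocal.step() || progress;
    }
    return log;
}
```

Calling runForceReductionSketch() returns the producer entry before the consumer entry, even though the consumer was enqueued first and the local stream is polled first. That independence from code position is what the event-based approach is meant to provide.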
GPU Force Halo Exchange
Activate with GMX_GPU_DD_COMMS environment variable.
Extends GPU Halo exchange feature to provide GPU Force halo exchange
functionality. Does not yet support virial steps, which require an
extra shift force reduction - these are currently performed on the
non-buffer ops / non direct-comm path. Also has same limitations as
coordinate halo exchange.
Enable StatePropagatorGpuData for force transfers
Force transfers had already been switched to use StatePropagatorGpuData
earlier. This change updates the synchronization mechanisms as follows:
- replaces the previous stream sync after the GPU buffer ops reduction with
a waitForcesReadyOnHost call;
- removes the barriers in copyForces[From|To]Gpu() as dependencies
are now satisfied: most dependencies are intra-stream and therefore
implicit, the exception being the halo exchange, which uses its own
mechanism to sync H2D in the local stream with the nonlocal stream
(which is yet to be replaced; refs #3093).
Event-based Dependency for GPU Force Halo Exchange
Introduces new event recorded when exchanged forces are ready on GPU,
and passes this into force buffer ops using dependencyList. Removes previous
mechanism of forcing local stream to wait on non-local stream.
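As a rough illustration of passing a "forces ready" event into a reduction through a dependency list, the following host-only sketch may help. It is not the GROMACS implementation: ReadyEvent, ForceReductionSketch, addDependency, and tryReduce are hypothetical names, and a real GPU reduction would enqueue event waits in its stream rather than poll a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an event marked when exchanged forces are ready.
struct ReadyEvent
{
    bool marked = false;
    void mark() { marked = true; }
};

// Hypothetical reduction that only runs once every registered dependency is ready.
class ForceReductionSketch
{
public:
    // Registers an event that must be marked before the reduction may run.
    void addDependency(const ReadyEvent* event) { dependencyList_.push_back(event); }

    bool dependenciesSatisfied() const
    {
        for (const ReadyEvent* event : dependencyList_)
        {
            if (!event->marked)
            {
                return false;
            }
        }
        return true;
    }

    // Adds the exchanged contribution into forces if all dependencies are
    // satisfied; otherwise leaves forces untouched and returns false.
    bool tryReduce(std::vector<float>& forces, const std::vector<float>& exchangedForces) const
    {
        if (!dependenciesSatisfied())
        {
            return false;
        }
        assert(forces.size() == exchangedForces.size());
        for (std::size_t i = 0; i < forces.size(); ++i)
        {
            forces[i] += exchangedForces[i];
        }
        return true;
    }

private:
    std::vector<const ReadyEvent*> dependencyList_;
};
```

In this model the reduction does not need to know which stream produced the exchanged forces. It only consults the events it was handed, so the dependency is stated explicitly at the point where the consumer is set up.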
#5 Updated by Szilárd Páll 5 months ago
Alan Gray wrote:
Thanks for the update. I've just flagged the former which shows 3 additional tests failing on the gpucomm matrix compared to previous triggers.
I think we need to fix correctness issues of the code before we can really move forward with new changes.
#7 Updated by Szilárd Páll 4 months ago
Paul Bauer wrote:
Has this been resolved?
Partly, but not fully. Whether and when we upload local forces to the GPU still depends on the code path, which I think is undesirable code complexity. There is, however, no room for this in the release branch. This should be postponed (but preferably not to "infrastructure-stable").