Task #3170
Feature #2816: GPU offload / optimization for update & constraints, buffer ops and multi-GPU communication
Feature #2817: GPU X/F buffer ops
Feature #3029: GPU force buffer ops + reduction
investigate GPU f buffer ops use cases
Description
Check whether there are any performance benefits to be had, and in which regimes, for x / f buffer ops without GPU update in:
- runs with DD and CPU update
  - x buffer ops: offloadable, likely with a simple crossover threshold heuristic, i.e. not offloaded below N atoms/core (open question: locals only or also non-locals, with/without CPU work?)
  - f buffer ops: the heuristic likely needs more complex criteria (as it is combined with reductions)
- runs with / without DD and vsites
  - with GPU update: requires D2H and H2D transfers; is it worth it? Test use cases (e.g. multiple ranks per GPU, both ensemble and DD runs, where transfers might be overlapped)
  - without GPU update: the same applies as for the non-vsites runs above, except that the wait on the D2H transfer needs to happen earlier
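A minimal sketch of the kind of crossover heuristic described for x buffer ops, assuming a per-rank decision based only on atoms per CPU core. The struct, the function name and the threshold field are hypothetical illustrations, not GROMACS code; the actual value of N would have to come from the benchmarks this task calls for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inputs for the offload decision (not GROMACS types).
struct BufferOpsHeuristicInput
{
    std::int64_t numAtoms;              // atoms handled by this rank
    int          numCpuCores;           // CPU cores available to this rank
    std::int64_t atomsPerCoreThreshold; // crossover point N, to be tuned by benchmarking
};

// Returns true if x buffer ops should be offloaded to the GPU.
// Below N atoms per core the CPU is assumed to be fast enough that
// offloading (and its launch/synchronization overhead) does not pay off.
bool offloadXBufferOps(const BufferOpsHeuristicInput& in)
{
    assert(in.numCpuCores > 0);
    const std::int64_t atomsPerCore = in.numAtoms / in.numCpuCores;
    return atomsPerCore >= in.atomsPerCoreThreshold;
}
```

Whether the same check should be applied separately to local and non-local atoms, and how concurrent CPU work shifts the crossover, remains open as listed above; the f buffer ops case would need additional inputs because it is combined with reductions.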
Related issues
Associated revisions
History
#1 Updated by Szilárd Páll over 1 year ago
- Subject changed from investigate GPU f buffer ops + vsites use case to investigate GPU f buffer ops use cases
- Description updated (diff)
#2 Updated by Szilárd Páll over 1 year ago
- Parent task set to #3029
#3 Updated by Szilárd Páll over 1 year ago
- Related to Task #3171: schedule CPU H2D force contribution in separate stream added
#4 Updated by Paul Bauer about 1 year ago
- Target version changed from 2020 to 2021
#5 Updated by Alan Gray about 1 year ago
- Status changed from New to Closed
Moved to umbrella task https://redmine.gromacs.org/issues/3370
Allow overlapping CPU force H2D with compute
The reduction orchestration code already uses explicit sync events
in all cases, and StateGpu implements the ability to schedule the force
H2D copy in a separate stream for the "All" locality.
Hence, for non-DD runs this change moves the CPU force H2D copy
to the update stream, allowing it to overlap with force work in the
local stream.
Refs #3170 #3029
Change-Id: Iceb9aac395335c062109d552d3f0289688a9c75f
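The stream choice described in the commit message above can be sketched as follows. The enums and the function are hypothetical stand-ins, not the StateGpu API; the DD branch, which keeps the copy in the per-locality stream, is an assumption made for illustration, since the commit only changes non-DD runs.

```cpp
#include <cassert>

// Hypothetical enums mirroring the concepts in the commit message
// (not the actual GROMACS types).
enum class AtomLocality { Local, NonLocal, All };
enum class GpuStream { Local, NonLocal, Update };

// Chooses the stream in which to enqueue the H2D copy of CPU-computed forces.
// Without DD the copy covers all atoms ("All" locality) and goes into the
// update stream, so it can overlap with force kernels in the local stream;
// correctness relies on the reduction code waiting on an explicit sync event.
GpuStream streamForCpuForceH2D(bool haveDomainDecomposition, AtomLocality locality)
{
    if (!haveDomainDecomposition)
    {
        assert(locality == AtomLocality::All);
        return GpuStream::Update;
    }
    // Assumption for illustration: with DD, keep the per-locality stream.
    return locality == AtomLocality::NonLocal ? GpuStream::NonLocal : GpuStream::Local;
}
```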