Feature #3087
Task #3370: Further improvements to GPU Buffer Ops and Comms
Feature #2915: GPU direct communications
Enable GPU peer-to-peer access
Description
For efficient direct GPU communication, peer-to-peer access between the GPUs used in the run should be enabled.
This functionality should, however, be implemented so that all or most errors are handled explicitly, and the function aborts the run only if an error known to be fatal is detected; otherwise, since peer access is only a performance concern, the run should continue.
Related: the current working assumption is that, even if peer access is not enabled, direct copy should not be slower than staged copy. As we are not sure, we might want to consider disabling GPU direct copy if enabling peer access fails.
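The error-handling policy described above could be sketched roughly as follows, using the CUDA runtime calls cudaDeviceCanAccessPeer and cudaDeviceEnablePeerAccess. This is only an illustration, not the code that was merged: the names PeerAccessStatus, warnAndClear and enablePeerAccess are hypothetical, and the choice of which errors count as fatal (here, only failing to query or restore the current device) is an assumption.

```cuda
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

#include <cuda_runtime.h>

// Hypothetical result type: tells the caller whether every pair has peer access.
enum class PeerAccessStatus
{
    AllEnabled,   // every ordered pair of devices has peer access enabled
    PartlyEnabled // at least one pair could not be enabled; the run may continue
};

// Report a non-fatal CUDA error and reset the runtime's last-error state.
static void warnAndClear(const char* what, int dev, int peer, cudaError_t stat)
{
    std::fprintf(stderr, "Note: %s failed for GPU %d -> GPU %d: %s\n", what, dev, peer,
                 cudaGetErrorString(stat));
    cudaGetLastError();
}

// Try to enable peer access between all ordered pairs of the given device IDs.
PeerAccessStatus enablePeerAccess(const std::vector<int>& deviceIds)
{
    if (deviceIds.size() < 2)
    {
        return PeerAccessStatus::AllEnabled; // no pairs to set up
    }

    int         originalDevice = 0;
    cudaError_t stat           = cudaGetDevice(&originalDevice);
    if (stat != cudaSuccess)
    {
        // Assumption: without a usable current device the run cannot proceed.
        throw std::runtime_error(std::string("cudaGetDevice failed: ") + cudaGetErrorString(stat));
    }

    bool allEnabled = true;
    for (int dev : deviceIds)
    {
        stat = cudaSetDevice(dev);
        if (stat != cudaSuccess)
        {
            warnAndClear("cudaSetDevice", dev, dev, stat);
            allEnabled = false;
            continue;
        }
        for (int peer : deviceIds)
        {
            if (peer == dev)
            {
                continue;
            }
            int canAccess = 0;
            stat          = cudaDeviceCanAccessPeer(&canAccess, dev, peer);
            if (stat != cudaSuccess)
            {
                warnAndClear("cudaDeviceCanAccessPeer", dev, peer, stat);
                allEnabled = false;
                continue;
            }
            if (canAccess == 0)
            {
                allEnabled = false; // not supported for this pair: not an error
                continue;
            }
            stat = cudaDeviceEnablePeerAccess(peer, 0); // flags must be 0
            if (stat == cudaErrorPeerAccessAlreadyEnabled)
            {
                cudaGetLastError(); // harmless; just reset the error state
            }
            else if (stat != cudaSuccess)
            {
                warnAndClear("cudaDeviceEnablePeerAccess", dev, peer, stat);
                allEnabled = false;
            }
        }
    }

    // Assumption: failing to restore the original device is fatal, because
    // later GPU work would otherwise be issued on the wrong device.
    stat = cudaSetDevice(originalDevice);
    if (stat != cudaSuccess)
    {
        throw std::runtime_error(std::string("cudaSetDevice failed: ") + cudaGetErrorString(stat));
    }
    return allEnabled ? PeerAccessStatus::AllEnabled : PeerAccessStatus::PartlyEnabled;
}
```

A caller could use the returned status to decide whether to keep GPU direct copy or fall back to staged copy when the result is PartlyEnabled, which is the option raised in the note above.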
Related issues
Associated revisions
History
#1 Updated by Szilárd Páll over 1 year ago
- Related to Feature #2890: GPU Halo Exchange added
#2 Updated by Szilárd Páll over 1 year ago
- Related to Feature #2891: PME/PP GPU communications added
#3 Updated by Alan Gray over 1 year ago
- Status changed from New to Closed
Enable GPU Peer Access in GPU Utilities
When using the new GPU communication features, enabling peer access
between pairs of GPUs (where supported) will allow peer-to-peer
communications. In this patch the CUDA code to enable peer access is
introduced into central GPU utilities and called from do_md.
Implements #3087
Change-Id: If668366b76d49f7b624eedb501f8af19135c4386