For efficient direct GPU communication, peer-to-peer access between the GPUs used in the run should be enabled.
However, this functionality should be implemented so that all (or most) errors are handled explicitly and the function aborts the run only if a fatal error is detected; otherwise, since this is only a performance concern, the run should continue.
Related: the current working assumption is that, even if peer access is not enabled, direct copy should not be slower than staged copy. As we are not sure of this, we might want to consider disabling GPU direct copy if enabling peer access fails.
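
To make the intended error handling concrete, here is a minimal sketch using the CUDA runtime API. It is not the code from the patch: the function name `tryEnablePeerAccess`, its signature, and the warning texts are hypothetical, and for simplicity it treats every failure as non-fatal (which errors should abort the run is left open here). Only the runtime calls themselves (`cudaDeviceCanAccessPeer`, `cudaDeviceEnablePeerAccess`, `cudaGetLastError`, and so on) are real API.

```cuda
#include <cstdio>
#include <vector>

#include <cuda_runtime.h>

// Hypothetical helper (name and signature are assumptions, not the patch's API).
// Tries to enable peer access for every ordered pair of devices in deviceIds and
// returns the number of pairs for which peer access is enabled afterwards.
// All failures are reported as warnings; the function never aborts.
int tryEnablePeerAccess(const std::vector<int>& deviceIds)
{
    if (deviceIds.empty())
    {
        return 0; // nothing to do, and no CUDA call is made
    }

    int originalDevice = 0;
    if (cudaGetDevice(&originalDevice) != cudaSuccess)
    {
        cudaGetLastError(); // reset the error state so later calls are unaffected
        std::fprintf(stderr, "Note: cannot query the current GPU; peer access not enabled.\n");
        return 0;
    }

    int enabledPairs = 0;
    for (int src : deviceIds)
    {
        // Peer access is enabled from the currently active device.
        if (cudaSetDevice(src) != cudaSuccess)
        {
            cudaGetLastError();
            std::fprintf(stderr, "Note: cannot select GPU %d; skipping it.\n", src);
            continue;
        }
        for (int dst : deviceIds)
        {
            if (src == dst)
            {
                continue;
            }
            int         canAccess = 0;
            cudaError_t stat      = cudaDeviceCanAccessPeer(&canAccess, src, dst);
            if (stat != cudaSuccess || canAccess == 0)
            {
                cudaGetLastError();
                std::fprintf(stderr, "Note: peer access from GPU %d to GPU %d is unavailable.\n", src, dst);
                continue;
            }
            stat = cudaDeviceEnablePeerAccess(dst, 0); // the flags argument must be 0
            if (stat == cudaSuccess || stat == cudaErrorPeerAccessAlreadyEnabled)
            {
                enabledPairs++;
            }
            else
            {
                std::fprintf(stderr, "Note: enabling peer access from GPU %d to GPU %d failed: %s\n",
                             src, dst, cudaGetErrorString(stat));
            }
            cudaGetLastError(); // clear any non-fatal error, e.g. "already enabled"
        }
    }

    // Restore the device that was active on entry.
    if (cudaSetDevice(originalDevice) != cudaSuccess)
    {
        cudaGetLastError();
        std::fprintf(stderr, "Note: cannot restore GPU %d as the active device.\n", originalDevice);
    }
    return enabledPairs;
}
```

A caller could compare the return value against the expected number of ordered pairs, N*(N-1) for N GPUs, and use a shortfall as the trigger for disabling GPU direct copy, if that fallback is adopted.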
Enable GPU Peer Access in GPU Utilities
When using the new GPU communication features, enabling peer access
between pairs of GPUs (where supported) allows peer-to-peer
communication. This patch introduces the CUDA code that enables peer
access into the central GPU utilities and calls it from do_md.