Task #3106

Feature #2816: GPU offload / optimization for update & constraints, buffer ops and multi-GPU communication

Feature #2890: GPU Halo Exchange

Implement multiple pulses with GPU communication

Added by Berk Hess 4 months ago. Updated about 1 month ago.

Fix uploaded
core library
Target version:


I think it is more work to write code to ensure that we only ever use one communication pulse than it is to implement multiple pulses in the GPU communication code. With one pulse we might run into issues with pressure coupling and PME tuning.
Implementing multiple pulses shouldn't be much more work than adding a for loop around the current communication.
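
To illustrate the "for loop around the current communication" idea, here is a minimal CPU-only C++ sketch. It is not GROMACS code: `RankState`, `haloExchangePulse` and `haloExchange` are hypothetical names, the ranks are simulated in one process, and the decomposition is assumed to be 1D and periodic. It also assumes each pulse simply forwards everything received in the previous pulse, whereas a real implementation would select atoms by distance and move data through GPU buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-rank state: home coordinates first, then halo entries
// appended in the order they are received.
struct RankState
{
    std::vector<float> x;
    std::size_t        numHome = 0;
};

// One communication pulse: every rank sends x[sendBegin, sendEnd) to rank-1
// (periodic) and appends whatever arrives from rank+1.
void haloExchangePulse(std::vector<RankState>&         ranks,
                       const std::vector<std::size_t>& sendBegin,
                       const std::vector<std::size_t>& sendEnd)
{
    const std::size_t               n = ranks.size();
    std::vector<std::vector<float>> inFlight(n);

    // Pack all messages first so no rank sends data received in this pulse.
    for (std::size_t r = 0; r < n; r++)
    {
        const std::size_t dest = (r + n - 1) % n;
        inFlight[dest].assign(ranks[r].x.begin() + sendBegin[r],
                              ranks[r].x.begin() + sendEnd[r]);
    }
    // Unpack: append the received entries to each rank's halo region.
    for (std::size_t r = 0; r < n; r++)
    {
        ranks[r].x.insert(ranks[r].x.end(), inFlight[r].begin(), inFlight[r].end());
    }
}

// Multiple pulses: the single-pulse exchange wrapped in a loop. Pulse 0 sends
// the home atoms; each later pulse forwards what the previous pulse delivered.
void haloExchange(std::vector<RankState>& ranks, int numPulses)
{
    assert(!ranks.empty());
    const std::size_t        n = ranks.size();
    std::vector<std::size_t> sendBegin(n, 0), sendEnd(n);
    for (std::size_t r = 0; r < n; r++)
    {
        sendEnd[r] = ranks[r].numHome;
    }
    for (int pulse = 0; pulse < numPulses; pulse++)
    {
        haloExchangePulse(ranks, sendBegin, sendEnd);
        for (std::size_t r = 0; r < n; r++)
        {
            sendBegin[r] = sendEnd[r];
            sendEnd[r]   = ranks[r].x.size();
        }
    }
}
```

With four ranks and two pulses, each rank ends up holding its own home entries followed by those of its next two neighbours, so the only change relative to the single-pulse case is the outer loop and the bookkeeping of the send range.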


#1 Updated by Mark Abraham 4 months ago

  • Parent task set to #2890

#2 Updated by Szilárd Páll 4 months ago

Target 2020? I agree that trying to avoid the >1 pulse case without disabling our load-balancing algorithms seems risky, but disabling both load balancers would impact performance quite significantly.

#3 Updated by Alan Gray 3 months ago

  • Target version set to future

Moving from other thread:

Additionally, the single-pulse communication limit, as we realized, can result in fragile behavior and requires a lot of support scaffolding to make it possible (and safe) to use the GPU DD feature in its current form. We think that we have the necessary restrictions in place (we might want to add an extra check that might abort the run), but this aspect will require thorough testing, i.e. running highly imbalanced multi-rank runs with DLB on. Ideally we would like this limitation addressed, but I realize this is unlikely to happen before a final release as we have no code yet (I assume?) and other tasks have higher priority. I also suspect such an addition would require a change set that is too large a risk of breakage post-beta2. If so, we should, however, consider it (together with the 1D DD limitation) among the first candidate improvements post-release.

I've not looked into this yet, so yes, targeting a post-release version seems sensible. Could someone provide me with, or point me to, a test case that invokes multiple pulses, to use for development?

#4 Updated by Alan Gray about 1 month ago

  • Status changed from New to Fix uploaded
