Task #3106

Task #3370: Further improvements to GPU Buffer Ops and Comms

Implement multiple pulses with GPU communication

Added by Berk Hess 9 months ago. Updated 5 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
core library
Target version:
Difficulty:
uncategorized

Description

I think it is more work to write code to ensure that we only ever use one communication pulse than it is to implement multiple pulses in the GPU communication code. With one pulse we might run into issues with pressure coupling and PME tuning.
Implementing multiple pulses shouldn't be much more work than adding a for loop around the current communication.
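
To illustrate the "for loop around the current communication" idea, here is a minimal sketch in C++. It is a toy model and not the GROMACS implementation: the `Rank` struct and the `exchangePulse` and `haloExchange` functions are hypothetical names, the "ranks" live in one process on a periodic 1D ring, and every entry gained in the previous pulse is forwarded (a real halo exchange would forward only entries within the cut-off). It assumes that each pulse talks only to the nearest neighbour and passes on what arrived in the pulse before.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, NOT GROMACS code: all names here are hypothetical.
// Each "rank" owns a buffer with its home entries first, followed by
// halo entries appended pulse by pulse.
struct Rank
{
    std::vector<int> buffer;        // home entries, then received halo entries
    std::size_t      sendBegin = 0; // start of the range to send in the next pulse
};

// One pulse: every rank sends the range it gained last (its home entries in
// pulse 0) to its left neighbour and receives from its right neighbour.
void exchangePulse(std::vector<Rank>& ranks)
{
    const std::size_t n = ranks.size();

    // Pack all send buffers first, so every rank sends its pre-pulse state.
    std::vector<std::vector<int>> sendBuffers(n);
    for (std::size_t r = 0; r < n; r++)
    {
        const auto first = ranks[r].buffer.begin()
                           + static_cast<std::ptrdiff_t>(ranks[r].sendBegin);
        sendBuffers[r].assign(first, ranks[r].buffer.end());
    }

    // "Receive": append the right neighbour's packed data to the local buffer.
    for (std::size_t r = 0; r < n; r++)
    {
        const std::size_t right = (r + 1) % n;
        ranks[r].sendBegin      = ranks[r].buffer.size();
        ranks[r].buffer.insert(ranks[r].buffer.end(),
                               sendBuffers[right].begin(),
                               sendBuffers[right].end());
    }
}

// Multiple pulses: the single-pulse exchange wrapped in a loop.
void haloExchange(std::vector<Rank>& ranks, int numPulses)
{
    assert(numPulses >= 0);
    for (int pulse = 0; pulse < numPulses; pulse++)
    {
        exchangePulse(ranks);
    }
}
```

With four ranks whose home entries are `{10*r, 10*r + 1}`, calling `haloExchange(ranks, 2)` leaves rank 0 holding `{0, 1, 10, 11, 20, 21}`: its own entries, then those of rank 1 from the first pulse, then those of rank 2 forwarded in the second pulse.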

Associated revisions

Revision fdf8c906 (diff)
Added by Alan Gray 5 months ago

Multiple pulses for GPU Halo Exchange

Removes restriction on single pulse.

Implements #3106

Change-Id: I5d68258de831d04c14d6c352fc52e51852fccd80

Revision f85dfceb (diff)
Added by Berk Hess 5 months ago

Remove 1 pulse DD request

Since the CUDA DD code now supports multiple pulses this is no longer
needed.

Refs #3106

Change-Id: I2db1d3cf45d9b0c814c0897a048bc5efb9f99e79

History

#1 Updated by Mark Abraham 9 months ago

  • Parent task set to #2890

#2 Updated by Szilárd Páll 9 months ago

Target 2020? I agree that trying to avoid the >1 pulse case without disabling our load-balancing algorithms seems risky, but disabling both load balancers will impact performance quite significantly.

#3 Updated by Alan Gray 8 months ago

  • Target version set to future

Moving from other thread:

Additionally, the single-pulse communication limit, as we realized, can result in fragile behavior and requires a lot of support scaffolding to make it possible (and safe) to use the GPU DD feature in its current form. We think that we have the necessary restrictions in place (we might want to add an extra check that could abort the run), but this aspect will require thorough testing, i.e. running highly imbalanced multi-rank runs with DLB on. Ideally we would like this limitation addressed, but I realize this is unlikely to happen before the final release, as we have no code yet (I assume?) and other tasks have higher priority. I also suspect such an addition would require a change set that carries too large a risk of breakage post-beta2. If so, we should nevertheless consider it (together with the 1D DD limitation) among the first candidate improvements post-release.

I've not looked into this yet, so yes, targeting a post-release version seems sensible. Please can someone provide me with, or point me to, a test case that invokes multiple pulses, to use for development?

#4 Updated by Alan Gray 7 months ago

  • Status changed from New to Fix uploaded

#5 Updated by Alan Gray 5 months ago

  • Status changed from Fix uploaded to Closed

#6 Updated by Alan Gray 5 months ago

  • Status changed from Closed to Fix uploaded
  • Parent task changed from #2890 to #3370

Re-opening and moving to subtask of #3370, so we don't lose the discussion.

#7 Updated by Alan Gray 5 months ago

  • Status changed from Fix uploaded to Closed
