I think it is more work to write code to ensure that we only ever use one communication pulse than it is to implement multiple pulses in the GPU communication code. With one pulse we might run into issues with pressure coupling and PME tuning.
Implementing multiple pulses shouldn't be much more work than adding a for loop around the current communication.
- Target version set to future
Moving from the other thread:
Additionally, the single-pulse communication limit, as we realized, can result in fragile behavior and requires a lot of support scaffolding to make the GPU DD feature possible (and safe) to use in its current form. We think we have the necessary restrictions in place (we might want to add an extra check that can abort the run), but this aspect will require thorough testing, i.e. running multi-rank, highly imbalanced runs with DLB on. Ideally we would like this limitation addressed, but I realize this is unlikely to happen before the final release, as we have no code yet (I assume?) and other tasks have higher priority. I also suspect such an addition would require a change set that carries too large a risk of breakage post-beta2. If so, we should however consider it (together with the 1D DD limitation) among the first candidate improvements post-release.
I've not looked into this yet, so yes, targeting a post-release version seems sensible. Could someone provide me with, or point me to, a test case that invokes multiple pulses, to use for development?