Integration planning for GROMACS 2020 GPU features
We have a set of target functionality to integrate by 9/16. (TODO record that)
- GPU halo-exchange for positions. Coding assigned to Alan. Review assigned to Mark and Berk.
Status - code in gerrit, class+file+method naming input given, can merge when revised accordingly
- pp-pme exchange class. Coding assigned to Alan. Review assigned to Mark, Berk.
Status - https://gerrit.gromacs.org/c/gromacs/+/12783 awaiting further review/merge.
Reviewed/Ready, merge pending rebase (StatePropagator upstream improvements?)
- simulation flags structs. Coding assigned to Szilard. Review assigned to Mark, Berk, Paul, Erik.
Status - WIP at https://gerrit.gromacs.org/c/gromacs/+/11622, would benefit from FTF discussion on how to describe lifetime and intent for this initial patch, whose form is understood to be unlike what we want in the long term
Question: Does multi-domain GPU-halo-exchange force handling support adding extra CPU forces?
Alan's answer: I believe it should (if an H2D copy of CPU forces is placed before the halo exchange), since the non-local part from the remote GPU will be accumulated into the local part of the force buffer.
- LF pressure-coupling support for GPU-based update. Coding assigned to Artem. Review assigned to TODO.
Status - P-R is plus2 on gerrit at https://gerrit.gromacs.org/c/gromacs/+/12477. Berendsen support easy to add in a future patch?
- tests covering .mdp nst* flags, in particular for well-chosen co-prime cases to prevent unexpected behavior changes. Also, can we test e.g. forces with FEP on non-output steps? Coding assigned to Mark. Review assigned to TODO.
Status - start work Monday afternoon
- Make GPU version of StatePropagatorData. Coding assigned to Artem. Review assigned to Erik, Mark.
Status - awaiting patch from Artem. Naming decision recorded at https://gerrit.gromacs.org/c/gromacs/+/11986
DONE; follow-up WIP linking tasks with event dependencies
- Clean up to make a PP-rank "PME force receiver". Coding assigned to Mark. Review assigned to Alan, Paul, Berk.
Status: awaiting pp-pme exchange class stabilizing. Also review of clfft init change https://gerrit.gromacs.org/c/gromacs/+/12897.
- Code to receive PME-rank forces on PP-rank GPU buffers (avoiding CPU). Coding assigned to Alan. Review assigned to Szilard, Berk, Mark.
Status: Code at https://gerrit.gromacs.org/c/gromacs/+/12980 awaiting review.
Reviewed (Szilard); merge pending rebase
- Halo-exchange class for force. Coding assigned to Alan. Review assigned to Szilard, Mark, Berk.
Status: Code in Gerrit at https://gerrit.gromacs.org/c/gromacs/+/12943. Awaiting review.
- Stitching together high-level single-GPU logic to achieve performance. Coding assigned to Artem. Review assigned to Erik, Mark.
Status: Awaiting progress on GPU StatePropagatorData patch
Status: WIP, coordinates producers' linking to PME/buf ops: https://gerrit.gromacs.org/c/gromacs/+/13483, https://gerrit.gromacs.org/c/gromacs/+/13484/3
- Lift cr->duty assignment out of init_domain_decomposition(). Coding assigned to Berk. Review assigned to Mark, Paul, Szilard.
- Class containing PME-PP exchange for coordinate buffers. Coding assigned to Alan. Review assigned to Szilard, Berk, Mark.
Status: Code at https://gerrit.gromacs.org/c/gromacs/+/13043 awaiting review.
- refinement of simulation flag structs. Coding assigned to Szilard. Review assigned to Mark, Berk, Paul, Erik.
awaiting submission of earlier patch
DONE / follow-up needed: SimulationWorkload flags prepared after task assignment
- Stitching together high-level multi-GPU logic to achieve performance. Coding assigned to TODO. Will need input from many people
Status: precursors WIP (13427, 13494); related tasks on #2890
- mdrun user interface + choices of defaults. Coding assigned to Paul. Review assigned to Alan, Artem, Mark.
not started yet
Partially DONE; follow-up needed: SimulationWorkload, refine defaults selection.
Due last minute
- Link task assignment to simulation flags and high-level GPU logic. Coding assigned to TODO. Will need input from many people.
Status: clfft init patch needs review at https://gerrit.gromacs.org/c/gromacs/+/12897. That prepares for a patch to manage GPU streams at a high level and pass handles into the individual modules that need to collaborate on the ~5 streams we identified (nonlocal, local, pme work, pp-pme xfer, update). Waiting on Berk's patch to pull cr->duty assignment out of init_domain_decomposition(), so that we can rearrange the order of operations in the runner to make GPU device info available earlier and set up GPU streams before init_domain_decomposition() and init_forcerec().
Status: data structures are ready for simulation flags to be populated (SimulationWorkload) after task assignment. A SimulationWorkloadBuilder is needed that takes the task-assignment output as well as some of the dev feature flags and makes a runtime-constant workload descriptor that can be passed down to do_force?(). Note that some of the current c_useFeatureX flags are used to compute a per-step flag with overrides (like buffer ops), but others have the overrides built into the global boolean construction (like c_enableGpuHaloExchange). A distinction needs to be made between i) per-step overrides and ii) feature requirements/static overrides that might be reasonable to include in SimulationWorkload flags or might be best to assert on.
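For discussion, the builder idea in the last status note could be sketched roughly as below. This is only a sketch under assumptions, not the planned implementation: every type, member and function name here (TaskAssignmentResult, DevFeatureFlags, SimulationWorkloadSketch, SimulationWorkloadBuilderSketch, StepWorkloadSketch, setupStepWorkload) is hypothetical, and the specific override conditions (halo exchange needing more than one domain and GPU nonbondeds, buffer ops being switched off on search steps) are illustrative guesses rather than decisions recorded on this page.

```cpp
#include <cassert>

// All names below are hypothetical illustrations, not GROMACS API.

// Assumed shape of the task-assignment output.
struct TaskAssignmentResult
{
    bool nonbondedOnGpu = false;
    bool pmeOnGpu       = false;
    bool updateOnGpu    = false;
    int  numDomains     = 1;
};

// Assumed stand-ins for the development feature flags (c_useFeatureX style).
struct DevFeatureFlags
{
    bool enableGpuBufferOps    = false;
    bool enableGpuHaloExchange = false;
};

// Runtime-constant workload descriptor, fixed once per simulation.
struct SimulationWorkloadSketch
{
    bool useGpuNonbonded    = false;
    bool useGpuPme          = false;
    bool useGpuUpdate       = false;
    bool useGpuBufferOps    = false;
    bool useGpuHaloExchange = false;
};

class SimulationWorkloadBuilderSketch
{
public:
    SimulationWorkloadBuilderSketch(const TaskAssignmentResult& tasks, const DevFeatureFlags& features) :
        tasks_(tasks), features_(features)
    {
    }

    SimulationWorkloadSketch build() const
    {
        // Example of a requirement that is asserted on rather than stored as a flag.
        assert(tasks_.numDomains >= 1);

        SimulationWorkloadSketch workload;
        workload.useGpuNonbonded = tasks_.nonbondedOnGpu;
        workload.useGpuPme       = tasks_.pmeOnGpu;
        workload.useGpuUpdate    = tasks_.updateOnGpu;

        // Static overrides folded into the runtime-constant flags (assumed conditions).
        workload.useGpuBufferOps = features_.enableGpuBufferOps && tasks_.nonbondedOnGpu;
        workload.useGpuHaloExchange =
                features_.enableGpuHaloExchange && tasks_.nonbondedOnGpu && tasks_.numDomains > 1;
        return workload;
    }

private:
    TaskAssignmentResult tasks_;
    DevFeatureFlags      features_;
};

// Per-step flags, recomputed every step from the constant descriptor.
struct StepWorkloadSketch
{
    bool useGpuBufferOps = false;
};

// Per-step override (assumed): buffer ops fall back to the CPU on search steps.
inline StepWorkloadSketch setupStepWorkload(const SimulationWorkloadSketch& sim, bool isSearchStep)
{
    StepWorkloadSketch step;
    step.useGpuBufferOps = sim.useGpuBufferOps && !isSearchStep;
    return step;
}
```

With this split, the runner would call build() once after task assignment and pass the result down, while the force routine would call setupStepWorkload() each step. That keeps category i) per-step overrides out of the constant descriptor and confines category ii) static overrides and requirements to the builder.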