Integration planning for GROMACS 2020 GPU features
We have a set of target features to integrate by 9/16 (TODO: record that).
- GPU halo-exchange for positions. Coding assigned to Alan. Review assigned to Mark and Berk.
Status - code in gerrit; input on class, file, and method naming given; can merge once revised accordingly
- pp-pme exchange class. Coding assigned to Alan. Review assigned to Mark, Berk.
Status - awaiting Alan's update to the existing change https://gerrit.gromacs.org/c/gromacs/+/12783 following last week's review
- simulation flags structs. Coding assigned to Szilard. Review assigned to Mark, Berk, Paul, Erik.
Status - WIP at https://gerrit.gromacs.org/c/gromacs/+/11622; would benefit from a face-to-face (FTF) discussion on how to describe the lifetime and intent of this initial patch, whose form is understood to differ from what we want in the long term
- Question: Does multi-domain GPU halo-exchange force handling support adding extra CPU forces? Alan's answer: I believe it should (if an H2D copy of the CPU forces is placed before the halo exchange), since the non-local part from the remote GPU will be accumulated into the local part of the force buffer.
- Leap-frog (LF) pressure-coupling support for GPU-based update. Coding assigned to Artem. Review assigned to TODO.
Status - Parrinello-Rahman (P-R) support is +2 on gerrit at https://gerrit.gromacs.org/c/gromacs/+/12477. Could Berendsen support easily be added in a future patch?
- Tests covering the .mdp nst* flags, in particular well-chosen co-prime cases, to prevent unexpected behavior changes. Also, can we test e.g. forces with FEP on non-output steps? Coding assigned to Mark. Review assigned to TODO.
Status - work starts Monday afternoon
- Make GPU version of StatePropagatorData. Coding assigned to Artem. Review assigned to Erik, Mark.
Status - awaiting patch from Artem. Naming decision recorded at https://gerrit.gromacs.org/c/gromacs/+/11986
- Clean-up to create a PP-rank "PME force receiver". Coding assigned to Mark. Review assigned to Alan, Paul, Berk.
Status: waiting for the pp-pme exchange class to stabilize. Also awaiting review of the clFFT init change https://gerrit.gromacs.org/c/gromacs/+/12897.
- Class to contain code to receive PME-rank forces on PP-rank GPU buffers (avoiding CPU). Coding assigned to Alan. Review assigned to Szilard, Berk, Mark.
Status: waiting for the pp-pme exchange class to stabilize
- Halo-exchange class for force+reduction. Coding assigned to Alan. Review assigned to Szilard, Mark, Berk.
Status: needs feedback on the position halo exchange to guide the similar choices here; otherwise Alan will just refactor the existing code
- Stitching together high-level single-GPU logic to achieve performance. Coding assigned to Artem. Review assigned to Erik, Mark.
Status: awaiting progress on the GPU StatePropagatorData patch
- Lift cr->duty assignment out of init_domain_decomposition(). Coding assigned to Berk. Review assigned to Mark, Paul, Szilard.
- Class containing PME-PP exchange for coordinate buffers. Coding assigned to Alan. Review assigned to Szilard, Berk, Mark.
Status: a new refactored patch needs to be developed; it will be very similar to the PME-PP force code above, so hopefully little review will be required.
- refinement of simulation flag structs. Coding assigned to Szilard. Review assigned to Mark, Berk, Paul, Erik.
Status: awaiting submission of the earlier patch
- Stitching together high-level multi-GPU logic to achieve performance. Coding assigned to TODO. Will need input from many people.
Status: not started yet
- mdrun user interface + choices of defaults. Coding assigned to Paul. Review assigned to Alan, Artem, Mark.
Status: not started yet
Due last minute
- Link task assignment to simulation flags and high-level GPU logic. Coding assigned to TODO. Will need input from many people.
Status: the clFFT init patch needs review https://gerrit.gromacs.org/c/gromacs/+/12897. It prepares for a patch that manages GPU streams at a high level and passes stream handles into the individual modules that need to collaborate on the ~5 streams we identified (nonlocal, local, pme work, pp-pme transfer, and update). Also waiting on Berk's patch to pull the cr->duty assignment out of init_domain_decomposition(). That will let us rearrange the order of operations in the runner so that GPU device information is available earlier, and GPU streams can then be set up before init_domain_decomposition() and init_forcerec().