Integration planning for GROMACS 2020 GPU features
We have a set of target functionality to integrate by 9/16. (TODO record that)
- GPU halo-exchange for positions. Coding assigned to Alan. Review assigned to Mark and Berk.
Status - code in gerrit, class+file+method naming input given, can merge when revised accordingly
- pp-pme exchange class. Coding assigned to Alan. Review assigned to Mark, Berk.
Status - https://gerrit.gromacs.org/c/gromacs/+/12783 awaiting further review/merge.
- simulation flags structs. Coding assigned to Szilard. Review assigned to Mark, Berk, Paul, Erik.
Status - WIP at https://gerrit.gromacs.org/c/gromacs/+/11622, would benefit from FTF discussion on how to describe lifetime and intent for this initial patch, whose form is understood to be unlike what we want in the long term
Question: Does multi-domain GPU-halo-exchange force handling support adding extra CPU forces?
Alan's answer: I believe it should (if an H2D copy of CPU forces is placed before the halo exchange), since the non-local part from the remote GPU will be accumulated into the local part of the force buffer.
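A minimal sketch of the ordering in Alan's answer, written as plain C++ standing in for the GPU code. All names here (ForceBuffer, addCpuForces, haloExchangeForces, homeIndex) are hypothetical and not taken from the patches; a scalar stands in for a force vector, and the "remote" buffer is just a second struct on the same host. It assumes the exchange adds to the receiver's local entries rather than overwriting them.

```cpp
#include <cstddef>
#include <vector>

// Force buffer of one rank: entries [0, numLocal) belong to home atoms,
// the remaining entries belong to halo (non-local) atoms.
struct ForceBuffer
{
    std::size_t         numLocal;
    std::vector<double> f;
};

// Stand-in for the H2D copy of CPU forces: fold them into the device-side buffer.
// Must run before the halo exchange so halo contributions are communicated too.
void addCpuForces(ForceBuffer& gpu, const std::vector<double>& cpu)
{
    for (std::size_t i = 0; i < gpu.f.size() && i < cpu.size(); ++i)
    {
        gpu.f[i] += cpu[i];
    }
}

// Stand-in for the force halo exchange: the sender's non-local part is
// accumulated (+=) into the receiver's local part.
// homeIndex[k] is the receiver-local index of the sender's k-th halo atom.
void haloExchangeForces(const ForceBuffer&              sender,
                        ForceBuffer&                    receiver,
                        const std::vector<std::size_t>& homeIndex)
{
    for (std::size_t k = 0; k < homeIndex.size(); ++k)
    {
        receiver.f[homeIndex[k]] += sender.f[sender.numLocal + k];
    }
}
```

Because the exchange accumulates, CPU forces already added to the receiver's local entries survive it, and CPU forces on the sender's halo atoms travel with the GPU ones.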
- LF pressure-coupling support for GPU-based update. Coding assigned to Artem. Review assigned to TODO.
Status - P-R (Parrinello-Rahman) is plus2 on gerrit at https://gerrit.gromacs.org/c/gromacs/+/12477. Berendsen support easy to add in a future patch?
- tests covering .mdp nst* flags, in particular for well-chosen co-prime cases to prevent unexpected behavior changes. Also, can we test e.g. forces with FEP on non-output steps? Coding assigned to Mark. Review assigned to TODO.
Status - start work Monday afternoon
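One possible way to pick and check co-prime nst* cases, as a sketch only (C++17). The helper names are hypothetical, and treating an interval of 0 as "never" is an assumption of this sketch. With pairwise co-prime intervals the actions coincide only at multiples of their product, so a short run covers steps where each action fires alone as well as the rare steps where all fire together.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: true when an action with this interval fires on this step.
// An interval of 0 is taken to mean "never" (assumption of this sketch).
bool firesOnStep(std::int64_t step, std::int64_t interval)
{
    return interval != 0 && step % interval == 0;
}

// True when every pair of intervals has greatest common divisor 1.
bool pairwiseCoprime(const std::vector<std::int64_t>& intervals)
{
    for (std::size_t i = 0; i < intervals.size(); ++i)
    {
        for (std::size_t j = i + 1; j < intervals.size(); ++j)
        {
            if (std::gcd(intervals[i], intervals[j]) != 1)
            {
                return false;
            }
        }
    }
    return true;
}

// First step after 0 on which all actions fire together (least common multiple).
std::int64_t firstCommonStep(const std::vector<std::int64_t>& intervals)
{
    std::int64_t result = 1;
    for (std::int64_t n : intervals)
    {
        result = std::lcm(result, n);
    }
    return result;
}
```

For example, intervals of 3, 5 and 7 first coincide at step 105, so a test of a few hundred steps exercises every combination of the three actions.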
- Make GPU version of StatePropagatorData. Coding assigned to Artem. Review assigned to Erik, Mark.
Status - awaiting patch from Artem. Naming decision recorded at https://gerrit.gromacs.org/c/gromacs/+/11986
- Clean up to make a PP-rank "PME force receiver". Coding assigned to Mark. Review assigned to Alan, Paul, Berk.
Status: awaiting pp-pme exchange class stabilizing. Also review of clfft init change https://gerrit.gromacs.org/c/gromacs/+/12897.
- Class to contain code to receive PME-rank forces on PP-rank GPU buffers (avoiding CPU). Coding assigned to Alan. Review assigned to Szilard, Berk, Mark.
Status: awaiting pp-pme exchange class stabilizing
- Halo-exchange class for force. Coding assigned to Alan. Review assigned to Szilard, Mark, Berk.
Status: Code in Gerrit at https://gerrit.gromacs.org/c/gromacs/+/12943. Awaiting review.
- Stitching together high-level single-GPU logic to achieve performance. Coding assigned to Artem. Review assigned to Erik, Mark.
Status: Awaiting progress on GPU StatePropagatorData patch
- Lift cr->duty assignment out of init_domain_decomposition(). Coding assigned to Berk. Review assigned to Mark, Paul, Szilard.
- Class containing Pme-pp exchange for coordinate buffers. Coding assigned to Alan. Review assigned to Szilard, Berk, Mark.
Status: New refactored patch needs to be developed - will be very similar to above Pme-pp force code so hopefully not much review will be required.
- refinement of simulation flag structs. Coding assigned to Szilard. Review assigned to Mark, Berk, Paul, Erik.
Status: awaiting submission of earlier patch
- Stitching together high-level multi-GPU logic to achieve performance. Coding assigned to TODO. Will need input from many people
Status: not started yet
- mdrun user interface + choices of defaults. Coding assigned to Paul. Review assigned to Alan, Artem, Mark.
Status: not started yet
Due last minute
- Link task assignment to simulation flags and high-level GPU logic. Coding assigned to TODO. Will need input from many people.
Status: clfft init patch needing review at https://gerrit.gromacs.org/c/gromacs/+/12897. That prepares for a patch to manage GPU streams at a high level and pass handles into the individual modules that need to collaborate on the ~5 streams we identified (namely nonlocal, local, pme work, pp-pme xfer, update). Waiting on Berk's patch to pull cr->duty assignment out of init_domain_decomposition(), so that we can rearrange the order of operations in the runner to make GPU device info available earlier and set up GPU streams before init_domain_decomposition() and init_forcerec().
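A rough sketch of what "manage streams at a high level and pass handles into modules" could look like. Everything here is hypothetical (StreamRole, StreamHandle, StreamManager, HaloExchangeSketch), and StreamHandle is a plain stand-in so the example builds without a GPU toolkit; a real version would wrap the device stream type and be constructed once device info is available.

```cpp
#include <array>
#include <cstddef>

// The five streams named in the status note above.
enum class StreamRole : std::size_t
{
    NonLocal,
    Local,
    PmeWork,
    PpPmeTransfer,
    Update,
    Count
};

// Stand-in for a real GPU stream handle; hypothetical.
struct StreamHandle
{
    int id = -1;
};

// Hypothetical owner that creates all streams once, before the modules are set up.
class StreamManager
{
public:
    StreamManager()
    {
        for (std::size_t i = 0; i < streams_.size(); ++i)
        {
            // A real version would create a GPU stream here.
            streams_[i].id = static_cast<int>(i);
        }
    }

    const StreamHandle& stream(StreamRole role) const
    {
        return streams_[static_cast<std::size_t>(role)];
    }

private:
    std::array<StreamHandle, static_cast<std::size_t>(StreamRole::Count)> streams_;
};

// Hypothetical module that is handed the streams it collaborates on
// instead of creating its own. The manager must outlive the module.
class HaloExchangeSketch
{
public:
    HaloExchangeSketch(const StreamHandle& local, const StreamHandle& nonLocal) :
        local_(local), nonLocal_(nonLocal)
    {
    }

    int localStreamId() const { return local_.id; }
    int nonLocalStreamId() const { return nonLocal_.id; }

private:
    const StreamHandle& local_;
    const StreamHandle& nonLocal_;
};
```

Under this layout the runner would construct the manager before init_domain_decomposition() and init_forcerec(), then hand each module only the handles it needs, so that two modules sharing a stream refer to the same object.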