Alan Gray, 09/10/2019 10:30 AM
h1. Integration planning for GROMACS 2020 GPU features
We have a set of target features to integrate by 9/16. (TODO record that)
* GPU halo-exchange for positions. Coding assigned to Alan. Review assigned to Mark and Berk.
Status - code in gerrit, class+file+method naming input given, can merge when revised accordingly
* pp-pme exchange class. Coding assigned to Alan. Review assigned to Mark, Berk.
Status - https://gerrit.gromacs.org/c/gromacs/+/12783 awaiting further review/merge.
* simulation flags structs. Coding assigned to Szilard. Review assigned to Mark, Berk, Paul, Erik.
Status - WIP at https://gerrit.gromacs.org/c/gromacs/+/11622. Would benefit from a face-to-face discussion on how to describe the lifetime and intent of this initial patch, whose form is understood to be unlike what we want in the long term.
* -Question: Does multi-domain GPU-halo-exchange force handling support adding extra CPU forces?- Alan's answer: I believe it should (if an H2D copy of CPU forces is placed before the halo exchange), since the non-local part from the remote GPU will be accumulated into the local part of the force buffer.
* LF pressure-coupling support for GPU-based update. Coding assigned to Artem. Review assigned to TODO.
Status - Parrinello-Rahman (P-R) support is +2 on Gerrit at https://gerrit.gromacs.org/c/gromacs/+/12477. Berendsen support easy to add in a future patch?
* tests covering .mdp nst* flags, in particular for well-chosen co-prime cases, to prevent unexpected behavior changes. Also, can we test e.g. forces with FEP on non-output steps? Coding assigned to Mark. Review assigned to TODO.
Status - start work Monday afternoon
* Make GPU version of StatePropagatorData. Coding assigned to Artem. Review assigned to Erik, Mark.
Status - awaiting patch from Artem. Naming decision recorded at https://gerrit.gromacs.org/c/gromacs/+/11986
* Clean up to make a PP-rank "PME force receiver". Coding assigned to Mark. Review assigned to Alan, Paul, Berk.
Status: awaiting pp-pme exchange class stabilizing. Also review of clfft init change https://gerrit.gromacs.org/c/gromacs/+/12897.
* Code to receive PME-rank forces on PP-rank GPU buffers (avoiding CPU). Coding assigned to Alan. Review assigned to Szilard, Berk, Mark.
Status: Code at https://gerrit.gromacs.org/c/gromacs/+/12980 awaiting review.
* Halo-exchange class for force. Coding assigned to Alan. Review assigned to Szilard, Mark, Berk.
Status: Code in Gerrit at https://gerrit.gromacs.org/c/gromacs/+/12943. Awaiting review.
* Stitching together high-level single-GPU logic to achieve performance. Coding assigned to Artem. Review assigned to Erik, Mark.
Status: Awaiting progress on GPU StatePropagatorData patch
* Lift cr->duty assignment out of init_domain_decomposition(). Coding assigned to Berk. Review assigned to Mark, Paul, Szilard.
* Class containing PME-PP exchange for coordinate buffers. Coding assigned to Alan. Review assigned to Szilard, Berk, Mark.
Status: Code at https://gerrit.gromacs.org/c/gromacs/+/13043 awaiting review.
* refinement of simulation flag structs. Coding assigned to Szilard. Review assigned to Mark, Berk, Paul, Erik.
Status: awaiting submission of earlier patch
* Stitching together high-level multi-GPU logic to achieve performance. Coding assigned to TODO. Will need input from many people
Status: not started yet
* mdrun user interface + choices of defaults. Coding assigned to Paul. Review assigned to Alan, Artem, Mark.
Status: not started yet
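Alan's answer above on CPU forces and the GPU halo exchange depends on ordering, which a toy host-side model can illustrate. The sketch below is not GROMACS code: @RankForces@, @addCpuForces@ and @haloExchangeForces@ are hypothetical names, the "H2D copy" is a plain vector addition, and each atom carries a single scalar force. It assumes the force buffer is laid out as local entries followed by non-local (halo) entries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one rank's force buffer (hypothetical, not a GROMACS type).
// The first numLocal entries belong to home atoms; the remaining entries
// belong to halo (non-local) atoms owned by a neighbouring rank.
struct RankForces
{
    std::size_t         numLocal;
    std::vector<double> f; // layout: [ local | non-local ]
};

// Stand-in for the H2D copy and reduction: add CPU-computed forces
// to every entry of the buffer, local and non-local alike.
void addCpuForces(RankForces& rank, const std::vector<double>& cpuForces)
{
    assert(cpuForces.size() == rank.f.size());
    for (std::size_t i = 0; i < rank.f.size(); ++i)
    {
        rank.f[i] += cpuForces[i];
    }
}

// Stand-in for the force halo exchange: the sender's non-local entries are
// accumulated into the owner's local entries. haloToLocal[j] is the index,
// in the receiver's local part, of the sender's j-th halo atom.
void haloExchangeForces(const RankForces&               sender,
                        RankForces&                     receiver,
                        const std::vector<std::size_t>& haloToLocal)
{
    assert(haloToLocal.size() == sender.f.size() - sender.numLocal);
    for (std::size_t j = 0; j < haloToLocal.size(); ++j)
    {
        assert(haloToLocal[j] < receiver.numLocal);
        receiver.f[haloToLocal[j]] += sender.f[sender.numLocal + j];
    }
}

// Returns the final force on the owner's home atom for either ordering.
double forceOnOwnerAtom(bool cpuForcesBeforeHalo)
{
    RankForces sender{ 1, { 1.0, 2.0 } }; // GPU forces: one home atom, one halo atom
    RankForces receiver{ 1, { 4.0 } };    // owner of that halo atom
    const std::vector<double>      senderCpuForces{ 0.5, 0.25 };
    const std::vector<std::size_t> haloToLocal{ 0 };

    if (cpuForcesBeforeHalo)
    {
        addCpuForces(sender, senderCpuForces);
    }
    haloExchangeForces(sender, receiver, haloToLocal);
    if (!cpuForcesBeforeHalo)
    {
        // Too late: the CPU contribution on the halo atom never reaches its owner.
        addCpuForces(sender, senderCpuForces);
    }
    return receiver.f[0];
}
```

In this model, @forceOnOwnerAtom(true)@ includes the sender's CPU contribution on the halo atom, while @forceOnOwnerAtom(false)@ leaves that contribution stranded in the sender's non-local entries.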
Due last minute
* Link task assignment to simulation flags and high-level GPU logic. Coding assigned to TODO. Will need input from many people.
Status: the clfft init patch needs review: https://gerrit.gromacs.org/c/gromacs/+/12897. It prepares for a patch that manages GPU streams at a high level and passes handles into the individual modules that need to collaborate on the ~5 streams we identified (nonlocal, local, pme work, pp-pme xfer, update). Waiting on Berk's patch to pull cr->duty assignment out of init_domain_decomposition(), so that we can rearrange the order of operations in the runner: GPU device info then becomes available earlier, and GPU streams can be set up before init_domain_decomposition() and init_forcerec().
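
One possible shape for the high-level stream management described above is sketched here, purely as an illustration. Every name (@StreamManager@, @StreamHandle@, @StreamId@, the two module structs) is hypothetical, the handle is an integer placeholder rather than a real CUDA or OpenCL stream, and which module uses which stream is an assumption made for the example. The point is only that streams are created once, before the modules, and that the modules receive handles rather than creating their own.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// The five streams named in the status note (hypothetical enum).
enum class StreamId : std::size_t
{
    NonLocal,
    Local,
    PmeWork,
    PpPmeTransfer,
    Update,
    Count
};

// Opaque placeholder; a real build would wrap a device stream or queue here.
struct StreamHandle
{
    int id = -1;
};

// Owns all streams; intended to be constructed before any module that uses them.
class StreamManager
{
public:
    StreamManager()
    {
        for (std::size_t i = 0; i < streams_.size(); ++i)
        {
            streams_[i].id = static_cast<int>(i);
        }
    }

    const StreamHandle& stream(StreamId which) const
    {
        const auto index = static_cast<std::size_t>(which);
        assert(index < streams_.size());
        return streams_[index];
    }

private:
    std::array<StreamHandle, static_cast<std::size_t>(StreamId::Count)> streams_;
};

// Hypothetical modules that only borrow handles and never create streams.
struct HaloExchangeModule
{
    explicit HaloExchangeModule(const StreamHandle& nonLocal) : nonLocalStream(&nonLocal) {}
    const StreamHandle* nonLocalStream;
};

struct NonbondedModule
{
    NonbondedModule(const StreamHandle& nonLocal, const StreamHandle& local) :
        nonLocalStream(&nonLocal), localStream(&local)
    {
    }
    const StreamHandle* nonLocalStream;
    const StreamHandle* localStream;
};

// Builds the manager first, then the modules, and checks that both modules
// ended up collaborating on the same non-local stream object.
bool modulesShareNonLocalStream()
{
    StreamManager      manager;
    HaloExchangeModule halo(manager.stream(StreamId::NonLocal));
    NonbondedModule    nonbonded(manager.stream(StreamId::NonLocal),
                              manager.stream(StreamId::Local));
    return halo.nonLocalStream == nonbonded.nonLocalStream
           && nonbonded.localStream != nonbonded.nonLocalStream;
}
```

With this layout the manager has to exist before any module is constructed, which is the same ordering constraint the status note places on init_domain_decomposition() and init_forcerec().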