bonded CUDA offload task
Top-level task: a summary of the sub-tasks required to deliver bonded GPU offload in CUDA for the 2019 release.
The plan is to take the NVIDIA code and integrate it into the 2019 release with two goals: running it next to the PP task using the same coordinates (and possibly the same force output buffer), and minimizing the amount of new CUDA code needed. The initial implementation will only support bonded offload if all listed interactions can be offloaded (offloading a subset should be a straightforward extension; the same goes for excluding perturbed bondeds).
Coarse list (individual subtasks linked):
- filler-particle extension to the DD module
- bonded task conversion based on NB indexing (allows reuse of nbnxn coordinates +/- force buffer for bondeds)
- initial bonded CUDA code cleanup (https://gerrit.gromacs.org/#/c/8460)
- bonded task scheduling and reduction scheduling code
- command line interface and task assignment
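
The "bonded task conversion based on NB indexing" item can be illustrated with a minimal host-side sketch. It assumes a lookup table from nonbonded grid slots to local atom indices, with -1 marking a filler particle that pads a cluster. The names `Bond`, `buildAtomToGrid` and `remapBonds` are hypothetical and are not taken from the GROMACS sources; the point is only that, once atom indices are rewritten into grid order, the bonded code can read the coordinate buffer the nonbonded task already uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-atom bonded interaction: parameter type plus atom indices.
struct Bond
{
    int                type;
    std::array<int, 2> atoms;
};

// Build the inverse of a grid-slot -> local-atom-index table.
// gridToAtom[g] is the local atom stored in grid slot g, or -1 for a
// filler particle (an assumption of this sketch) with no real atom behind it.
std::vector<int> buildAtomToGrid(const std::vector<int>& gridToAtom, std::size_t numAtoms)
{
    std::vector<int> atomToGrid(numAtoms, -1);
    for (std::size_t g = 0; g < gridToAtom.size(); ++g)
    {
        const int a = gridToAtom[g];
        if (a >= 0)
        {
            assert(static_cast<std::size_t>(a) < numAtoms);
            atomToGrid[static_cast<std::size_t>(a)] = static_cast<int>(g);
        }
    }
    return atomToGrid;
}

// Rewrite bonded atom indices so that they address the grid-ordered
// coordinate (and possibly force) buffer directly.
std::vector<Bond> remapBonds(const std::vector<Bond>& bonds, const std::vector<int>& atomToGrid)
{
    std::vector<Bond> remapped;
    remapped.reserve(bonds.size());
    for (const Bond& b : bonds)
    {
        Bond r = b;
        for (int& a : r.atoms)
        {
            assert(a >= 0 && static_cast<std::size_t>(a) < atomToGrid.size());
            a = atomToGrid[static_cast<std::size_t>(a)];
            assert(a >= 0); // every bonded atom must be present on the grid
        }
        remapped.push_back(r);
    }
    return remapped;
}
```

Doing the remapping once on the host, when the interaction lists are rebuilt, keeps the per-step work free of extra indirection; whether the real implementation does this at the same point is not stated here.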
Add CUDA bonded kernels
CUDA bonded kernels are added for the most common bonded and LJ-14
interactions. The default auto setting of mdrun offloads these
interactions to the GPU when possible.
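
As an illustration of the per-interaction work such a kernel performs, here is a CPU reference sketch for a single harmonic bond, V = 0.5 * k * (r - b0)^2. It is not the code from this change: the function name `harmonicBond` and the `Vec3` alias are hypothetical, and periodic boundary handling is deliberately left out. A GPU version would plausibly assign one such interaction per thread and accumulate forces atomically, but that mapping is an assumption here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

// Accumulate the force of one harmonic bond between atoms i and j into f
// and return the bond energy. No periodic boundary conditions are applied.
float harmonicBond(std::size_t i, std::size_t j, float k, float b0,
                   const std::vector<Vec3>& x, std::vector<Vec3>& f)
{
    assert(i < x.size() && j < x.size() && f.size() == x.size());

    Vec3  dx{};
    float r2 = 0.0f;
    for (std::size_t d = 0; d < 3; ++d)
    {
        dx[d] = x[i][d] - x[j][d];
        r2 += dx[d] * dx[d];
    }
    const float r  = std::sqrt(r2);
    const float dr = r - b0;

    // dV/dr = k * dr; the force on atom i is -dV/dr * dx / r.
    const float fscal = (r > 0.0f) ? -k * dr / r : 0.0f;
    for (std::size_t d = 0; d < 3; ++d)
    {
        f[i][d] += fscal * dx[d];
        f[j][d] -= fscal * dx[d]; // Newton's third law
    }
    return 0.5f * k * dr * dr;
}
```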
Currently these interactions are computed in the local or non-local
nbnxn non-bonded streams. We should consider using a separate stream.
This change uses synchronous transfers. A child change will make
these asynchronous.
Updated release notes and performance guide.