Task #2679
Task #2675: bonded CUDA offload task
bonded GPU offload task assignment
Description
To be implemented:
- -bonded auto/cpu/gpu command line option
- new task and valid assignments to the PP or PP+PME ranks' GPU
- only offload if all available listed interaction types are supported at first
- default behavior (TBD based on performance)
Associated revisions
History
#1 Updated by Szilárd Páll over 2 years ago
Note that in my latest tests of bonded offload with rnase on a Quadro P6000 (similar to a 1080Ti), it takes more work to offload the bonded interactions than to execute all listed kernels + reduction on half of a slow CPU (and about the same amount of time if we just add up kernel times).
Hence, for small inputs we need to fuse kernels and/or disable bonded offload, I think.
#2 Updated by Mark Abraham over 2 years ago
Szilárd Páll wrote:
To be implemented:
- -bonded auto/cpu/gpu command line option
Done
- new task and valid assignments to the PP or PP+PME ranks' GPU
Done
- only offload if all available listed interaction types are supported at first
Berk's changes mean we don't need this
- default behavior (TBD based on performance)
Currently always on if supported and NB is on GPU.
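For readers following the thread, the behavior described above (offload whenever it is supported and the nonbonded work is on the GPU) could be sketched roughly as below. This is an illustration only, not the code from the change: `BondedTarget`, `parseBondedTarget` and `useGpuForBonded` are hypothetical names, and raising an error when `-bonded gpu` is requested but offload is not possible is an assumption.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical names for illustration; not the GROMACS API.
enum class BondedTarget { Auto, Cpu, Gpu };

// Map the value given to -bonded onto the enum.
BondedTarget parseBondedTarget(const std::string& value)
{
    if (value == "auto") { return BondedTarget::Auto; }
    if (value == "cpu") { return BondedTarget::Cpu; }
    if (value == "gpu") { return BondedTarget::Gpu; }
    throw std::invalid_argument("-bonded expects auto, cpu or gpu, got: " + value);
}

// Decide whether the bonded task is assigned to the GPU.
// "auto" follows the rule stated in the thread: on if supported
// and the nonbonded (NB) task is on the GPU.
bool useGpuForBonded(BondedTarget target, bool nonbondedOnGpu, bool haveSupportedInteractions)
{
    const bool canOffload = nonbondedOnGpu && haveSupportedInteractions;
    switch (target)
    {
        case BondedTarget::Cpu: return false;
        case BondedTarget::Gpu:
            // Assumption: an explicit request that cannot be honored is an error.
            if (!canOffload)
            {
                throw std::runtime_error("Bonded GPU offload requested but not possible");
            }
            return true;
        case BondedTarget::Auto: return canOffload;
    }
    return false;
}
```

Keeping the decision in one pure function would make it straightforward to change the `auto` default later, once benchmarks show when offloading pays off.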
#3 Updated by Gerrit Code Review Bot over 2 years ago
Gerrit received a related patchset '1' for Issue #2679.
Uploader: Mark Abraham (mark.j.abraham@gmail.com)
Change-Id: gromacs~master~I0ebbbd33c2cba5808561111b0ec6160bfd2f840d
Gerrit URL: https://gerrit.gromacs.org/8535
#4 Updated by Szilárd Páll over 2 years ago
- Status changed from New to In Progress
- Assignee set to Mark Abraham
Currently always on if supported and NB is on GPU.
OK. I suggest we keep it that way and tweak defaults post-beta, as we'll need more benchmarks to know when it is worth offloading.
#5 Updated by Mark Abraham over 2 years ago
- Status changed from In Progress to Resolved
Applied in changeset a07e8b21025a4e429ff8ec61783c1ad17e9516a8.
#6 Updated by Mark Abraham over 2 years ago
- Status changed from Resolved to Closed
Task assignment for bonded interactions on CUDA GPUs
Made a query function to find whether any interactions of supported
types exist in the global topology, so that we can make efficient
high-level decisions.
Added free for gpuBondedLists pointer.
Minor cleanup in manage-threading.h
Fixes #2679
Change-Id: I0ebbbd33c2cba5808561111b0ec6160bfd2f840d
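
The query function mentioned in the commit message might look something like the following. This is a minimal sketch under stated assumptions, not the actual implementation: the types `InteractionType`, `MoleculeType` and `GlobalTopology`, the function names, and the list of GPU-supported interaction types are all hypothetical placeholders.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical, simplified stand-ins for the real topology data structures.
enum class InteractionType { Bonds, Angles, ProperDihedrals, Settle };

struct MoleculeType
{
    std::vector<InteractionType> interactions;
};

struct GlobalTopology
{
    std::vector<MoleculeType> moleculeTypes;
};

// Placeholder list; the set actually supported on the GPU is an assumption here.
const std::array<InteractionType, 3> c_gpuSupportedTypes = {
    { InteractionType::Bonds, InteractionType::Angles, InteractionType::ProperDihedrals }
};

bool isGpuSupported(InteractionType type)
{
    return std::find(c_gpuSupportedTypes.begin(), c_gpuSupportedTypes.end(), type)
           != c_gpuSupportedTypes.end();
}

// Returns true if any molecule type contains at least one interaction
// of a type that the GPU code path can handle.
bool haveGpuSupportedInteractions(const GlobalTopology& topology)
{
    for (const MoleculeType& molecule : topology.moleculeTypes)
    {
        if (std::any_of(molecule.interactions.begin(), molecule.interactions.end(), isGpuSupported))
        {
            return true;
        }
    }
    return false;
}
```

Because the check only needs the global topology, it can run once before task assignment rather than after domain decomposition, which is what allows the high-level decision to be made early.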