Feature #2934
Feature #2816: GPU offload / optimization for update&constraints, buffer ops and multi-gpu communication
Feature #2817: GPU X/F buffer ops
GPU X Buffer ops
Description
Implement and improve the GPU version of position buffer operations. Gerrit change 9169 implements the functionality, and a follow-up change will make the improvements listed below.
TODOs:
- Use pinned host vectors for grid and gridset arrays and remove explicit cudahostregister/unregister calls in init fn
- Replace allocatedevicebuffer with reallocatedevicebuffer in init fn
- Improve variable naming in init and buffer ops fns
- Fix issue with position buffer pinning to allow use of gmx api for memcpy
- Implement sync point between PME and NB streams
- Improve mechanism for deciding if position buffer needs to be copied to GPU in advance of buffer op
- Move GPU conditional out of OpenMP loop in atomdata_copy_x_to_nbat_x
- Precompute maxAtomsInCOlumn and store it in the grid object, instead of recomputing it every step
- Move the complex coordinate management into a single place (from where it can be refactored more easily): code added by early X buf ops changes added technical debt and complexity by conditionally doing data copying and management of xrvec when this is not done in PME. Consolidation and refactoring needed.
- Refactor grid data members that were added without any grouping into the main nonbonded data structure
- Consider uniform behavior across search and non-search steps (ATM at search the xyzq fmt is generated in search and copied, while non-ns steps do the conversion, which leads to complexity)
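One of the TODOs above asks for the per-column atom maximum to be computed once and stored in the grid object rather than recomputed every step. A minimal sketch of that idea follows. `ColumnGrid`, `setAtomCounts` and the member names are hypothetical and are not the actual GROMACS `Grid` interface; the sketch assumes the per-column counts only change when the grid is rebuilt on a search step.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a search grid (not the GROMACS Grid class).
class ColumnGrid
{
public:
    // Assumed to be called only when the grid is rebuilt, i.e. on search steps.
    void setAtomCounts(std::vector<int> atomsPerColumn)
    {
        atomsPerColumn_ = std::move(atomsPerColumn);
        // Compute the maximum once here and cache it alongside the grid data.
        maxAtomsInColumn_ =
                atomsPerColumn_.empty()
                        ? 0
                        : *std::max_element(atomsPerColumn_.begin(), atomsPerColumn_.end());
    }

    // Cheap accessor for use on every step; nothing is recomputed here.
    int maxAtomsInColumn() const { return maxAtomsInColumn_; }

private:
    std::vector<int> atomsPerColumn_;
    int              maxAtomsInColumn_ = 0;
};
```

With this layout the per-step buffer ops code would only read `maxAtomsInColumn()`, so the cost of the scan is paid once per grid rebuild.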
Subtasks
Associated revisions
Pin X buffer with DD and use proper abstraction in buffer ops
This patch fixes an issue where the coordinate buffer was not being
pinned with DD, and replaces a raw cudaMemcpy call in the buffer ops
routine with proper abstraction through the copyToDeviceBuffer API.
Follow up to Gerrit change 9169
Implements part of #2934
Change-Id: I64aec66157ceb5eb04bcfd7cfe24b6ea5c18e4ae
Use HostVector for Grid/GridSet data needed on the GPU
Grid.cxy_na_, Grid.cxy_ind_, GridSet.cells and GridSet.atomIndices
have been converted from std::vector to gmx::HostVector. This allows
the code to pin the HostVector when X buffer ops is used and to
eliminate the hacky pin/unpin in CUDA buffer ops functions.
Change-Id: Icca21dd076128ec582f805ed96e253dfab461270
Conditionally pin GPU-related grid data
Data that is transferred to the GPU when buffer ops are offloaded is
now only pinned when the nonbonded module uses GPU offload, avoiding the
runtime errors encountered when a GPU-enabled build does not detect a
GPU and the CUDA runtime therefore refuses to register the memory.
Change-Id: Iabbc0d9f37fad0e88cd39a078af1346e8f713ec1
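The conditional pinning described in this commit can be illustrated with a small sketch. Everything here is an assumption made for illustration: `PinningPolicy`, `PinnableBuffer` and `choosePinningPolicy` are hypothetical names loosely modelled on the idea of a host vector with a pinning policy, and no CUDA calls are made. A real implementation would register or allocate page-locked memory at the point marked in the comments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical policy enum for this sketch.
enum class PinningPolicy
{
    CannotBePinned,
    PinnedIfSupported
};

// Hypothetical host buffer that records whether its storage should be pinned.
template<typename T>
class PinnableBuffer
{
public:
    explicit PinnableBuffer(PinningPolicy policy) : policy_(policy) {}

    void resize(std::size_t count)
    {
        data_.resize(count);
        // A real implementation would (re)pin the storage here,
        // but only when wouldPin() returns true.
    }

    std::size_t size() const { return data_.size(); }
    bool        wouldPin() const { return policy_ == PinningPolicy::PinnedIfSupported; }

private:
    PinningPolicy  policy_;
    std::vector<T> data_;
};

// Request pinning only when the nonbonded module actually uses a GPU, so a
// GPU-enabled build running without a detected GPU never tries to pin.
inline PinningPolicy choosePinningPolicy(bool nonbondedUsesGpu)
{
    return nonbondedUsesGpu ? PinningPolicy::PinnedIfSupported : PinningPolicy::CannotBePinned;
}
```

The design point is that the decision is taken once, from the run configuration, rather than inside the buffer ops routines themselves.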
History
#1 Updated by Alan Gray almost 2 years ago
- Description updated
#2 Updated by Alan Gray almost 2 years ago
- Description updated
#3 Updated by Szilárd Páll over 1 year ago
- Description updated
#4 Updated by Alan Gray over 1 year ago
- Description updated
#5 Updated by Alan Gray over 1 year ago
- Description updated
#6 Updated by Paul Bauer about 1 year ago
- Status changed from New to Resolved
Is this work done now?
#7 Updated by Paul Bauer about 1 year ago
- Status changed from Resolved to In Progress
- Target version changed from 2020 to 2021
No comment and open TODOs mean this gets bumped.
#8 Updated by Alan Gray about 1 year ago
- Status changed from In Progress to Closed
Moved to umbrella task https://redmine.gromacs.org/issues/3370
Code quality improvements for CUDA position buffer ops
Follow up to Gerrit change 9169
Implements part of #2934
Change-Id: Ie948f624ae6f7e2df7a2b6c6c734d08d862096e5