Task #2498
Feature #2054: PME on GPU
Task #2453: PME OpenCL porting effort
OpenCL memory pinning/mapping
Description
One aspect of good GPU performance is fast asynchronous data transfers, potentially overlapping with GPU compute.
In the CUDA implementation we pay considerable attention to this: HostAllocationPolicy is designed around cudaHostRegister to provide aligned and pinned host memory allocations.
With OpenCL we have not addressed this so far: we only have pmalloc(), which makes plain 16-byte-aligned allocations, plus a meager TODO to at least use 4k pages the way CUDA supposedly does.
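As a rough illustration of what the 4k-page TODO would amount to on the host side, here is a minimal C++17 sketch of a page-aligned allocator. The names PageAlignedAllocator, isPageAligned and c_pageSize are hypothetical and are not GROMACS API, and the 4096-byte page size is an assumption. This only covers alignment; it does not pin or map anything.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Assumed page size; a real implementation would query the OS instead.
constexpr std::size_t c_pageSize = 4096;

// Returns true if the pointer sits on a page boundary.
inline bool isPageAligned(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % c_pageSize == 0;
}

// Hypothetical allocator that hands out page-aligned, whole-page storage.
template<typename T>
struct PageAlignedAllocator
{
    using value_type = T;

    PageAlignedAllocator() = default;
    template<typename U>
    PageAlignedAllocator(const PageAlignedAllocator<U>&) noexcept
    {
    }

    T* allocate(std::size_t n)
    {
        // Round the byte count up to a whole number of pages
        // (no overflow check, to keep the sketch short).
        const std::size_t bytes = ((n * sizeof(T) + c_pageSize - 1) / c_pageSize) * c_pageSize;
        return static_cast<T*>(::operator new(bytes, std::align_val_t(c_pageSize)));
    }

    void deallocate(T* p, std::size_t) noexcept
    {
        // Must use the aligned form of operator delete to match allocate().
        ::operator delete(p, std::align_val_t(c_pageSize));
    }
};

// The allocator is stateless, so all instances compare equal.
template<typename T, typename U>
bool operator==(const PageAlignedAllocator<T>&, const PageAlignedAllocator<U>&) noexcept
{
    return true;
}
template<typename T, typename U>
bool operator!=(const PageAlignedAllocator<T>&, const PageAlignedAllocator<U>&) noexcept
{
    return false;
}
```

Such an allocator could be plugged into a standard container, e.g. std::vector<float, PageAlignedAllocator<float>>, so that the buffer start and length both fall on page boundaries.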
From what I can find online, OpenCL works in terms of mapped memory rather than pinned memory, so one is expected to manage both the host allocation and a corresponding device-side cl_mem buffer, e.g. by calling clCreateBuffer (https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateBuffer.html) with the CL_MEM_USE_HOST_PTR flag and then using clEnqueueMapBuffer (https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clEnqueueMapBuffer.html) to obtain the new mapped/"pinned" host-side pointer.
One description is in section 3.1.1 of a very old NVIDIA OpenCL best practices guide:
http://www.nvidia.com/content/cudazone/CUDABrowser/downloads/papers/NVIDIA_OpenCL_BestPracticesGuide.pdf
There are more discussions online about whether to use CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR, when to call map/unmap, etc.
But to reiterate the core issue: our current HostAllocationPolicy, as the name implies, works for CUDA pinning, but is likely not fit to accommodate proper host/device OpenCL memory handling.
With PME OpenCL, I will have to sidestep this design problem.
Associated revisions
History
#1 Updated by Aleksei Iupinov over 2 years ago
- Subject changed from OpenCL memory pinning to OpenCL memory pinning/mapping
- Description updated (diff)
- Difficulty hard added
- Difficulty deleted (uncategorized)
#2 Updated by Aleksei Iupinov over 2 years ago
- Description updated (diff)
- Private changed from Yes to No
#3 Updated by Gerrit Code Review Bot over 2 years ago
Gerrit received a related patchset '1' for Issue #2498.
Uploader: Aleksei Iupinov (a.yupinov@gmail.com)
Change-Id: gromacs~master~I2a294aee460947cd3aad5e23869cead1b99fd610
Gerrit URL: https://gerrit.gromacs.org/7874
#4 Updated by Mark Abraham about 2 years ago
- Target version changed from 2019 to 2020
#5 Updated by Paul Bauer about 1 year ago
- Target version changed from 2020 to future
Ensure PME with OpenCL does not attempt to pin
Host-only memory pinning was designed with CUDA in mind, while OpenCL
requires managing both host and device memory buffers for efficient
mapping, which is not yet implemented.
This change teaches the PME module to understand what pinning policy
is appropriate to the build configuration, so that the setup of data
structures in various parts of the code can use a pinning policy that
always works.
Refs #2498
Change-Id: I2a294aee460947cd3aad5e23869cead1b99fd610
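The idea in the commit message above, choosing a pinning policy from the build configuration, could be sketched roughly as follows. This is not the code from the patch: GpuBackend, PinningPolicy and pinningPolicyFor are hypothetical names used only for illustration, and the sketch assumes that only a CUDA build can safely pin host memory.

```cpp
// Hypothetical enumeration of the GPU backends a build may be configured with.
enum class GpuBackend
{
    None,
    Cuda,
    OpenCL
};

// Hypothetical policy type: either never pin, or pin when the backend can.
enum class PinningPolicy
{
    CannotBePinned,
    PinnedIfSupported
};

// Selects a policy that always works for the given backend.
// Assumption: host-only pinning is valid for CUDA only, so OpenCL and
// CPU-only builds fall back to plain, unpinned allocations.
constexpr PinningPolicy pinningPolicyFor(GpuBackend backend)
{
    return backend == GpuBackend::Cuda ? PinningPolicy::PinnedIfSupported
                                       : PinningPolicy::CannotBePinned;
}

// Compile-time check of the intended behaviour for an OpenCL build.
static_assert(pinningPolicyFor(GpuBackend::OpenCL) == PinningPolicy::CannotBePinned,
              "OpenCL builds must not attempt host-only pinning");
```

Code that sets up host-side buffers would then ask for the policy once and pass it to its allocators, instead of assuming pinning is available.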