Task #2498

Feature #2054: PME on GPU

Task #2453: PME OpenCL porting effort

OpenCL memory pinning/mapping

Added by Aleksei Iupinov over 1 year ago. Updated 11 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Difficulty: hard

Description

One aspect of good GPU performance is fast asynchronous data transfers, ideally overlapping with GPU compute.
In the CUDA implementation we care about this a great deal: HostAllocationPolicy was designed around cudaHostRegister to provide aligned and pinned host memory allocations.
With OpenCL, it seems we have not cared so far: we only have pmalloc(), which makes plain 16-byte-aligned allocations, plus a meager TODO to at least use 4k pages the way CUDA supposedly does.
Looking around on the internet, it seems that OpenCL works in terms of mapped memory rather than pinned memory, so one is expected to manage both the host allocation and a corresponding device-side cl_mem buffer, e.g. by calling clCreateBuffer (https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCreateBuffer.html) with the CL_MEM_USE_HOST_PTR flag and then calling clEnqueueMapBuffer (https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clEnqueueMapBuffer.html) to obtain the mapped/"pinned" host-side pointer.
One description is in section 3.1.1 of a very old NVIDIA OpenCL best practices guide:
http://www.nvidia.com/content/cudazone/CUDABrowser/downloads/papers/NVIDIA_OpenCL_BestPracticesGuide.pdf
There are more discussions on the internet about whether to use CL_MEM_USE_HOST_PTR or CL_MEM_ALLOC_HOST_PTR, when to call map/unmap, etc.
But to reiterate the core issue: our current HostAllocationPolicy, as the name implies, works for CUDA pinning, but is likely not fit to accommodate proper host/device OpenCL memory handling.
With PME OpenCL, I will have to sidestep this design problem.
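
For reference, the 4k-page TODO mentioned in the description could look roughly like the sketch below. This is only an illustration, not the GROMACS pmalloc(): the names c_pageSize, roundUp, pageAlignedAlloc and isPageAligned are hypothetical, the page size is assumed to be 4096 bytes rather than queried from the OS, and std::aligned_alloc requires C++17. Note that page alignment alone does not pin or map anything; it only gives the runtime a host allocation made of whole pages to work with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed page size (hypothetical constant); real code would query the OS,
// e.g. with sysconf(_SC_PAGESIZE) on POSIX systems.
constexpr std::size_t c_pageSize = 4096;

// Round size up to the next multiple of alignment, which must be a power of two.
constexpr std::size_t roundUp(std::size_t size, std::size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: returns page-aligned host memory spanning a whole
// number of pages, or nullptr on failure. Release it with std::free().
inline void* pageAlignedAlloc(std::size_t bytes)
{
    static_assert((c_pageSize & (c_pageSize - 1)) == 0, "page size must be a power of two");
    // std::aligned_alloc requires the size to be a multiple of the alignment,
    // so pad the request (and never ask for zero bytes).
    const std::size_t padded = roundUp(bytes > 0 ? bytes : 1, c_pageSize);
    assert(padded % c_pageSize == 0);
    return std::aligned_alloc(c_pageSize, padded);
}

// True if ptr starts on a page boundary.
inline bool isPageAligned(const void* ptr)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % c_pageSize == 0;
}
```

A buffer obtained this way would still need to be handed to the OpenCL runtime (for example via clCreateBuffer with CL_MEM_USE_HOST_PTR, as discussed above) before any mapping benefit could be expected; whether a given driver actually avoids a copy in that case is implementation-dependent.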

Associated revisions

Revision abb0e3c6 (diff)
Added by Aleksei Iupinov over 1 year ago

Ensure PME with OpenCL does not attempt to pin

Host-only memory pinning was designed with CUDA in mind, while OpenCL
requires managing both host and device memory buffer for efficient
mapping, which is not yet implemented.

This change teaches the PME module to understand what pinning policy
is appropriate to the build configuration, so that the setup of data
structures in various parts of the code can use a pinning policy that
always works.

Refs #2498

Change-Id: I2a294aee460947cd3aad5e23869cead1b99fd610

History

#1 Updated by Aleksei Iupinov over 1 year ago

  • Subject changed from OpenCL memory pinning to OpenCL memory pinning/mapping
  • Description updated (diff)
  • Difficulty hard added
  • Difficulty deleted (uncategorized)

#2 Updated by Aleksei Iupinov over 1 year ago

  • Description updated (diff)
  • Private changed from Yes to No

#3 Updated by Gerrit Code Review Bot over 1 year ago

Gerrit received a related patchset '1' for Issue #2498.
Uploader: Aleksei Iupinov
Change-Id: gromacs~master~I2a294aee460947cd3aad5e23869cead1b99fd610
Gerrit URL: https://gerrit.gromacs.org/7874

#4 Updated by Mark Abraham 11 months ago

  • Target version changed from 2019 to 2020
