Task #2520

Feature #2054: PME on GPU

Task #2453: PME OpenCL porting effort

Treat OpenCL kernel width more diligently

Added by Aleksei Iupinov 12 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Difficulty: uncategorized

Description

As discussed on https://gerrit.gromacs.org/#/c/7924/, we currently treat OpenCL kernel execution width with a CUDA-centric world view.
The function getWarpSize() queries CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE for a test kernel, which returns 64 on AMD and 32 on NVIDIA.
We should likely be using vendor extensions (CL_DEVICE_WAVEFRONT_WIDTH_AMD from cl_amd_device_attribute_query, CL_DEVICE_WARP_SIZE_NV from cl_nv_device_attribute_query, CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE from cl_intel_subgroups) and use CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE only as a fallback. We should also query this "warp" size for each kernel separately and store it.
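A minimal sketch of the proposed selection order, with the actual OpenCL queries abstracted away so only the fallback logic is shown. `WidthQueries`, `select_exec_width`, `PmeKernelWidths` and `store_kernel_width` are hypothetical names, not GROMACS code. In real host code the AMD and NVIDIA fields would be filled by clGetDeviceInfo with the vendor attributes, the Intel field by a kernel sub-group info query, and the fallback by clGetKernelWorkGroupInfo.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the results of the width queries.
 * Each has_* flag is set only if the extension is present and the query succeeded. */
typedef struct {
    bool   has_amd_wavefront;  /* CL_DEVICE_WAVEFRONT_WIDTH_AMD (per device) */
    size_t amd_wavefront;
    bool   has_nv_warp;        /* CL_DEVICE_WARP_SIZE_NV (per device) */
    size_t nv_warp;
    bool   has_intel_subgroup; /* CL_KERNEL_MAX_SUB_GROUP_SIZE_FOR_NDRANGE (per kernel) */
    size_t intel_subgroup;
    size_t preferred_multiple; /* CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE (fallback) */
} WidthQueries;

/* Prefer a vendor-reported width; fall back to the generic work-group hint. */
static size_t select_exec_width(const WidthQueries *q)
{
    if (q->has_amd_wavefront && q->amd_wavefront > 0)
        return q->amd_wavefront;
    if (q->has_nv_warp && q->nv_warp > 0)
        return q->nv_warp;
    if (q->has_intel_subgroup && q->intel_subgroup > 0)
        return q->intel_subgroup;
    return q->preferred_multiple;
}

/* Hypothetical per-kernel storage: widths are not assumed equal across kernels. */
enum { KERNEL_SPREAD, KERNEL_SOLVE, KERNEL_GATHER, KERNEL_COUNT };

typedef struct {
    size_t exec_width[KERNEL_COUNT];
} PmeKernelWidths;

static void store_kernel_width(PmeKernelWidths *w, int kernel, const WidthQueries *q)
{
    w->exec_width[kernel] = select_exec_width(q);
}
```

Storing the width per kernel matters mostly where the compiler picks the SIMD width kernel by kernel, as the Intel OpenCL compiler can; the AMD and NVIDIA attributes are device-wide.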


Related issues

Related to GROMACS - Task #2516: Support PME OpenCL execution width < 16 (New)

Associated revisions

Revision 691f1d0e (diff)
Added by Szilárd Páll 6 months ago

Ensure minimum exec width of the PME OpenCL kernels

This change adds checks to make sure that we don't execute incorrect
kernels in the rare event that the Intel OpenCL compiler decides to
generate spread or gather kernels for 8-wide execution.

Refs #2516 #2520

Change-Id: I7ab33accebe908a56eb194e8245dfcfa6f817324
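A minimal sketch of the kind of guard the commit message describes, not the actual change. The minimum of 16 is an assumption inferred from the title of Task #2516, and `PME_MIN_EXEC_WIDTH` and `pme_kernel_width_ok` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed minimum execution width for the PME spread/gather kernels. */
#define PME_MIN_EXEC_WIDTH 16

/* Returns true if a kernel compiled for exec_width lanes may be launched;
 * otherwise reports the problem so the caller can refuse to run it. */
static bool pme_kernel_width_ok(const char *kernel_name, size_t exec_width)
{
    if (exec_width < PME_MIN_EXEC_WIDTH)
    {
        fprintf(stderr, "PME kernel %s was compiled for %zu-wide execution; at least %d is required\n",
                kernel_name, exec_width, PME_MIN_EXEC_WIDTH);
        return false;
    }
    return true;
}
```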

History

#1 Updated by Mark Abraham 7 months ago

  • Related to Task #2516: Support PME OpenCL execution width < 16 added
