Task #2514

Feature #2054: PME on GPU

Task #2453: PME OpenCL porting effort

PME OpenCL reductions with intrinsics

Added by Aleksei Iupinov over 1 year ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: mdrun
Target version:
Difficulty: hard

Description

PME has 2 reduction stages: the obligatory per-atom force reduction in gather, and the optional reduction of 7 global observables (energy/virial components) in solve.
PME CUDA kernels implement versions of reductions with shared memory or with faster CUDA shuffle intrinsics.
PME OpenCL kernels only implement shared memory (__local in OpenCL terms) reductions.
It should be beneficial to also implement reduction variants that use vendor-specific (AMD, Intel, ...) intrinsics.
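
To make the difference between the two reduction styles concrete, here is a minimal CPU-side sketch in C++ that emulates both data-movement patterns for a single group of lanes. It is not GROMACS code: the names kWarpSize, Lanes, reduceViaSharedMemory and reduceViaShuffleDown are hypothetical, and the width of 32 lanes is an assumption (the CUDA warp size; AMD wavefronts and Intel sub-groups can have other widths). On a real device the first pattern needs a scratch array and a barrier per step, while the second exchanges registers directly between lanes.

```cpp
#include <array>
#include <cstddef>

// Hypothetical names; a CPU emulation of the two patterns, not device code.
constexpr std::size_t kWarpSize = 32; // assumed lane count (CUDA warp width)

using Lanes = std::array<float, kWarpSize>;

// Pattern 1: tree reduction through a scratch buffer, as done with
// shared (__local) memory. Each halving step would need a barrier on a GPU.
inline float reduceViaSharedMemory(const Lanes& perLaneValue)
{
    Lanes scratch = perLaneValue; // stands in for the shared/__local array
    for (std::size_t stride = kWarpSize / 2; stride > 0; stride /= 2)
    {
        for (std::size_t lane = 0; lane < stride; ++lane)
        {
            scratch[lane] += scratch[lane + stride];
        }
        // a work-group barrier would go here on a real device
    }
    return scratch[0];
}

// Pattern 2: shuffle-down reduction. Every lane adds the register value
// of the lane `offset` positions above it; no scratch memory is involved.
inline float reduceViaShuffleDown(const Lanes& perLaneValue)
{
    Lanes reg = perLaneValue; // one "register" per lane
    for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2)
    {
        const Lanes snapshot = reg; // all lanes read before any lane writes
        for (std::size_t lane = 0; lane < kWarpSize; ++lane)
        {
            const std::size_t src = lane + offset;
            // Out-of-range sources return the lane's own value, mirroring
            // how CUDA's __shfl_down_sync behaves; upper lanes then hold
            // values that are simply ignored.
            reg[lane] += (src < kWarpSize) ? snapshot[src] : snapshot[lane];
        }
    }
    return reg[0]; // only lane 0 holds the complete sum
}
```

Both functions return the same sum for the same input; the point of the intrinsic-based variant is that on hardware it avoids the round trip through local memory and the per-step barriers.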

AMD intrinsics

High-level description:
https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
Szilard asking about OpenCL use:
https://github.com/RadeonOpenCompute/ROCm/issues/189#issuecomment-325780455
Builtin "docs":
https://github.com/llvm-mirror/clang/blob/master/include/clang/Basic/BuiltinsAMDGPU.def

History

#1 Updated by Szilárd Páll over 1 year ago

Note that the same reduction optimization applies to the nonbonded kernels (at least for AMD; for Intel, Roland implemented it using Intel subgroup extensions).