Task #3446

Task #2792: Improvement of PME gather and spread CUDA kernels

apply maintainability updates across all GPU kernels

Added by Szilárd Páll 2 months ago. Updated 2 months ago.

Status: New
Priority: Normal
Category: -
Target version: -
Difficulty: uncategorized

Description

To keep the code maintainable, and before the CUDA and OpenCL GPU device code diverge greatly, we need to introduce in the OpenCL kernels (refactoring where needed) the simplest form of the changes made in the CUDA PME kernels:
- add threadsPerAtom=16 constant
- move spline calculation to separate clh file
- introduce the c_recalculateSplines paths in spread/gather
- introduce c_useAtomDataPrefetch code-paths

The items above are the low-effort maintenance work needed to avoid major code divergence. Implementing threadsPerAtom=4 can be considered separately: unlike the items above, which are mostly code-path (re)organization, it is a new feature.
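
To illustrate the kind of code-path organization meant here, below is a minimal host-side C++ sketch, not the actual CUDA or OpenCL kernel code. It assumes that c_recalculateSplines selects between recomputing the spline weights and reading previously stored ones, and that threadsPerAtom determines how many atoms one block handles. The functions computeWeights, gatherOneAtom and atomsPerBlock are hypothetical, and the weights are a linear-interpolation placeholder rather than the real PME B-spline math.

```cpp
#include <array>
#include <cassert>

// Constant named after the first item of the task list; the value is the one given there.
constexpr int c_threadsPerAtom = 16;

// Hypothetical helper: how many atoms one block of threads would process.
constexpr int atomsPerBlock(int blockSize)
{
    return blockSize / c_threadsPerAtom;
}

// Placeholder weights (linear interpolation), NOT the real PME spline calculation.
// Keeping this in one separate function mirrors "move spline calculation to a separate file".
inline std::array<float, 2> computeWeights(float fraction)
{
    assert(fraction >= 0.0F && fraction < 1.0F);
    return { 1.0F - fraction, fraction };
}

// One routine serves both code paths. The template argument plays the role
// of c_recalculateSplines: it is a compile-time constant, so the compiler
// can drop the branch that is not taken.
template<bool recalculateSplines>
float gatherOneAtom(float                       fraction,
                    const std::array<float, 2>& storedWeights,
                    const std::array<float, 2>& gridValues)
{
    std::array<float, 2> weights = storedWeights;
    if (recalculateSplines)
    {
        // Recompute instead of reading the values saved earlier.
        weights = computeWeights(fraction);
    }
    return weights[0] * gridValues[0] + weights[1] * gridValues[1];
}
```

In an OpenCL kernel such switches would more likely be passed as preprocessor defines at program build time than as template arguments; the template form is used here only to keep the sketch self-contained.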

History

#1 Updated by Mark Abraham 2 months ago

The first point is partly done already - see pme_gpu_program.cl

#2 Updated by Szilárd Páll 2 months ago

  • Description updated (diff)

Mark Abraham wrote:

The first point is partly done already - see pme_gpu_program.cl

Thanks for the correction.

#3 Updated by Szilárd Páll 2 months ago

  • Description updated (diff)
