Task #2516

Feature #2054: PME on GPU

Task #2453: PME OpenCL porting effort

Support PME OpenCL execution width < 16

Added by Aleksei Iupinov about 1 year ago. Updated 2 months ago.

Status: New
Priority: Low
Category: -
Target version: -
Difficulty: hard

Description

PME CUDA/OpenCL code is implemented with the hardcoded assumption of 16 threads per atom (PME_SPREADGATHER_THREADS_PER_ATOM).
This corresponds to spreading/gathering in 2 dimensions - one can search for assignments of ithy and ithz in the spread and gather kernel files.
This logic has to be changed to only use 1 dimension to support execution widths < 16, e.g. on Intel.
Changing the assignments and loop code should be easy in itself, but expect further pitfalls :-)
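To illustrate the description above, here is a minimal host-side C++ sketch of the two index decompositions. It is not the GROMACS kernel code: the interpolation order of 4, the names `c_order`, `coverWith16Threads`, `coverWith4Threads` and `coversEachPointOnce`, and the exact mapping from thread index to `ithy`/`ithz` are all assumptions made for illustration. It only shows that a 1D assignment (one `ithz` per thread, looping over `ithy`) visits the same stencil points as the 2D assignment that needs 16 threads per atom.

```cpp
#include <array>

// Hypothetical constants for illustration; an interpolation order of 4 is assumed.
constexpr int c_order       = 4;
constexpr int c_stencilSize = c_order * c_order * c_order;

// Number of times each (ithx, ithy, ithz) stencil point of one atom is visited.
using Coverage = std::array<int, c_stencilSize>;

// 2D decomposition: order * order = 16 threads per atom.
// Each thread owns one (ithy, ithz) pair and loops over x only.
Coverage coverWith16Threads()
{
    Coverage hits{};
    for (int thread = 0; thread < c_order * c_order; thread++)
    {
        const int ithy = thread / c_order; // assumed mapping, not taken from the kernels
        const int ithz = thread % c_order;
        for (int ithx = 0; ithx < c_order; ithx++)
        {
            hits[(ithx * c_order + ithy) * c_order + ithz]++;
        }
    }
    return hits;
}

// 1D decomposition: order = 4 threads per atom.
// Each thread owns one ithz and loops over both y and x.
Coverage coverWith4Threads()
{
    Coverage hits{};
    for (int thread = 0; thread < c_order; thread++)
    {
        const int ithz = thread;
        for (int ithy = 0; ithy < c_order; ithy++)
        {
            for (int ithx = 0; ithx < c_order; ithx++)
            {
                hits[(ithx * c_order + ithy) * c_order + ithz]++;
            }
        }
    }
    return hits;
}

// True if every stencil point was visited exactly once.
bool coversEachPointOnce(const Coverage& hits)
{
    for (int h : hits)
    {
        if (h != 1)
        {
            return false;
        }
    }
    return true;
}
```

Both variants touch all 64 points exactly once; the 1D variant trades parallelism per atom for a smaller thread group, which is what an execution width below 16 would require. Any reductions or barriers that rely on the 16-thread layout are outside the scope of this sketch.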


Related issues

Related to GROMACS - Task #2030: make the OpenCL nobonded kernels work on Intel iGPU (Closed)
Related to GROMACS - Task #2519: Improve/remove PME OpenCL kernel barriers (New)
Related to GROMACS - Task #2520: Treat OpenCL kernel width more diligently (New)

Associated revisions

Revision 691f1d0e (diff)
Added by Szilárd Páll 9 months ago

Ensure minimum exec width of the PME OpenCL kernels

This change adds checks to make sure that we don't execute incorrect
kernels in the case of the rare event if the Intel OpenCL compiler
decides to generate spread or gather kernels for 8-wide execution.

Refs #2516 #2520

Change-Id: I7ab33accebe908a56eb194e8245dfcfa6f817324
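
The kind of guard this revision describes could look roughly like the following C++ sketch. It is not the code from the change itself: the names `c_threadsPerAtom` and `executionWidthIsSupported` are hypothetical, and the additional requirement that the width be a multiple of 16 is an assumption based on the discussion in this issue rather than something the commit message states.

```cpp
#include <cstddef>

// Hypothetical constant mirroring the 16-threads-per-atom assumption of the kernels.
constexpr std::size_t c_threadsPerAtom = 16;

// Returns true if kernels that rely on 16 threads per atom can run at this width.
// How the width is obtained is left open here; in OpenCL one option (an assumption,
// not necessarily what the revision does) is to query
// CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE with clGetKernelWorkGroupInfo.
bool executionWidthIsSupported(std::size_t executionWidth)
{
    // Assumed rule: at least 16 wide, and a whole number of atoms per execution unit.
    return executionWidth >= c_threadsPerAtom && executionWidth % c_threadsPerAtom == 0;
}
```

With such a check, an 8-wide spread or gather kernel would be rejected before launch instead of silently producing incorrect results.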

Revision a19dd7d5 (diff)
Added by Szilárd Páll 2 days ago

Fix OpenCL gather reduction

On >=16-wide execution it is correct (narrower is checked and excluded
during compilation).

TODO: Consider changing the default on NVIDIA & Intel where offloading
PME is generally not advantageous to performance.

Addresses part of #2519
Refs #2453 #2516

Change-Id: I24beaaeea096954ba32b3a80251945a9d82a3c05

History

#1 Updated by Aleksei Iupinov about 1 year ago

  • Related to Task #2030: make the OpenCL nobonded kernels work on Intel iGPU added

#2 Updated by Aleksei Iupinov about 1 year ago

  • Related to Task #2519: Improve/remove PME OpenCL kernel barriers added

#3 Updated by Aleksei Iupinov about 1 year ago

As discussed with Roland, changing the code might not be needed at all.
One might get away with inserting additional synchronisation points as needed, and maybe still treating warp_size on host and device as a multiple of 16.

#4 Updated by Gerrit Code Review Bot 10 months ago

Gerrit received a related patchset '1' for Issue #2516.
Uploader: Szilárd Páll
Change-Id: gromacs~release-2019~I7ab33accebe908a56eb194e8245dfcfa6f817324
Gerrit URL: https://gerrit.gromacs.org/8635

#5 Updated by Mark Abraham 10 months ago

  • Related to Task #2520: Treat OpenCL kernel width more diligently added

#6 Updated by Szilárd Páll 2 months ago

  • Priority changed from Normal to Low

Dropping priority as we do not expect to work on this anytime soon nor do we expect hardware that would need it.
