Task #2519
Feature #2054: PME on GPU
Task #2453: PME OpenCL porting effort
Improve/remove PME OpenCL kernel barriers
Description
PME OpenCL kernels currently have additional synchronisation points compared to the CUDA ones.
Some of those barriers should probably depend on the minimal execution width (e.g. subgroup size?).
It might also be that some are not needed at all. The purpose of this issue is to track all of them.
Relaxing any barrier requires rerunning the Ewald unit tests on all supported and relevant platforms.
Hence it is probably beneficial to achieve correctness on Intel GPUs first and only then start changing the barriers.
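A minimal sketch of the kind of rule that could decide whether a given barrier depends on the execution width. It is host-side C for illustration only and is not taken from the GROMACS sources: the helper name needs_local_barrier and both of its parameters are hypothetical. It assumes that work-items within one execution width (warp, wavefront or subgroup) run in lockstep, and that data is shared through local memory only within aligned groups of consecutive work-items. The OpenCL specification does not guarantee lockstep execution, so any barrier relaxed on this basis would still need to be validated per platform as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a GROMACS function).
 *
 * execution_width:    number of work-items assumed to execute in lockstep
 *                     (e.g. 64 on AMD GCN, 32 on NVIDIA).
 * sharing_group_size: number of consecutive work-items that read each
 *                     other's local-memory data; groups are assumed to
 *                     start at multiples of sharing_group_size.
 *
 * Returns true when a work-group barrier is still required, i.e. when a
 * sharing group can straddle more than one lockstep unit. */
static bool needs_local_barrier(int execution_width, int sharing_group_size)
{
    assert(execution_width > 0 && sharing_group_size > 0);

    /* Every aligned sharing group fits inside a single lockstep unit
     * exactly when the execution width is a multiple of the group size. */
    const bool group_fits_in_one_unit = (execution_width % sharing_group_size == 0);

    return !group_fits_in_one_unit;
}
```

Under these assumptions, a group of 32 cooperating work-items would not need the barrier at execution width 64 but would still need it at width 16. In a kernel this decision would more likely be made at compile time, from a width passed as a preprocessor definition, than at run time.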
Related issues
Associated revisions
Fix OpenCL gather reduction
On >=16-wide execution it is correct (narrower is checked and excluded
during compilation).
TODO: Consider changing the default on NVIDIA & Intel where offloading
PME is generally not advantageous to performance.
Addresses part of #2519
Refs #2453 #2516
Change-Id: I24beaaeea096954ba32b3a80251945a9d82a3c05
History
#1 Updated by Aleksei Iupinov over 1 year ago
- Private changed from Yes to No
#2 Updated by Aleksei Iupinov over 1 year ago
- Related to Task #2516: Support PME OpenCL execution width < 16 added
#3 Updated by Gerrit Code Review Bot about 1 year ago
Gerrit received a related patchset '1' for Issue #2519.
Uploader: Szilárd Páll (pall.szilard@gmail.com)
Change-Id: gromacs~master~I4c8f2cff405cd3044bd60b99f01bcdd918dc5d0e
Gerrit URL: https://gerrit.gromacs.org/8512
#4 Updated by Gerrit Code Review Bot 10 months ago
Gerrit received a related patchset '1' for Issue #2519.
Uploader: Szilárd Páll (pall.szilard@gmail.com)
Change-Id: gromacs~release-2019~I4c8f2cff405cd3044bd60b99f01bcdd918dc5d0e
Gerrit URL: https://gerrit.gromacs.org/9167
Relax OpenCL gather kernel barrier on AMD
Not needed on arch with >32 execution width.
Refs #2519
Change-Id: I4c8f2cff405cd3044bd60b99f01bcdd918dc5d0e