Task #2453

Updated by Aleksei Iupinov over 2 years ago

In porting PME from CUDA to OpenCL, I'm first going with dirty code with lots of duplication, to see how to strike a balance between neatness and extensibility. Most of the host-side logic is quite easy to wrap so that it looks the same in CUDA and OpenCL, since there are no C++ limitations there.

Functionality already achieved (https://github.com/yupinov/gromacs/tree/pme_opencl_gerrit):
- PME OpenCL kernels passing unit tests on NVIDIA, Intel and AMD GPUs (mixed-mode PME tests pass there as well, except on Intel, where NBs are incorrect).
https://github.com/yupinov/gromacs/tree/pme_opencl_dirty
- PME fully working on AMDGPU-PRO OpenCL, but broken with the ROCm stack, only because clFFT is still broken with ROCm (https://github.com/clMathLibraries/clFFT/issues/218).

TODO for correctness of the development branch:
- check correctness on Intel;
- take a glance at performance - the first glance revealed not just FFT/solve, but also spread being 2.5x slower on Vega with AMDGPU-PRO - have to get to performance counters eventually;
- document and clean up FIXMEs;
- test more on AMD/Intel, change warp_size==32 assumptions (the tests for spread/gather already pass with preferred widths 16 and 64; a width of 8 is going to need some small modifications in the kernels);
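
To illustrate the warp_size==32 item above, here is a minimal host-side sketch (plain C++, not the actual PME kernel code) of a reduction whose execution width is a compile-time parameter rather than a hard-coded 32. The names `warpReduceSum` and `sumOfLaneIndices` are hypothetical and exist only for this example. The lanes are simulated with an array; on a device each outer-loop step would presumably be a shuffle or a local-memory exchange.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: tree reduction over one value per lane.
// The width is a template parameter, so nothing below assumes 32 lanes.
template <std::size_t warpSize>
float warpReduceSum(std::array<float, warpSize> lanes)
{
    static_assert(warpSize > 0 && (warpSize & (warpSize - 1)) == 0,
                  "execution width must be a power of two");
    // Halve the active range each step; lane i accumulates lane i + offset.
    for (std::size_t offset = warpSize / 2; offset > 0; offset /= 2)
    {
        for (std::size_t lane = 0; lane < offset; ++lane)
        {
            lanes[lane] += lanes[lane + offset];
        }
    }
    return lanes[0];
}

// Hypothetical driver: fills lane i with the value i and reduces.
template <std::size_t warpSize>
float sumOfLaneIndices()
{
    std::array<float, warpSize> lanes{};
    for (std::size_t i = 0; i < warpSize; ++i)
    {
        lanes[i] = static_cast<float>(i);
    }
    return warpReduceSum<warpSize>(lanes);
}
```

Instantiating `sumOfLaneIndices<8>()`, `<16>()`, `<32>()` and `<64>()` should each give N(N-1)/2 for width N. The real kernels may need more than this for width 8, for example if their data layout assumes a minimum number of lanes per atom.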

TODO checklist for clean submission into master branch:

- subtasks (which should eventually have Gerrit links for everything).
(There is also probably much more stuff that I've forgotten)
