Task #2453

Updated by Aleksei Iupinov over 2 years ago

In porting PME from CUDA to OpenCL, I'm first going with dirty code with lots of duplication, to see how to strike a balance between neatness and extensibility. Most of the host-side logic is quite easy to wrap so that it looks the same in CUDA and OpenCL, since there are no C++ limitations on the host side.

Functionality already achieved:
- Spline/spread and gather OpenCL kernels pass unit tests on NVIDIA, Intel and AMD GPUs (mixed-mode PME tests pass there as well, except on Intel, where the nonbonded (NB) results are incorrect).

TODO for correctness of the development branch:
- implement gather and solve kernels as well, using unit tests;
- test on AMD/Intel and remove the warp_size==32 assumptions (the spread/gather tests already pass with preferred widths of 16 and 64; a width of 8 will need some small modifications in the kernels);
- try importing and using clFFT, verifying correctness of the full PME OpenCL with PmeTest/regression tests;
- take a glance at performance.
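
As a rough illustration of the warp_size item above, here is a minimal C++ sketch of one possible reason why widths of 16 and 64 work while 8 needs kernel changes. It assumes a PME interpolation order of 4 and that each atom is handled by order*order cooperating threads; neither is stated in this comment, and the helper names (threadsPerAtom, atomsPerWarp, atomFitsInWarp) are hypothetical, not taken from the actual code.

```cpp
#include <cassert>

// Hypothetical helper: number of threads cooperating on one atom.
// ASSUMPTION (not stated in this task): an order x order thread layout per atom.
constexpr int threadsPerAtom(int order)
{
    return order * order;
}

// Number of whole atoms that one warp / wavefront / sub-group of the given
// width can process at once. A result of 0 means that a single atom no
// longer fits into one warp.
constexpr int atomsPerWarp(int warpSize, int order)
{
    return warpSize / threadsPerAtom(order);
}

// True when the "one atom fits inside one warp" layout can be kept as is.
constexpr bool atomFitsInWarp(int warpSize, int order)
{
    return atomsPerWarp(warpSize, order) >= 1;
}
```

Under these assumptions, order 4 gives 16 threads per atom, so widths of 16, 32 and 64 hold 1, 2 and 4 atoms per warp, while a width of 8 holds none. In that case any intra-warp cooperation would have to be replaced by something that spans several warps, for example local memory plus a barrier.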

TODO for clean submission into master branch: checklist
(which should eventually have gerrit links for everything)
(There is also probably much more stuff that I've forgotten)