Task #2453
Updated by Aleksei Iupinov almost 3 years ago
In porting PME from CUDA to OpenCL, I'm first going with dirty code with lots of duplication, to see how to strike a balance between neatness and extensibility. Most of the host-side logic is quite easy to wrap so that it looks the same in CUDA and OpenCL, since there are no C++ limitations on the host side.
Functionality already achieved with the dirty, duplicated code:
- Spline/spread OpenCL kernels passing unit tests on an NVIDIA GPU.
TODO for correctness of the development branch:
- implement gather and solve kernels as well, using unit tests;
- test on AMD/Intel, remove warpsize==32 assumptions;
- try importing and using clFFT, verifying correctness of the full PME OpenCL with PmeTest/regression tests;
- take a glance at performance.
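
For the warpsize==32 item above, one possible direction is to pass the execution width in as a parameter instead of hardcoding it. The sketch below is only an illustration: the helper names are hypothetical, and the order x order threads-per-atom layout is an assumption made for the example, not something stated in this task. Typical widths would be 32 on NVIDIA and 64 on AMD GCN hardware.

```cpp
#include <cassert>

// Hypothetical helper: threads cooperating on one atom.
// ASSUMPTION: an (order x order) thread layout per atom, chosen for illustration.
constexpr int threadsPerAtom(int order)
{
    return order * order;
}

// Hypothetical check: an execution width (warp/wavefront/sub-group size) fits
// this layout only if it holds a whole number of atoms, and at least one.
constexpr bool isWidthSupported(int executionWidth, int order)
{
    return executionWidth >= threadsPerAtom(order)
           && executionWidth % threadsPerAtom(order) == 0;
}

// Atoms processed by one execution unit, with the width passed in
// rather than assumed to be 32.
constexpr int atomsPerExecutionUnit(int executionWidth, int order)
{
    return executionWidth / threadsPerAtom(order);
}

// Compile-time sanity checks for two common widths at interpolation order 4.
static_assert(atomsPerExecutionUnit(32, 4) == 2, "32-wide unit, order 4");
static_assert(atomsPerExecutionUnit(64, 4) == 4, "64-wide unit, order 4");
```

With helpers like these, a width that cannot hold a whole atom (for example 8 at order 4) is rejected up front by isWidthSupported instead of silently producing wrong indexing.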
TODO for a clean submission into the master branch: checklist below
(which should eventually have Gerrit links for everything)
(There is also probably much more stuff that I've forgotten)