Functionality already achieved with dirty, duplicated code:

Spline/spread OpenCL kernels passing unit tests on an NVIDIA GPU.
TODO for correctness of the development branch:
- make the gather and solve kernels work as well, using unit tests;
- test on AMD/Intel, remove warpsize==32 assumptions;
- try importing and using clFFT, verifying correctness of the full PME OpenCL with PmeTest/regression tests;
- take a glance at performance.

TODO for clean submission into master branch: checklist below