Feature #2885

Updated by Artem Zhmurov about 1 year ago

Adapt the LINCS constraints to work efficiently on CUDA-enabled GPUs.


* -A separate class that contains the logic.-
* Reduction for the virial using shuffle.
* -Many-GPU version.- PLINCS.
* Many-GPU version.
* Free energy.
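The shuffle-based virial reduction mentioned above could follow the usual warp-level tree pattern. The sketch below is a host-side C++ emulation of that pattern, not the code from the change. The names `c_warpSize` and `warpReduceSumEmulated` are hypothetical, and the "lanes" are array elements standing in for the threads of one warp. On the device, each step would fetch the partner lane's value with `__shfl_down_sync`, and each component of the virial tensor would be reduced in the same way.

```cpp
#include <array>
#include <cstddef>

// Assumed warp width (32 lanes on current CUDA hardware).
constexpr std::size_t c_warpSize = 32;

// Host-side emulation of a shuffle-down tree reduction over one warp.
// Takes the per-lane partial sums by value and returns the total,
// which ends up in lane 0 after log2(c_warpSize) steps.
float warpReduceSumEmulated(std::array<float, c_warpSize> lanes)
{
    for (std::size_t offset = c_warpSize / 2; offset > 0; offset /= 2)
    {
        // All lanes read their partner's value from before this step,
        // as they would with a synchronous shuffle on the device.
        const std::array<float, c_warpSize> previous = lanes;
        for (std::size_t lane = 0; lane < c_warpSize; ++lane)
        {
            // A shuffle with an out-of-range source lane returns the
            // caller's own value; those lanes do not affect lane 0.
            const std::size_t source = lane + offset;
            const float partner = (source < c_warpSize) ? previous[source] : previous[lane];
            lanes[lane] = previous[lane] + partner;
        }
    }
    return lanes[0];
}
```

With this pattern, only one lane per warp needs to write its result out, so one atomic add per warp replaces one per thread.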

Ideas for kernel improvement:

* Use an analytical solution for inverting matrix A (feasible for the small matrices of H-bond constraints); the inverted matrix itself can be reused rather than recomputed.
* Move more data to local/shared memory and try to get rid of atomics (at least on the device level).
* Make better use of the locality of coupled constraints (maybe go from block-level to warp-level synchronization).
* Introduce a mapping of thread ID to both a single constraint and a single atom, so that Nth threads deal with Nat <= Nth coupled atoms and Nc <= Nth coupled constraints.
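To illustrate the first idea in the list above: for a group of three coupled constraints (for example the three H-bonds of a methyl group), the coupling matrix is 3x3 and can be inverted in closed form using the adjugate formula. The following is a minimal C++ sketch under that assumption. The names `Mat3` and `invert3x3` are hypothetical and do not come from the GROMACS sources, and the example matrix values are made up for illustration.

```cpp
#include <array>
#include <cmath>

// 3x3 matrix stored row-major; a hypothetical stand-in for the
// coupling matrix of three mutually coupled constraints.
using Mat3 = std::array<std::array<double, 3>, 3>;

// Closed-form inverse via the adjugate (cofactor) formula.
// Returns false if the matrix is numerically singular.
bool invert3x3(const Mat3& m, Mat3& inv)
{
    // Cofactors of the first row, also used for the determinant.
    const double c00 = m[1][1] * m[2][2] - m[1][2] * m[2][1];
    const double c01 = m[1][2] * m[2][0] - m[1][0] * m[2][2];
    const double c02 = m[1][0] * m[2][1] - m[1][1] * m[2][0];

    const double det = m[0][0] * c00 + m[0][1] * c01 + m[0][2] * c02;
    if (std::fabs(det) < 1e-12)
    {
        return false;
    }
    const double r = 1.0 / det;

    // inv[i][j] is the cofactor of element (j, i) divided by det.
    inv[0][0] = c00 * r;
    inv[0][1] = (m[0][2] * m[2][1] - m[0][1] * m[2][2]) * r;
    inv[0][2] = (m[0][1] * m[1][2] - m[0][2] * m[1][1]) * r;
    inv[1][0] = c01 * r;
    inv[1][1] = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) * r;
    inv[1][2] = (m[0][2] * m[1][0] - m[0][0] * m[1][2]) * r;
    inv[2][0] = c02 * r;
    inv[2][1] = (m[0][1] * m[2][0] - m[0][0] * m[2][1]) * r;
    inv[2][2] = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) * r;
    return true;
}
```

Such a routine uses a fixed number of multiplications and a single division, with no loops. The inverse could be computed once per constraint group and kept for as long as the matrix it was computed from remains valid.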


* -Initial integration into the constraints test.-
* Add bigger systems to test virial reduction and overall redistribution of constraints among threads.
* Generalization of tests for different platforms.

Current version of the code is in gerrit change 9193 (