Feature #2934

Updated by Alan Gray over 1 year ago

Implement and improve the GPU version of position buffer operations. Gerrit change 9169 implements the functionality, and a follow-up change will make the improvements listed below.

* -Use pinned host vectors for grid and gridset arrays and remove explicit cudahostregister/unregister calls in init fn-
* -Replace allocatedevicebuffer with reallocatedevicebuffer in init fn-
* -Improve variable naming in init and buffer ops fns-
* -Fix issue with position buffer pinning to allow use of gmx api for memcpy-
* Implement sync point between PME and NB streams. [WIP]
* Improve mechanism for deciding if position buffer needs to be copied to GPU in advance of buffer op
* -Move GPU conditional out of OpenMP loop in atomdata_copy_x_to_nbat_x. Precompute maxAtomsInColumn and store it in the grid object, instead of recomputing it every step.-
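
The last completed item above (hoisting the GPU conditional out of the loop and caching the per-column maximum) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Grid`, `setColumns` and `countSlotsToCopy` are hypothetical names, not the GROMACS types or functions, and sizing the GPU work as columns times the cached maximum is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a search grid; not the GROMACS type.
struct Grid
{
    std::vector<int> atomsPerColumn;
    int              maxAtomsInColumn = 0; // cached when the grid is (re)built

    // Intended to be called only when the grid is rebuilt (search steps),
    // so the maximum is not recomputed on every MD step.
    void setColumns(std::vector<int> counts)
    {
        atomsPerColumn   = std::move(counts);
        maxAtomsInColumn = 0;
        for (int n : atomsPerColumn)
        {
            assert(n >= 0);
            maxAtomsInColumn = std::max(maxAtomsInColumn, n);
        }
    }
};

// Returns the number of atom slots the copy would process on this step.
std::size_t countSlotsToCopy(const Grid& grid, bool useGpu)
{
    if (useGpu)
    {
        // Decided once, outside the loop. Assumption: the GPU work is sized
        // as columns x cached maximum, so no per-step reduction is needed.
        return grid.atomsPerColumn.size() * static_cast<std::size_t>(grid.maxAtomsInColumn);
    }

    // CPU path: this is the loop that would carry the OpenMP pragma.
    std::size_t slots = 0;
    for (int n : grid.atomsPerColumn)
    {
        slots += static_cast<std::size_t>(n);
    }
    return slots;
}
```

With columns holding 3, 7 and 5 atoms, the cached maximum is 7; the GPU branch then covers 3 x 7 = 21 slots, while the CPU loop visits the 15 real atoms.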

* Move the complex coordinate management into a single place (from where it can be refactored more easily): the early X buffer ops changes added technical debt and complexity by conditionally copying data and managing xrvec when this is not done in PME. Consolidation and refactoring needed. [WIP]
* Refactor the grid data members that were added to the main nonbonded data structure without any grouping.
* Consider uniform behavior across search and non-search steps (at the moment, on search steps the xyzq format is generated during search and copied, while non-search steps do the conversion, which leads to complexity).
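
One way to picture the uniform behavior suggested in the last item is a single conversion routine that both search and non-search steps would call. The sketch below is an illustration only: `Rvec`, `Xyzq`, `packXyzq` and the `atomOrder` mapping are hypothetical names, and the packed layout (x, y, z, charge per atom) is assumed here rather than taken from the code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Rvec = std::array<float, 3>; // hypothetical stand-in for a coordinate triple

// Assumed packed layout: position plus charge per atom.
struct Xyzq
{
    float x, y, z, q;
};

// Packs coordinates and charges into the xyzq layout in grid order.
// atomOrder[i] is the index of the atom stored in grid slot i (hypothetical).
std::vector<Xyzq> packXyzq(const std::vector<Rvec>&  x,
                           const std::vector<float>& charges,
                           const std::vector<int>&   atomOrder)
{
    assert(x.size() == charges.size());
    std::vector<Xyzq> packed;
    packed.reserve(atomOrder.size());
    for (int a : atomOrder)
    {
        assert(a >= 0 && static_cast<std::size_t>(a) < x.size());
        const std::size_t i = static_cast<std::size_t>(a);
        packed.push_back({ x[i][0], x[i][1], x[i][2], charges[i] });
    }
    return packed;
}
```

If both step types went through one routine like this, the only difference between them would be whether `atomOrder` was just regenerated by the search or reused from the previous search step.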