Task #3370

Updated by Alan Gray 10 months ago

h1. Umbrella task for follow-up improvements.

h2. Unification of code-paths across different types of step in do_force

* The GPU buffer ops/reduction kernel should gain support for computing the virial from the bonded and short-range nonbonded contributions, to avoid falling back to the CPU on virial steps. Additionally, if any bonded interactions are calculated on the CPU, these forces need to be transferred separately for the virial reduction on virial steps.
* Allow uniformity across search and non-search steps. At present, on search steps the xyzq format is generated during the search and copied, whereas non-search steps perform the conversion themselves, which adds complexity.
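
To illustrate the first item, here is a minimal host-side C++ sketch of a reduction that sums the GPU force buffer with separately transferred CPU bonded forces and accumulates a virial contribution in the same pass. It assumes the single-sum form -0.5 * sum_i x_i (outer product) f_i and leaves out shift-force (periodic image) corrections. The function @reduceForcesWithVirial@ and all type names are hypothetical and are not taken from the code base; a real version would be a GPU kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3   = std::array<float, 3>;
using Matrix = std::array<std::array<float, 3>, 3>;

// Hypothetical fused reduction (illustration only, not existing API).
// Sums the GPU-resident forces and the separately transferred CPU bonded
// forces into fTotal, and accumulates -0.5 * sum_i x_i (outer) f_i.
// Assumption: shift-force (periodic image) corrections are handled elsewhere.
Matrix reduceForcesWithVirial(const std::vector<Vec3>& x,
                              const std::vector<Vec3>& fGpu,
                              const std::vector<Vec3>& fCpuBonded,
                              std::vector<Vec3>&       fTotal)
{
    assert(x.size() == fGpu.size() && x.size() == fCpuBonded.size());
    fTotal.resize(x.size());

    Matrix virial{}; // value-initialised, so all entries start at zero
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        // Force reduction: combine both contributions for atom i.
        for (std::size_t d = 0; d < 3; ++d)
        {
            fTotal[i][d] = fGpu[i][d] + fCpuBonded[i][d];
        }
        // Virial accumulation from the reduced force of atom i.
        for (std::size_t a = 0; a < 3; ++a)
        {
            for (std::size_t b = 0; b < 3; ++b)
            {
                virial[a][b] -= 0.5f * x[i][a] * fTotal[i][b];
            }
        }
    }
    return virial;
}
```

Doing both in one pass means the reduced forces are read only once. On a GPU the nine virial components would additionally need a block-level or atomic reduction, which this sketch does not show.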

h2. Improve synchronization

* Implement better receiver ready / notify in halo exchange: the current notification mechanisms effectively turn the one-sided communication into synchronous two-sided communication. Alternatives should be considered.
* Separate PME x receive sync: the data dependency synchronization should be implemented on the consumer task's end, which is PME spread in the case of PME. Currently, PME-only ranks enqueue the receive wait as soon as MPI returns. Consider assembling a list of events and passing it to spread instead. Consider whether having to receive from multiple PP ranks actually makes it more beneficial to overlap some receives with the event wait enqueue.
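
The "list of events passed to spread" idea can be sketched in plain C++ as follows. This assumes an atomic flag as a stand-in for a GPU event and a trivial sum as a stand-in for the spread work. @ReceiveEvent@, @receiveChunk@ and @spreadAfterEvents@ are hypothetical names for illustration only. A GPU version would record real events and enqueue stream waits rather than block the host.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GPU event: one per sending PP rank.
struct ReceiveEvent
{
    std::atomic<bool> ready{false};

    // Mark the data guarded by this event as available.
    void record() { ready.store(true, std::memory_order_release); }

    // Block (yielding) until the event has been recorded.
    void wait() const
    {
        while (!ready.load(std::memory_order_acquire))
        {
            std::this_thread::yield();
        }
    }
};

// Producer side: store one rank's coordinates, then record its event.
// No wait is issued here; the dependency is left to the consumer.
void receiveChunk(ReceiveEvent& done, std::vector<float>& x,
                  std::size_t begin, std::size_t end, float value)
{
    assert(begin <= end && end <= x.size());
    for (std::size_t i = begin; i < end; ++i)
    {
        x[i] = value;
    }
    done.record();
}

// Consumer side: the spread task owns the synchronization and waits on
// every event in the list before it touches the coordinates.
float spreadAfterEvents(const std::vector<const ReceiveEvent*>& events,
                        const std::vector<float>&               x)
{
    for (const ReceiveEvent* e : events)
    {
        e->wait();
    }
    float sum = 0.0f; // placeholder for the actual spreading work
    for (float v : x)
    {
        sum += v;
    }
    return sum;
}
```

Keeping the waits in the consumer means receives from several PP ranks can complete in any order without the receiving code having to know about the spread task.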

<Tasks to be added here>