improve CPU force reductions
Force reductions across different CPU modules need improvement, especially for heterogeneous / accelerator code paths:
- make it clearer which buffer is the "master" CPU force buffer, and what its lifetime and validity are
-- code-path aware force clearing
-- consumers should only get a read-only view
- have a separate reduction module be in charge of summing the separate force contributions (e.g. the PME GPU module does the reduction of PME forces into the main buffer)
- make the reduction code aware of when there is an opportunity to store rather than accumulate (e.g. when all forces are offloaded on a rank, the final F reduction can be 30% faster by storing into the master buffer rather than accumulating into it)
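To illustrate the store-vs-accumulate point, here is a minimal sketch of a reduction that overwrites the master buffer with the first contribution instead of clearing it and then accumulating. All names (`RVec`, `ReductionMode`, `reduceForces`) are hypothetical and are not the GROMACS API. The sketch assumes all contribution buffers have the same length as the master buffer. Consumers receive the contributions only through a `const` reference, which also reflects the read-only-view item above.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical 3-component force vector (illustrative, not a GROMACS type).
using RVec = std::array<float, 3>;

// Hypothetical mode: Store when the master buffer holds no valid CPU
// contributions yet (e.g. all forces offloaded), Accumulate otherwise.
enum class ReductionMode { Store, Accumulate };

// Sum the contribution buffers into the master buffer.
// Assumes every contribution has master.size() elements.
// In Store mode the first contribution overwrites the master buffer, so the
// master buffer does not need to be cleared beforehand and is never read.
void reduceForces(std::vector<RVec>& master,
                  const std::vector<std::vector<RVec>>& contributions,
                  ReductionMode mode)
{
    if (contributions.empty())
    {
        if (mode == ReductionMode::Store)
        {
            // Nothing to store: leave a well-defined (zero) result.
            for (RVec& v : master) { v = RVec{0.0f, 0.0f, 0.0f}; }
        }
        return;
    }
    for (std::size_t c = 0; c < contributions.size(); ++c)
    {
        const std::vector<RVec>& src = contributions[c];
        const bool store = (mode == ReductionMode::Store && c == 0);
        for (std::size_t i = 0; i < master.size(); ++i)
        {
            for (std::size_t d = 0; d < 3; ++d)
            {
                // Store skips the read of master[i][d]; accumulate needs it.
                master[i][d] = store ? src[i][d] : master[i][d] + src[i][d];
            }
        }
    }
}
```

The saving comes from the first pass: storing avoids both the separate clearing pass and the read of the master buffer, which is where the quoted speedup for the fully offloaded case would plausibly originate.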
#3 Updated by Szilárd Páll 4 months ago
Berk Hess wrote:
What is code-path aware force clearing?
Not the best wording, I guess. On the GPU-offload code path, with or without buffer ops offloaded, there may be nothing to compute on the CPU, so clearing (as well as accumulating into) the CPU force buffer is a waste.
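A minimal sketch of what code-path aware clearing could look like, assuming a single hypothetical flag `haveCpuForceWork` that says whether any CPU force kernel writes into the CPU buffer this step. The names `RVec`, `FinalReduction` and `prepareCpuForceBuffer` are illustrative only, not GROMACS API. The idea is that the clear is skipped when nothing on the CPU will accumulate, and the later reduction of offloaded forces then stores instead of accumulating.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical 3-component force vector (illustrative, not a GROMACS type).
using RVec = std::array<float, 3>;

// How the final reduction of offloaded forces should write the CPU buffer.
enum class FinalReduction { Store, Accumulate };

// Prepare the CPU force buffer for this step.
// haveCpuForceWork: hypothetical flag, true if any CPU kernel accumulates into f.
FinalReduction prepareCpuForceBuffer(std::vector<RVec>& f, bool haveCpuForceWork)
{
    if (!haveCpuForceWork)
    {
        // Nothing on the CPU writes f: clearing it would be wasted work, and
        // the offloaded forces can later be stored directly into f.
        return FinalReduction::Store;
    }
    // CPU kernels accumulate into f, so it must start from zero.
    std::fill(f.begin(), f.end(), RVec{0.0f, 0.0f, 0.0f});
    return FinalReduction::Accumulate;
}
```

Note that the decision here depends only on whether there is CPU force work, not on where the buffer ops run, matching the point that both offload variants can skip the clear.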