Task #3002
Task #2675: bonded CUDA offload task
consider splitting bonded work into local/nonlocal
Description
Splitting the bonded task into local and nonlocal parts would make the code more uniform, simpler, and easier to maintain, in particular with GPU offload, where the other short-range task (nonbonded) is already split in this manner.
It will likely also improve performance: a smaller nonlocal task shortens the critical path, allowing the halo exchange to start earlier.
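
To make the proposal concrete, here is a minimal sketch of what the split could look like on the host side. It is not existing code: `Bond`, `SplitBonds` and `splitBonds` are hypothetical names, and it assumes that home (local) atoms occupy indices `[0, numHomeAtoms)` with halo atoms stored after them, so an interaction counts as nonlocal as soon as any of its atoms is a halo atom.

```cpp
#include <algorithm>
#include <array>
#include <vector>

// Hypothetical two-atom bonded interaction; illustration only, not an existing type.
struct Bond
{
    std::array<int, 2> atoms; // indices into the domain's atom array
    int                type;  // index into a parameter table
};

// The two work lists that would be handed to separate local/nonlocal tasks.
struct SplitBonds
{
    std::vector<Bond> local;    // every atom is a home atom
    std::vector<Bond> nonlocal; // at least one atom is a halo atom
};

// Assumption: home atoms occupy indices [0, numHomeAtoms), halo atoms follow.
SplitBonds splitBonds(const std::vector<Bond>& bonds, int numHomeAtoms)
{
    SplitBonds split;
    for (const Bond& b : bonds)
    {
        // An interaction is nonlocal if any of its atoms lies in the halo.
        const bool touchesHalo = std::any_of(b.atoms.begin(), b.atoms.end(),
                                             [numHomeAtoms](int a) { return a >= numHomeAtoms; });
        (touchesHalo ? split.nonlocal : split.local).push_back(b);
    }
    return split;
}
```

With lists like these, the nonlocal part could be handled as its own, smaller task (for example in the same way the nonbonded task treats its nonlocal part), independently of the local list. How the lists map onto GPU streams or kernels is left open here.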