Incorrect forces with DD and GPUs with partially empty boxes
When there are non-local atoms but no non-local interactions, the CUDA kernel is not called, but the F reduction is still called because of a typo in the condition that guards it. This situation is very rare, which is why it hasn't been noticed.
Fixed GPU non-local F copy local conditional
With domain decomposition and GPUs, the copy of the non-local part of
the host memory force buffer to the force array was conditional on
the local list size instead of the non-local list size. This meant
that, with an empty non-local list and a non-empty local list, outdated
non-local forces would be copied. Conversely, with an empty local list,
none of the non-local forces would be added. Both cases can only occur
in systems with partially empty boxes, and then only rarely.
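
The two failure modes above can be reproduced with a minimal host-only
sketch. It is not the GROMACS code: LocalityState, Guard and
reduceNonLocal are hypothetical names made up for illustration, and the
"stale" buffer simply models host memory that was not refreshed because
no non-local kernel ran in that step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one locality (local or non-local); not a GROMACS type.
struct LocalityState
{
    std::size_t        listSize;   // number of pair-list entries for this locality
    std::vector<float> hostForces; // host-side buffer last filled by a D2H copy
};

// Which list size guards the non-local force reduction.
enum class Guard
{
    LocalListSize,   // the wrong condition described above
    NonLocalListSize // the intended condition
};

// Adds the non-local host buffer into f when the chosen guard is non-zero.
// f is taken by value so the caller's array is left untouched.
std::vector<float> reduceNonLocal(const LocalityState& local,
                                  const LocalityState& nonLocal,
                                  std::vector<float>   f,
                                  Guard                guard)
{
    assert(f.size() == nonLocal.hostForces.size());
    const std::size_t guardSize =
            (guard == Guard::LocalListSize) ? local.listSize : nonLocal.listSize;
    if (guardSize > 0)
    {
        for (std::size_t i = 0; i < f.size(); ++i)
        {
            f[i] += nonLocal.hostForces[i];
        }
    }
    return f;
}
```

With a non-empty local list and an empty non-local list, the
LocalListSize guard adds whatever was left in the host buffer; with an
empty local list and a non-empty non-local list, it skips forces that
should have been added. The NonLocalListSize guard handles both cases
correctly.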
Having the local kernel, D2H copyback and F reduction called
conditionally is not useful in practice, so they are now unconditional
to avoid complicating the code.