Bonded GPU kernel launched in the wrong stream with 1 PP + 1 PME rank
Due to confusion over the DOMAINDECOMP(cr) check, which also evaluates to true when there is only 1 PP and 1 PME rank, the bonded module gets initialized with a null stream and therefore blocks the overlap of other operations when its kernel is launched.
Fix the GPU bonded stream with 1 PP + 1 PME rank
With 1 PP + 1 PME rank the GpuBonded constructor gets passed the
non-local nonbonded stream, which is nullptr, and as a result the bonded
kernel launch happens in the default stream, blocking concurrent
execution of other operations.
This change makes sure that only when there is PP domain decomposition
is the GpuBonded constructor passed the nonlocal stream.
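
The selection logic described above can be sketched as follows. This is
only an illustration under stated assumptions, not the actual GROMACS
code: Stream, StreamSet and selectBondedStream are hypothetical names,
and falling back to the local stream when there is no PP domain
decomposition is an assumption that the message above does not spell out.

```cpp
#include <cassert>

// Hypothetical stand-in for a GPU stream handle (not a GROMACS type).
struct Stream
{
    int id;
};

// Hypothetical pair of nonbonded streams. In this sketch the non-local
// stream is nullptr whenever there is no PP domain decomposition,
// e.g. with 1 PP + 1 PME rank.
struct StreamSet
{
    const Stream* local;
    const Stream* nonLocal;
};

// Choose the stream handed to the bonded module: the non-local stream
// only with PP domain decomposition, otherwise the local stream
// (assumed fallback), so a null stream is never picked by accident.
const Stream* selectBondedStream(const StreamSet& streams, bool havePPDomainDecomposition)
{
    return havePPDomainDecomposition ? streams.nonLocal : streams.local;
}
```

With 1 PP + 1 PME rank, the caller would pass false for
havePPDomainDecomposition even though a PP/PME split exists, and the
bonded kernel would then run in a valid, non-default stream.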