Difference between single rank and multiple rank when pulling using constraints relative to rest of the system
I noticed that I get completely different results (e.g. pull forces and temperature) when a run is started on a GPU server, with nonbonded (NB) and PME interactions computed on the GPU, and then continued from a restart file on the PDC Beskow supercomputer running MPI (no GPUs).
Continuing from a checkpoint on different hardware is not expected to give binary-identical results, but in this case the difference is remarkable. The pull forces and temperature fluctuations are much higher on Beskow. I suspect something is going wrong there, and I assume the output from the GPU server is correct, based only on the fact that it is more stable.
I'm attaching the pull force and temperature output and the log file from a run where the first 100 ps are run on a GPU server, the next 200 ps on Beskow and then 200 ps on the GPU server again.
Add check for pull group PBC to grompp
Pull groups that use a reference atom for periodic boundary treatment
should have all their atoms well within half the box size of this
reference. When this is not the case, grompp will now issue a warning.
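The kind of check described above can be sketched as follows. This is a minimal illustration only, not the grompp implementation: it assumes a rectangular box, and the function names and the `margin` safety factor are hypothetical choices made for this example.

```python
def min_image_delta(x, x_ref, box):
    """Minimum-image displacement from x_ref to x, per dimension.

    Assumes a rectangular box given as [Lx, Ly, Lz].
    """
    return [
        (xi - ri) - bi * round((xi - ri) / bi)
        for xi, ri, bi in zip(x, x_ref, box)
    ]


def atoms_outside_pbc_margin(coords, group, ref_atom, box, margin=0.9):
    """Return the atom indices in `group` that are not well within
    half the box size of the PBC reference atom.

    `margin` (hypothetical) scales the half-box limit, so that atoms
    close to the limit are reported before they actually cross it.
    """
    x_ref = coords[ref_atom]
    offenders = []
    for i in group:
        delta = min_image_delta(coords[i], x_ref, box)
        # An atom is flagged if it is too far along any one dimension.
        if any(abs(d) > margin * 0.5 * b for d, b in zip(delta, box)):
            offenders.append(i)
    return offenders


# Example: atom 2 is 1.9 nm from the reference in a 4 nm box,
# which is beyond 0.9 * 2.0 nm, so a warning would be warranted.
box = [4.0, 4.0, 4.0]
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.9, 0.0, 0.0], [3.5, 0.0, 0.0]]
offenders = atoms_outside_pbc_margin(coords, group=[1, 2, 3], ref_atom=0, box=box)
if offenders:
    print("WARNING: pull group atoms far from the PBC reference atom:", offenders)
```

Note that the coordinates are indexed by the atom indices stored in the group, not by the position of each atom within the group list.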
Fixes a bug in pull group size calculation
The wrong atom indexes were used when checking the coordinates
of atoms in a pull group (commit aa102e691d59b4de37c8e4).
That led to false reports of pull groups being too large
(and presumably false negatives). This fixes the problem.
#3 Updated by Magnus Lundborg about 1 year ago
- Subject changed from Difference between MPI and thread-MPI version pulling using constraints relative to rest of the system to Difference between single rank and multiple rank when pulling using constraints relative to rest of the system
The problem was identified to be related to single rank vs multiple rank. Subject updated.
#11 Updated by Magnus Lundborg 8 months ago
I guess such a check could be a good idea. But what if atoms move further away from the PBC reference atom than half the box size during the simulation? Then there would still be a problem, I guess, and I don't think it would be fixed by https://gerrit.gromacs.org/#/c/8060/.