Bug #2397

Difference between single rank and multiple rank when pulling using constraints relative to rest of the system

Added by Magnus Lundborg 6 months ago. Updated about 1 month ago.

Status: New
Priority: Normal
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

I noticed that I get completely different results, e.g. pull forces and temperature, between a run started on a GPU server (running NB and PME on the GPU) and its continuation from a restart file on the PDC Beskow supercomputer running MPI (no GPUs).

Continuing from a checkpoint on different hardware would not be expected to give binary identical results, but in this case the difference is remarkable. The pull forces and temperature fluctuations are much higher on Beskow. I suspect something is going wrong, and I assume the output from the GPU server is the correct one, based only on the fact that it is more stable.

I'm attaching the pull force and temperature output and the log file from a run where the first 100 ps were run on a GPU server, the next 200 ps on Beskow, and the final 200 ps on the GPU server again.

pullf.xvg (64.1 KB), Magnus Lundborg, 02/02/2018 02:22 PM
temperature.xvg (929 Bytes), Magnus Lundborg, 02/02/2018 02:22 PM
md.log (55.4 KB), Magnus Lundborg, 02/02/2018 02:22 PM
topol.tpr (1.19 MB), Magnus Lundborg, 02/02/2018 02:22 PM
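
To put numbers on "a lot higher", the fluctuations in each part of the run can be compared directly. The following is a minimal sketch, not part of the original report. It assumes the .xvg files use the usual layout (lines starting with `#` or `@` are comments and plot directives, data rows are whitespace-separated with time in ps in the first column) and that the time axis runs continuously from 0 to 500 ps across the restarts. The function names and the `SEGMENTS` windows are illustrative.

```python
import math


def read_xvg(text):
    """Parse XVG text into rows of floats, skipping '#' comments and '@' directives."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(x) for x in line.split()])
    return rows


def segment_stats(rows, boundaries, column=1):
    """Return (mean, standard deviation) of `column` for each [start, end) time window."""
    stats = []
    for start, end in boundaries:
        values = [r[column] for r in rows if start <= r[0] < end]
        if not values:
            # No samples fell inside this window.
            stats.append((float("nan"), float("nan")))
            continue
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        stats.append((mean, math.sqrt(var)))
    return stats


# Assumed windows in ps: GPU server, Beskow, GPU server again.
SEGMENTS = [(0.0, 100.0), (100.0, 300.0), (300.0, 500.0)]
```

Reading `pullf.xvg` or `temperature.xvg` into a string, passing it to `read_xvg`, and then calling `segment_stats(rows, SEGMENTS)` gives one (mean, standard deviation) pair per window. A clearly larger standard deviation in the middle window would match the behaviour described above.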

History

#1 Updated by Magnus Lundborg 6 months ago

Could domain decomposition affect the center of mass positions and thereby upset pulling using constraints? I haven't seen this when pulling with an absolute reference, only when pulling relative to the rest of the system.
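
As a rough illustration of the question, the sketch below compares a center of mass computed in one pass with one accumulated as per-domain partial sums that are then reduced, which is the general pattern a multi-rank run has to follow. This is not the GROMACS implementation. The function names and the round-robin atom-to-domain assignment are hypothetical, and periodic boundary conditions are deliberately ignored.

```python
def center_of_mass(masses, positions):
    """Mass-weighted center of mass of 3D positions (no periodic boundary handling)."""
    total = sum(masses)
    return tuple(
        sum(m * p[d] for m, p in zip(masses, positions)) / total
        for d in range(3)
    )


def center_of_mass_decomposed(masses, positions, n_domains):
    """Same quantity, accumulated as per-domain partial sums and then reduced."""
    partial_mass = [0.0] * n_domains
    partial_mx = [[0.0, 0.0, 0.0] for _ in range(n_domains)]
    for i, (m, p) in enumerate(zip(masses, positions)):
        dom = i % n_domains  # hypothetical atom-to-domain assignment
        partial_mass[dom] += m
        for d in range(3):
            partial_mx[dom][d] += m * p[d]
    # "Global reduction": combine the per-domain sums.
    total = sum(partial_mass)
    return tuple(sum(pm[d] for pm in partial_mx) / total for d in range(3))
```

In this simplified setting the two functions agree up to floating-point rounding, so a decomposed sum alone would not be expected to produce a large difference. The sketch says nothing about how the reference group is treated under periodic boundary conditions or how often the center of mass is updated.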

#2 Updated by Berk Hess 5 months ago

Could it be that with MPI the global communication frequency, and thus also the COM removal frequency, is automatically increased?

#3 Updated by Magnus Lundborg 5 months ago

  • Subject changed from "Difference between MPI and thread-MPI version pulling using constraints relative to rest of the system" to "Difference between single rank and multiple rank when pulling using constraints relative to rest of the system"

The problem was identified to be related to single rank vs multiple rank. Subject updated.

#4 Updated by Mark Abraham 5 months ago

  • Assignee set to Berk Hess
  • Target version changed from 2018.1 to 2018.2

Berk is still looking into this.

#5 Updated by Mark Abraham about 1 month ago

Have we progressed here?

#6 Updated by Mark Abraham about 1 month ago

  • Target version changed from 2018.2 to 2018.3
