Feature #1464

implement PP-PME re-balancing

Added by Szilárd Páll over 5 years ago. Updated about 4 years ago.

Status: New
Priority: Normal
Category: core library
Target version:
Difficulty: uncategorized

Description

As the PP-PME load balancing is not fully dynamic, imbalance can develop during a simulation and result in performance loss. This can have a number of causes, e.g. GPU or CPU throttling, or increased load imbalance caused by the protein drifting in the box.

Most annoyingly, NVIDIA Kepler GeForce GPUs, even under normal MD load, start to throttle after warming up. This will often result in PP-PME imbalance and increased CPU wait time even if the run was balanced in the beginning, after the initial cut-off tuning.

To avoid unnecessary slowdowns, we should re-run the PP-PME load balancing if imbalance is detected. In the case of GPU acceleration this should be done by keeping track of the average "Wait for GPU" time and triggering re-balancing if this average increases by a pre-determined amount.
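
A minimal sketch of the proposed trigger, in Python for brevity rather than in the actual mdrun code. All names here (`WaitTimeMonitor`, `add_sample`, `window`, `rel_increase`) are hypothetical. The windowed average and the relative threshold are assumptions, since the description only asks for "a pre-determined amount". The baseline is taken to be the first full window of samples collected after the initial cut-off tuning.

```python
from collections import deque


class WaitTimeMonitor:
    """Tracks a windowed average of per-step "Wait for GPU" time.

    Hypothetical helper for illustration; not part of any existing code base.
    """

    def __init__(self, window=100, rel_increase=0.2):
        self.window = window              # steps per averaging window (assumed value)
        self.rel_increase = rel_increase  # assumed trigger: 20% above the baseline
        self.samples = deque(maxlen=window)
        self.baseline = None              # average wait right after the initial tuning

    def add_sample(self, wait_time):
        """Record one step's wait time; return True if re-balancing should run."""
        self.samples.append(wait_time)
        if len(self.samples) < self.window:
            return False                  # not enough data for a meaningful average
        average = sum(self.samples) / len(self.samples)
        if self.baseline is None:
            self.baseline = average       # first full window defines the baseline
            return False
        return average > self.baseline * (1.0 + self.rel_increase)

    def reset(self):
        """Forget all state, e.g. after a re-balancing pass has completed."""
        self.samples.clear()
        self.baseline = None
```

The caller would feed in the measured wait time once per step and, when `add_sample` returns True, re-run the load balancing and call `reset()` so that a new baseline is collected for the re-tuned setup. A real implementation would probably also need a minimum interval between re-balancing passes and an absolute floor on the baseline, so that noise around a near-zero wait time does not trigger repeated tuning.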

History

#1 Updated by Mark Abraham about 4 years ago

  • Target version changed from 5.x to future
