Bug #1101

fix & improve CPU oversubscription handling

Added by Szilárd Páll almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Oversubscription of the available CPU cores should be avoided in most (if not all) cases because it degrades performance, and thread pinning makes this even worse. The current oversubscription check in the gmx_omp_nthreads module gives incorrect results when separate PME nodes use a different number of OpenMP threads than the PP nodes.

The following improvements are required:
  • implementing a correct check outside of gmx_omp_nthreads, since the problem is not (only) OpenMP-related and can also happen with pure MPI/tMPI;
  • turning off thread pinning when oversubscription is detected.
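
As an illustration of the two points above, here is a minimal sketch of a per-node check that sums the thread counts of all ranks placed on a node, so that PP and PME ranks with different thread counts are counted correctly. It is not the GROMACS implementation: the function names, the (node, nthreads) input format and the example layout are all assumptions made for this sketch.

```python
from collections import defaultdict


def find_oversubscribed_nodes(ranks, hw_threads_per_node):
    """Return {node: (threads_requested, hw_threads)} for oversubscribed nodes.

    ranks: iterable of (node_name, nthreads) pairs, one per MPI/tMPI rank.
           PP and PME ranks may carry different thread counts.
    hw_threads_per_node: dict mapping node_name -> hardware threads available.
    (Both inputs are hypothetical; they stand in for whatever the real
    code gathers per node.)
    """
    requested = defaultdict(int)
    for node, nthreads in ranks:
        # Sum per rank instead of assuming one thread count for all ranks.
        requested[node] += nthreads

    return {
        node: (total, hw_threads_per_node[node])
        for node, total in requested.items()
        if total > hw_threads_per_node[node]
    }


def pinning_allowed(ranks, hw_threads_per_node):
    """Keep thread pinning only when no node is oversubscribed."""
    return not find_oversubscribed_nodes(ranks, hw_threads_per_node)


# Hypothetical layout: 3 PP ranks with 4 threads each plus 1 PME rank with
# 6 threads on one node with 16 hardware threads, i.e. 18 threads requested.
ranks = [("node0", 4), ("node0", 4), ("node0", 4), ("node0", 6)]
hw = {"node0": 16}
print(find_oversubscribed_nodes(ranks, hw))
print(pinning_allowed(ranks, hw))
```

A check that multiplied the number of ranks by the PP thread count would see 4 x 4 = 16 threads in this layout and miss the oversubscription. Because the sum is taken per rank, the same function also covers the pure MPI/tMPI case, where every rank contributes a single thread.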

Associated revisions

Revision a1bd375a (diff)
Added by Erik Lindahl almost 7 years ago

Added basic CPU topology information to cpuid code

We can now detect the locality of hardware threads, cores,
and packages for Intel and AMD CPUs under Linux and Windows.
In particular, this provides an array with locality order
for logical processors that can be used to optimize placement.
Refs #1086, #1101.

Change-Id: I3f7985b1b67729376918c5a135b9157a9086235e

History

#1 Updated by Mark Abraham almost 7 years ago

Can we deal with this in the next fortnight or so for 4.6, or push it back to 4.6.1?

#2 Updated by Szilárd Páll almost 7 years ago

Mark Abraham wrote:

Can we deal with this in the next fortnight or so for 4.6, or push it back to 4.6.1?

Depends on what "we" means.

I have not attempted to fix the current code because it requires yet another splitting of the default communicator (the same is already done in 2-3 places) and a per-node thread enumeration (at least I don't know of a simpler way). However, as we discussed earlier (and I now realize that an issue for this is missing), communicators for inter- and intra-node communication should instead be set up once and stored, which would remove the current redundancy and enable implementing other features. This task is pretty simple, so I might as well just do it myself if nobody can pitch in.
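
To make the intra-/inter-node grouping mentioned above concrete, here is a small sketch that mimics what a communicator split keyed on the host name would produce. It uses plain Python lists rather than real MPI communicators, and the function name and input format are assumptions for illustration only.

```python
def split_by_node(hostnames):
    """Group global ranks by host, as a split keyed on the host name would.

    hostnames[i] is the host name of global rank i (hypothetical input).
    Returns (intra, leaders):
      intra   maps host -> list of global ranks on that host; the position
              in the list plays the role of the intra-node rank.
      leaders holds the first rank of each host, i.e. the members of the
              inter-node group.
    """
    intra = {}
    for rank, host in enumerate(hostnames):
        intra.setdefault(host, []).append(rank)
    leaders = [members[0] for members in intra.values()]
    return intra, leaders


# Hypothetical run: 5 ranks spread over two hosts.
intra, leaders = split_by_node(["a", "a", "b", "b", "b"])
print(intra)
print(leaders)
```

With the grouping computed once and stored, a per-node thread enumeration reduces to a sum over the ranks in one intra-node group, and other code that needs node-local information can reuse the same groups.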

#3 Updated by Erik Lindahl almost 7 years ago

  • Status changed from New to Closed

Fixed by gerrit 2051, at least for what is realistic to expect in 4.6.
