Bug #2345

Avoid duplicate over-subscription note+warning

Added by Mark Abraham almost 2 years ago. Updated almost 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized

Description

The thread-affinity and OpenMP management code both warn about oversubscription:

gmx mdrun -s reference_s -deffnm test -nt 5

...
GROMACS:      gmx mdrun, version 2018-beta2-dev-20171211-2e91fcf47
Executable:   /home/mabraham/git/r2018/build-cmake-clang-debug/install/bin/gmx
Data prefix:  /home/mabraham/git/r2018/build-cmake-clang-debug/install
Working dir:  /home/mabraham/git/regressiontests/complex/nbnxn_rf
Command line:
  gmx mdrun -s reference_s -deffnm test -nt 5

Reading file reference_s.tpr, VERSION 2016-beta1-dev-20160524-3a9e67d (single precision)
Note: file tpx version 110, software tpx version 112
Changing nstlist from 10 to 25, rlist from 0.944 to 1.068

Using 5 MPI threads

WARNING: Oversubscribing the available 4 logical CPU cores with 5 thread-MPI threads.
         This will cause considerable performance loss!

NOTE: Oversubscribing the CPU, will not pin threads

NOTE: Thread affinity was not set.

The warning in the OpenMP code has several limitations: it only works when there are no separate PME ranks, it does not work on nodes with only PME ranks, and it assumes that each PP rank has the same number of threads (which might be wrong if the user has set up MPI creatively). It also has an existing TODO to move it elsewhere and support PME-only ranks, so the best approach is probably to remove it.
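
To illustrate why the equal-thread-count assumption is fragile, here is a minimal Python sketch. It is not GROMACS code: the function names and the rank layout are hypothetical, chosen only to contrast an estimate extrapolated from one rank's thread count with the actual sum over all ranks.

```python
def estimated_threads(num_ranks, threads_on_this_rank):
    """Estimate that assumes every rank uses the same thread count."""
    return num_ranks * threads_on_this_rank


def actual_threads(threads_per_rank):
    """Actual total: the sum of each rank's own thread count."""
    return sum(threads_per_rank)


# Hypothetical layout: two ranks on a node with 4 logical cores,
# one using 1 thread and one using 4 threads.
logical_cores = 4
threads_per_rank = [1, 4]

# Seen from the first rank, the estimate is 2 * 1 = 2 threads.
estimate = estimated_threads(len(threads_per_rank), threads_per_rank[0])
actual = actual_threads(threads_per_rank)

print(estimate > logical_cores)  # False: the estimate misses the problem
print(actual > logical_cores)    # True: the node is oversubscribed
```

In this layout the rank with one thread would not warn, even though five threads share four logical cores.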


Related issues

Related to GROMACS - Bug #2342: Avoid yelling about thread pinning twice (Closed)

Associated revisions

Revision 7607e626 (diff)
Added by Berk Hess almost 2 years ago

Make oversubscription warning consistent

The hardware thread oversubscription warning was only issued
with OpenMP and without separate PME ranks. Now it actually reduces
the thread count over the physical node.
Also moved the thread affinity up to the earliest possible point.

Refs #2345

Change-Id: Ifdf62c723fd87b0ddaab0df1e2f1bf36b461ea33
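
The idea of counting threads over the physical node, rather than per rank, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code in this revision: ranks are modelled as hypothetical (hostname, thread count) pairs, and the per-node sum stands in for whatever reduction over ranks sharing a node the real code performs.

```python
from collections import defaultdict


def threads_per_node(ranks):
    """Sum the thread counts of all ranks (PP or PME) on each node.

    `ranks` is a list of (hostname, num_threads) pairs; this layout is
    an assumption made for illustration only.
    """
    totals = defaultdict(int)
    for hostname, num_threads in ranks:
        totals[hostname] += num_threads
    return dict(totals)


def oversubscribed_nodes(ranks, cores_per_node):
    """Return {hostname: total_threads} for nodes with more threads than cores."""
    totals = threads_per_node(ranks)
    return {
        host: total
        for host, total in totals.items()
        if total > cores_per_node[host]
    }


# Hypothetical job: three ranks with unequal thread counts on node0,
# one rank on node1, and 4 logical cores on each node.
ranks = [("node0", 2), ("node0", 2), ("node0", 1), ("node1", 4)]
cores = {"node0": 4, "node1": 4}
print(oversubscribed_nodes(ranks, cores))  # {'node0': 5}
```

Because every rank contributes its own count, the check does not depend on ranks having equal thread counts or on whether a rank does PP or PME work.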

History

#1 Updated by Mark Abraham almost 2 years ago

  • Related to Bug #2342: Avoid yelling about thread pinning twice added

#2 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #2345.
Uploader: Mark Abraham
Change-Id: gromacs~release-2018~Ib9f91a5e4db73b399060a25ab7e927a785231fdc
Gerrit URL: https://gerrit.gromacs.org/7332

#3 Updated by Berk Hess almost 2 years ago

  • Status changed from New to Resolved
  • Assignee changed from Mark Abraham to Berk Hess

#4 Updated by Mark Abraham almost 2 years ago

  • Status changed from Resolved to Fix uploaded

The fix has just a +1 right now :-)

#5 Updated by Berk Hess almost 2 years ago

  • Assignee changed from Berk Hess to Mark Abraham

I misinterpreted this issue. The double affinity note should be fixed and is fixed with my fix. But the oversubscription warning has nothing to do with affinities (although it also triggers the affinity note). The oversubscription performance warning should certainly stay. We should also note that we do not set affinities.

#6 Updated by Mark Abraham almost 2 years ago

Berk Hess wrote:

I misinterpreted this issue. The double affinity note should be fixed and is fixed with my fix. But the oversubscription warning has nothing to do with affinities (although it also triggers the affinity note). The oversubscription performance warning should certainly stay. We should also note that we do not set affinities.

This issue is orthogonal to the affinity issues. I am happy to reword the remaining warning to mention the performance loss.

#7 Updated by Erik Lindahl almost 2 years ago

Let's have one warning/note about what is happening, not three...

#8 Updated by Erik Lindahl almost 2 years ago

Suggestion:

WARNING: Oversubscribing the available 4 logical CPU cores with 5 thread-MPI threads.
This is bad for performance, and prevents us from pinning threads.

In general, what is the reason we have to echo a "note" in every case where we are not pinning threads? Since we already warn about cases we consider severe, I see no reason why we must tell the user about an internal decision we made.
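
A single combined message along these lines could be produced by one check, as in the following Python sketch. The function name and parameters are hypothetical and this is not how mdrun is implemented; only the message wording is taken from the suggestion above.

```python
def oversubscription_warning(logical_cores, num_threads, thread_kind="thread-MPI"):
    """Return one combined warning, or None when there is no oversubscription.

    The name and signature are assumptions made for illustration.
    """
    if num_threads <= logical_cores:
        return None
    return (
        f"WARNING: Oversubscribing the available {logical_cores} logical CPU cores "
        f"with {num_threads} {thread_kind} threads.\n"
        "This is bad for performance, and prevents us from pinning threads."
    )


message = oversubscription_warning(4, 5)
if message is not None:
    print(message)
```

Returning None in the non-oversubscribed case means the caller prints nothing at all, so there is no separate note about the pinning decision.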

#9 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #2345.
Uploader: Berk Hess
Change-Id: gromacs~release-2018~Ifdf62c723fd87b0ddaab0df1e2f1bf36b461ea33
Gerrit URL: https://gerrit.gromacs.org/7356

#10 Updated by Mark Abraham almost 2 years ago

Erik Lindahl wrote:

Suggestion:

WARNING: Oversubscribing the available 4 logical CPU cores with 5 thread-MPI threads.
This is bad for performance, and prevents us from pinning threads.

In general, what is the reason we have to echo a "note" in every case where we are not pinning threads? Since we already warn about cases we consider severe, I see no reason why we must tell the user about an internal decision we made.

I agree such minor things don't belong on stderr.

#11 Updated by Mark Abraham almost 2 years ago

  • Status changed from Fix uploaded to Closed
