Bug #1270

affinity setting broken with MPI

Added by Szilárd Páll over 4 years ago. Updated about 4 years ago.

Status: Closed
Priority: High
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

With MPI builds (i.e. GMX_MPI=ON), affinity setting seems to be broken. With an otherwise identical build and launch configuration compiled with thread_mpi, affinity setting works, but with MPI I get a warning saying that affinity setting is not supported on the current platform.

This means that tMPI_Thread_setaffinity_support() returns TMPI_SETAFFINITY_SUPPORT_NO, which suggests that HAVE_PTHREAD_SETAFFINITY is not defined. I suspect a bug in the build system.
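To illustrate why a missing define produces this warning regardless of what the platform actually supports, here is a minimal C sketch of a support query that is decided purely at compile time. The enum and function names are hypothetical stand-ins, not the thread_mpi API; only the macro name HAVE_PTHREAD_SETAFFINITY comes from this report.

```c
/* Hypothetical mirror of the YES/NO answers discussed in this report. */
enum setaffinity_support
{
    SETAFFINITY_SUPPORT_NO  = 0,
    SETAFFINITY_SUPPORT_YES = 1
};

/* The answer is fixed when this file is compiled: if the build system never
 * defined HAVE_PTHREAD_SETAFFINITY, the query reports NO even on platforms
 * that do provide pthread_setaffinity_np(). */
enum setaffinity_support query_setaffinity_support(void)
{
#ifdef HAVE_PTHREAD_SETAFFINITY
    return SETAFFINITY_SUPPORT_YES;
#else
    return SETAFFINITY_SUPPORT_NO;
#endif
}
```

Compiling the same file with -DHAVE_PTHREAD_SETAFFINITY flips the answer to YES, which is why a fix of this kind belongs in the build system rather than in the query function itself.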

As a result, no thread affinities are set in any MPI run, which will cause a considerable performance regression, especially at high parallelization where OpenMP is used.


Related issues

Related to GROMACS - Bug #1334: concurrency-related bug with thread-MPI (Closed, 2013-09-13)
Has duplicate GROMACS - Bug #1266: Affinity setting not supported on Cray XK7 (Closed, 2013-05-28)

Associated revisions

Revision 78569369
Added by Sander Pronk over 4 years ago

Comprehensive hwinfo structure concurrency fix.

The hwinfo structure and structures contained therein are inherently
global to any mdrun processes/ranks. This patch makes sure that
- The hwinfo structure is shared among all threads
- Only one thread creates a hwinfo structure
- The hwinfo structure is safe to read for all threads after they
obtain it

In addition, it fixes the detection for pthread_setaffinity in thread_mpi.

This fixes concurrency issues with thread affinity settings with or
without MPI, and makes runner.c slightly easier to read because the
concurrency logic is pushed to gmx_detect_hardware.c

Fixes #1270, #1254

Note that #1254 issue 3 seems to be an OpenMPI bug.

Change-Id: I236e81923324d7873f3d8633889b91c7c02a7843
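The pattern the commit message describes (one structure shared by all threads, created by exactly one of them, and safe to read once obtained) can be sketched in C with a mutex-guarded lazy initializer. This is an illustrative sketch, not the code in gmx_detect_hardware.c: hwinfo_t, hwinfo_acquire(), hwinfo_release(), hwinfo_worker() and the reference count are all hypothetical, and the core count is a placeholder for real hardware detection.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the shared hardware-information structure. */
typedef struct
{
    int ncores; /* placeholder for detected data; never written after creation */
    int nref;   /* outstanding references; protected by hwinfo_lock */
} hwinfo_t;

static hwinfo_t       *hwinfo_shared = NULL;
static pthread_mutex_t hwinfo_lock   = PTHREAD_MUTEX_INITIALIZER;

/* Return the process-wide hwinfo. Only the first caller allocates and fills
 * it; every caller gets the same pointer. Creation happens while the mutex is
 * held, so the filled-in fields are visible to every later acquirer. */
hwinfo_t *hwinfo_acquire(void)
{
    hwinfo_t *hw;

    pthread_mutex_lock(&hwinfo_lock);
    if (hwinfo_shared == NULL)
    {
        hwinfo_shared = calloc(1, sizeof(*hwinfo_shared));
        if (hwinfo_shared != NULL)
        {
            hwinfo_shared->ncores = 4; /* placeholder: real code would detect hardware */
        }
    }
    if (hwinfo_shared != NULL)
    {
        hwinfo_shared->nref++;
    }
    hw = hwinfo_shared;
    pthread_mutex_unlock(&hwinfo_lock);
    return hw;
}

/* Drop one reference; the last thread to release frees the structure. */
void hwinfo_release(hwinfo_t *hw)
{
    if (hw == NULL)
    {
        return;
    }
    pthread_mutex_lock(&hwinfo_lock);
    assert(hw == hwinfo_shared && hw->nref > 0);
    if (--hw->nref == 0)
    {
        free(hw);
        hwinfo_shared = NULL;
    }
    pthread_mutex_unlock(&hwinfo_lock);
}

/* Demonstration thread entry point: stores the acquired pointer in *arg. */
void *hwinfo_worker(void *arg)
{
    *(hwinfo_t **)arg = hwinfo_acquire();
    return NULL;
}
```

pthread_once() would be a simpler way to guarantee single creation; the mutex is used here so that the same lock also protects the reference count that decides when the structure can be freed.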

History

#1 Updated by Szilárd Páll over 4 years ago

The bug was introduced by 972ab1f9, which moved the code that performs the HAVE_PTHREAD_SETAFFINITY check into the TMPI_ENABLE macro; that macro is called only with thread_mpi. The solution is to move this check out of TMPI_ENABLE, similarly to the way TMPI_TEST_ATOMICS is done.
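Wherever the check lives, a configure-time test for HAVE_PTHREAD_SETAFFINITY typically amounts to compiling and linking a small C program against pthreads. The following is a minimal sketch of such a probe, assuming a glibc/Linux target where pthread_setaffinity_np() and cpu_set_t are GNU extensions; probe_setaffinity() is a hypothetical name, and the actual CMake check used by GROMACS/thread_mpi may differ.

```c
#define _GNU_SOURCE /* pthread_*affinity_np() and cpu_set_t are GNU extensions */
#include <pthread.h>
#include <sched.h>

/* Re-apply the calling thread's current CPU mask, so the probe never changes
 * where the thread may run. If this file compiles and links, the platform
 * provides pthread_setaffinity_np(). Returns 0 on success, otherwise the
 * error number reported by the pthread call. */
int probe_setaffinity(void)
{
    cpu_set_t mask;
    int       rc = pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask);

    if (rc != 0)
    {
        return rc;
    }
    return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}
```

A build-system check only needs the compile-and-link step to succeed; the point of the fix is that this test must run for MPI builds too, not only when thread_mpi is enabled.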

#2 Updated by Mark Abraham over 4 years ago

Ugh. Thanks for the diagnosis!

#3 Updated by Sander Pronk over 4 years ago

  • Assignee changed from Berk Hess to Sander Pronk

#4 Updated by Rossen Apostolov over 4 years ago

  • Status changed from New to Fix uploaded

#5 Updated by Mark Abraham over 4 years ago

  • Status changed from Fix uploaded to Resolved

#6 Updated by Sander Pronk about 4 years ago

  • % Done changed from 0 to 100

#7 Updated by Szilárd Páll about 4 years ago

  • Status changed from Resolved to Closed

Sander Pronk wrote:

Applied in changeset 78569369348e07a300a03f90e667e61879858025.

Closing this issue, but I wanted to note that the change seems to be causing yet another problem; see #1334.
