Bug #1270
affinity setting broken with MPI
Description
With MPI builds (i.e. GMX_MPI=ON), affinity setting seems to be broken. Affinity setting works with the exact same build and launch configuration when compiling with thread_mpi, but with MPI I get the warning suggesting that affinity setting is not supported by the current platform.
This means that tMPI_Thread_setaffinity_support() returns TMPI_SETAFFINITY_SUPPORT_NO, which suggests that HAVE_PTHREAD_SETAFFINITY is not defined. I suspect a bug in the build system.
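To illustrate the suspected failure mode, here is a minimal sketch of a compile-time gate of this kind. It is not the actual thread_mpi source: the enum and function names below are hypothetical stand-ins, and the only assumption carried over from the report is that the result depends on whether the build system defines HAVE_PTHREAD_SETAFFINITY.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the thread_mpi names in the report;
 * the real declarations may differ. */
enum sketch_setaffinity_support
{
    SKETCH_SETAFFINITY_SUPPORT_NO  = 0,
    SKETCH_SETAFFINITY_SUPPORT_YES = 1
};

/* Assumption: HAVE_PTHREAD_SETAFFINITY is set by a configure-time check.
 * If that check never runs, the macro stays undefined and this function
 * reports "no support" even on a platform that could set affinities. */
enum sketch_setaffinity_support sketch_setaffinity_support(void)
{
#ifdef HAVE_PTHREAD_SETAFFINITY
    return SKETCH_SETAFFINITY_SUPPORT_YES;
#else
    return SKETCH_SETAFFINITY_SUPPORT_NO;
#endif
}

/* Message a caller might print based on the result above. */
const char *sketch_setaffinity_message(void)
{
    if (sketch_setaffinity_support() == SKETCH_SETAFFINITY_SUPPORT_YES)
    {
        return "affinity setting is available";
    }
    return "affinity setting is not supported on this platform";
}
```

Compiling this without -DHAVE_PTHREAD_SETAFFINITY reproduces the symptom: the "not supported" branch is taken regardless of what the platform can actually do.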
Because this bug means that no affinities are set in any MPI run, it will cause a considerable performance regression, especially at high parallelization where OpenMP is used.
Related issues
Associated revisions
History
#1 Updated by Szilárd Páll over 5 years ago
The bug was introduced by 972ab1f9, which moved the code that performs the HAVE_PTHREAD_SETAFFINITY check into the TMPI_ENABLE macro, which is called only with thread_mpi. The solution is to move this check out, similarly to the way TMPI_TEST_ATOMICS is done.
#2 Updated by Mark Abraham over 5 years ago
Ugh. Thanks for the diagnosis!
#3 Updated by Sander Pronk over 5 years ago
- Assignee changed from Berk Hess to Sander Pronk
#4 Updated by Rossen Apostolov over 5 years ago
- Status changed from New to Fix uploaded
#5 Updated by Mark Abraham over 5 years ago
- Status changed from Fix uploaded to Resolved
#6 Updated by Sander Pronk over 5 years ago
- % Done changed from 0 to 100
Applied in changeset 78569369348e07a300a03f90e667e61879858025.
#7 Updated by Szilárd Páll over 5 years ago
- Status changed from Resolved to Closed
Sander Pronk wrote:
Applied in changeset 78569369348e07a300a03f90e667e61879858025.
Closing this issue, but I wanted to note that the change seems to be causing yet another problem, see #1334.
Comprehensive hwinfo structure concurrency fix.
The hwinfo structure and structures contained therein are inherently
global to any mdrun processes/ranks. This patch makes sure that
- The hwinfo structure is shared among all threads
- Only one thread creates a hwinfo structure
- The hwinfo structure is safe to read for all threads after they
obtain it
In addition, it fixes the detection for pthread_setaffinity in thread_mpi.
This fixes concurrency issues with thread affinity settings with or
without MPI, and makes runner.c slightly easier to read because the
concurrency logic is pushed to gmx_detect_hardware.c
Fixes #1270, #1254
Note that #1254 issue 3 seems to be an OpenMPI bug.
Change-Id: I236e81923324d7873f3d8633889b91c7c02a7843
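
The sharing scheme described in the commit message above (one creator, one shared structure, safe reads afterwards) can be sketched as follows. This is only an illustration under stated assumptions, not the code from the changeset: the struct, its single field, and all function names are hypothetical, and the "detection" is a placeholder assignment.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hwinfo structure. */
typedef struct
{
    int ncores; /* placeholder for real hardware detection results */
} sketch_hwinfo_t;

static pthread_mutex_t  sketch_lock         = PTHREAD_MUTEX_INITIALIZER;
static sketch_hwinfo_t *sketch_shared       = NULL;
static int              sketch_users        = 0;
static int              sketch_detect_calls = 0;

/* Returns the shared structure. Only the first caller creates it;
 * later callers get the same pointer. The structure is fully filled
 * in before the lock is released, so reading it afterwards is safe. */
sketch_hwinfo_t *sketch_hwinfo_acquire(void)
{
    sketch_hwinfo_t *result;

    pthread_mutex_lock(&sketch_lock);
    if (sketch_shared == NULL)
    {
        sketch_shared = malloc(sizeof(*sketch_shared));
        if (sketch_shared != NULL)
        {
            sketch_shared->ncores = 1; /* placeholder, not real detection */
            sketch_detect_calls++;
        }
    }
    if (sketch_shared != NULL)
    {
        sketch_users++;
    }
    result = sketch_shared;
    pthread_mutex_unlock(&sketch_lock);
    return result;
}

/* Drops one reference; the last user frees the shared structure. */
void sketch_hwinfo_release(void)
{
    pthread_mutex_lock(&sketch_lock);
    if (sketch_users > 0 && --sketch_users == 0)
    {
        free(sketch_shared);
        sketch_shared = NULL;
    }
    pthread_mutex_unlock(&sketch_lock);
}

/* Number of times detection actually ran (for checking the scheme). */
int sketch_hwinfo_detect_calls(void)
{
    int n;

    pthread_mutex_lock(&sketch_lock);
    n = sketch_detect_calls;
    pthread_mutex_unlock(&sketch_lock);
    return n;
}
```

Keeping the lock and the reference count next to the detection code mirrors the stated goal of moving the concurrency logic out of the caller, so each thread only needs an acquire/release pair.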