Bug #1358

race condition in GPU detection with MPI

Added by Szilárd Páll almost 4 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

In the MPI build of mdrun, every rank on a node runs the GPU detection. When NVIDIA GPUs are set to process- or thread-exclusive mode, this causes a race condition: ranks that run the detection concurrently compete for the same device(s), and one of the participating ranks gets an error saying that the device(s) is/are unavailable or busy.
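
To make the failure mode concrete, here is a minimal, deterministic sketch of it in plain C++. It does not call CUDA or MPI. `ExclusiveDevice`, `ranksThatFailOverlapped` and `ranksThatFailSerialized` are hypothetical names invented for this illustration, and the model assumes the worst case in which every probe overlaps with the first one.

```cpp
#include <vector>

// Hypothetical stand-in for a GPU in exclusive mode: at most one rank
// may hold the device at any time. This is a model, not a CUDA API.
class ExclusiveDevice
{
public:
    // Returns true if the caller obtained the device, false if it is busy.
    bool tryAcquire(int rank)
    {
        if (owner_ != -1)
        {
            return false;
        }
        owner_ = rank;
        return true;
    }

    // Frees the device, but only if the caller is the current owner.
    void release(int rank)
    {
        if (owner_ == rank)
        {
            owner_ = -1;
        }
    }

private:
    int owner_ = -1; // -1 means the device is free
};

// Overlapping probes: every rank tries to acquire the device before any
// rank has released it. Returns the ranks that saw "device busy".
std::vector<int> ranksThatFailOverlapped(int numRanks)
{
    ExclusiveDevice  device;
    std::vector<int> failed;
    for (int rank = 0; rank < numRanks; rank++)
    {
        if (!device.tryAcquire(rank))
        {
            failed.push_back(rank);
        }
    }
    return failed;
}

// Serialized probes: each rank releases the device before the next one
// probes it, so no rank sees "device busy".
std::vector<int> ranksThatFailSerialized(int numRanks)
{
    ExclusiveDevice  device;
    std::vector<int> failed;
    for (int rank = 0; rank < numRanks; rank++)
    {
        if (device.tryAcquire(rank))
        {
            device.release(rank);
        }
        else
        {
            failed.push_back(rank);
        }
    }
    return failed;
}
```

In a real run the overlap depends on timing, which is why the error shows up only on some ranks and not on every launch.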

Associated revisions

Revision 82341057 (diff)
Added by Berk Hess almost 4 years ago

GPU detection is done once per physical node

Only one MPI rank on each physical node now runs the GPU detection.
The resulting information is broadcast to the other ranks.
Note that we should also implement this for the CPU detection.
Fixes #1358

Change-Id: I16c6ccc40bd53d96b99d3f6a0abed69cc89136d8
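
The idea in the commit message above (one detecting rank per physical node, result shared with the others) can be sketched as follows. This is not the code from revision 82341057: it is a plain C++ model without MPI, and `detectorRankPerRank`, `countDetectingRanks` and `broadcastWithinNodes` are hypothetical helpers. It assumes that ranks on the same node report the same host name and that the lowest such rank does the detection. In an actual MPI program the grouping might be done with `MPI_Comm_split_type` using `MPI_COMM_TYPE_SHARED`, and the sharing with `MPI_Bcast`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// For each rank, returns the rank that should run the detection on its
// node: the lowest rank that reported the same host name.
std::vector<int> detectorRankPerRank(const std::vector<std::string>& hostOfRank)
{
    std::map<std::string, int> firstRankOnHost;
    std::vector<int>           detector(hostOfRank.size());
    for (std::size_t rank = 0; rank < hostOfRank.size(); rank++)
    {
        // emplace keeps an existing entry, so the first (lowest) rank wins.
        auto it = firstRankOnHost.emplace(hostOfRank[rank], static_cast<int>(rank)).first;
        detector[rank] = it->second;
    }
    return detector;
}

// Counts the ranks that run the detection themselves (one per node).
int countDetectingRanks(const std::vector<int>& detector)
{
    int count = 0;
    for (std::size_t rank = 0; rank < detector.size(); rank++)
    {
        if (detector[rank] == static_cast<int>(rank))
        {
            count++;
        }
    }
    return count;
}

// Models the per-node broadcast: every rank takes the value held by its
// node's detecting rank. Entries of localResult on other ranks are ignored.
std::vector<int> broadcastWithinNodes(const std::vector<int>& detector,
                                      const std::vector<int>& localResult)
{
    assert(detector.size() == localResult.size());
    std::vector<int> shared(detector.size());
    for (std::size_t rank = 0; rank < detector.size(); rank++)
    {
        shared[rank] = localResult[detector[rank]];
    }
    return shared;
}
```

With host names `node0, node0, node1, node1, node1`, ranks 0 and 2 would be the only ones to touch the GPUs, so no two ranks on the same node probe an exclusive-mode device at the same time.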

History

#1 Updated by Berk Hess almost 4 years ago

  • Status changed from New to Fix uploaded

#2 Updated by Berk Hess almost 4 years ago

  • Status changed from Fix uploaded to Resolved
  • % Done changed from 0 to 100

#3 Updated by Mark Abraham almost 4 years ago

  • Target version changed from 4.6.5 to 4.6.4

#4 Updated by Rossen Apostolov over 3 years ago

  • Status changed from Resolved to Closed
