Race condition in GPU detection with MPI
With the MPI build of mdrun, all ranks in a node run the GPU detection. With NVIDIA GPUs set to process- or thread-exclusive mode, this causes a race condition: when detection happens concurrently, one of the participating ranks gets an error that the device(s) is/are not available or busy.
GPU detection is done once per physical node
Only one MPI rank in each physical node now runs the GPU detection.
The resulting information is broadcast to the other ranks.
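
The selection of one detecting rank per node can be illustrated with a minimal sketch. This is not the mdrun implementation: the helper names detectingRankPerRank and runsDetection are hypothetical, the node names are passed in as plain strings rather than queried through MPI, and picking the lowest rank on each node is an assumption made for the example. In an actual MPI build the per-node grouping could instead come from MPI_Comm_split_type with MPI_COMM_TYPE_SHARED, with MPI_Bcast on the resulting communicator to distribute the detection result.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not GROMACS code): for each rank, return the rank
// that performs GPU detection on its physical node. Assumption: this is
// the lowest rank that shares the same node name.
std::vector<int> detectingRankPerRank(const std::vector<std::string>& nodeOfRank)
{
    std::map<std::string, int> firstRankOnNode;
    std::vector<int>           detector(nodeOfRank.size());
    for (std::size_t rank = 0; rank < nodeOfRank.size(); ++rank)
    {
        // emplace only inserts when the node has not been seen before, so
        // the stored value is always the lowest rank on that node.
        auto it = firstRankOnNode.emplace(nodeOfRank[rank], static_cast<int>(rank)).first;
        detector[rank] = it->second;
    }
    return detector;
}

// Hypothetical helper: true if this rank should run the detection itself.
// All other ranks skip detection and wait for the broadcast result.
bool runsDetection(const std::vector<int>& detector, int rank)
{
    assert(rank >= 0 && static_cast<std::size_t>(rank) < detector.size());
    return detector[rank] == rank;
}
```

With five ranks placed on nodes n0, n0, n1, n1, n0, ranks 0 and 2 run the detection and ranks 1, 3 and 4 receive the result. Because only one rank per node touches the devices, no two ranks on the same node can query an exclusive-mode GPU concurrently.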
Note that we should also implement this for the CPU detection.