MdrunMpiCoordinationTestsTwoRanks time out without OPENMP support in Gitlab CI
When running the integration tests in the GitLab CI, the test named above times out in PropagatorsWithConstraints/PeriodicActionsTest.PeriodicActionsAgreeWithReference/1.
The tests are in general quite slow there, so that might be related.
Test parallel MPI only with OpenMP
Can time out in other conditions
Found while working on Gitlab CI
#3 Updated by Paul Bauer 8 months ago
- Subject changed from MdrunMpiCoordinationTestsTwoRanks time out with clang 6 and 7 in Gitlab to MdrunMpiCoordinationTestsTwoRanks time out without OPENMP support in Gitlab CI
- Status changed from Rejected to Fix uploaded
The OpenMP runtime doesn't get installed by default (at least with the Ubuntu image we are using for the CI).
Here is the build from current master with clang-6 (https://gitlab.com/gromacs/gromacs/-/jobs/375576962), showing that no OPENMP is found, resulting in the timeout here (https://gitlab.com/gromacs/gromacs/-/jobs/375576981). Note that this is a release build, so it runs the slow tests for MPI coordination.
In this follow-up build on the CI testing branch I added the OpenMP package to the apt-get install list (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374423780), but it still wasn't picked up for clang-6, though it worked for clang-7, see here (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374423779).
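A quick way to see whether a given compiler actually picks up omp.h is to feed it a file that only includes that header. This is a hypothetical probe, not part of the CI scripts; the `CC` compiler name is an assumption and would be swapped for clang-6 or clang-7 as appropriate:

```shell
# Hypothetical probe, not part of the GROMACS CI scripts: check whether
# the compiler on PATH can preprocess a file that includes omp.h with
# OpenMP requested. Set CC=clang-6.0 or CC=clang-7 to test a specific one.
CC=${CC:-cc}
if echo '#include <omp.h>' | "$CC" -fopenmp -E -x c - >/dev/null 2>&1; then
    result="omp.h found"
else
    result="omp.h missing"
fi
echo "$result"
```

Running this inside the CI image for each compiler would distinguish "the package is missing" from "the package is there but this compiler doesn't see it".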
This meant that the test with clang-6 still timed out (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374423787), but passed for clang-7 (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374423785).
Turning off OPENMP explicitly on this build caused the same behaviour (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374458047).
After correcting the conditional for the tests, the timeout is avoided (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374581970).
#4 Updated by Mark Abraham 8 months ago
Thanks for the details. The libomp-dev package (and similarly named ones) is needed for omp.h to be found, and thus for CMake to report that OpenMP support is available.
However, the timeouts suggest another problem. 16 steps on an argon box with 12 atoms, or a box with two SPC waters, shouldn't take 4-5 seconds. The mdrun logs indicate that something external is affecting thread affinity, and evidently it is not MPI or OpenMP. Is k8s distributing containers across the cores like we want it to? What does htop suggest when the machine has a decent load?
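If the container runtime is pinning processes to a subset of cores, that shows up in the kernel's affinity mask. A hypothetical diagnostic on Linux, using only procfs and coreutils (nothing GROMACS-specific):

```shell
# Hypothetical diagnostic, not part of the CI: report which CPUs this
# process is allowed to run on. If k8s confines every container to the
# same core(s), this list will be narrower than the machine's full set.
allowed=$(grep Cpus_allowed_list /proc/self/status)
echo "$allowed"

# Number of CPUs visible to this process (respects the affinity mask).
cpus=$(nproc)
echo "visible CPUs: $cpus"
```

Running this inside one of the CI jobs while the runner is loaded would show whether the containers are being squeezed onto the same cores.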
However the timeouts suggest another problem. 16 steps on an Argon box with 12 atoms, or a box with two spc water, shouldn't take 4-5 seconds. The mdrun logs indicate that something external is affecting thread affinity, evidently not MPI or OpenMP. Is k8s distributing containers across the cores like we want it to? What does htop suggest when the machine has a decent load?