Bug #3232

MdrunMpiCoordinationTestsTwoRanks time out without OPENMP support in Gitlab CI

Added by Paul Bauer 8 months ago. Updated 8 months ago.

Status: Closed
Priority: High
Assignee:
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

When running CI integration testing on Gitlab, MdrunMpiCoordinationTestsTwoRanks times out in PropagatorsWithConstraints/PeriodicActionsTest.PeriodicActionsAgreeWithReference/1.
The tests are generally quite slow there, so that might be related.

Associated revisions

Revision 0ba9a0a8 (diff)
Added by Paul Bauer 8 months ago

Test parallel mpi only with openmp

Can time out in other conditions
Found while working on Gitlab CI

Refs #3232

Change-Id: I7cf62ed913d2bc8c88317a674b0594dbe54ba394

Revision 5a5e732b (diff)
Added by Paul Bauer 8 months ago

Test parallel mpi only with openmp

Can time out in other conditions
Found while working on Gitlab CI

Refs #3232

Change-Id: I7cf62ed913d2bc8c88317a674b0594dbe54ba394

History

#1 Updated by Paul Bauer 8 months ago

  • Status changed from New to Rejected

Caused by OpenMP not being available.

#2 Updated by Mark Abraham 8 months ago

The apt versions of clang have had OpenMP support built in for several versions (back to 5, I think), so that suggests the problem is not what you think.

#3 Updated by Paul Bauer 8 months ago

  • Subject changed from MdrunMpiCoordinationTestsTwoRanks time out with clang 6 and 7 in Gitlab to MdrunMpiCoordinationTestsTwoRanks time out without OPENMP support in Gitlab CI
  • Status changed from Rejected to Fix uploaded

It doesn't get installed by default (at least not with the Ubuntu we are using for the CI).
Here is the build from current master with clang-6 (https://gitlab.com/gromacs/gromacs/-/jobs/375576962), showing that no OpenMP is found, which results in the timeout here (https://gitlab.com/gromacs/gromacs/-/jobs/375576981). Note that this is a release build, so it runs the slow tests for MPI coordination.

In a follow-up build on the CI testing branch, I added OpenMP to the apt-get install list (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374423780). It still was not picked up for clang-6, but it worked for clang-7 (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374423779).

This meant that the test with clang-6 still timed out (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374423787), but passed for clang-7 (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374423785).

Turning off OPENMP explicitly on this build caused the same behaviour (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374458047).

After correcting the conditional for the tests, the timeout is avoided (https://gitlab.com/gromacs/gromacs-testing/-/jobs/374581970).

#4 Updated by Mark Abraham 8 months ago

Thanks for the details. libomp-dev and similarly named packages are needed for omp.h to be found, and thus for CMake to report that OpenMP support is found.
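
To see quickly whether a given compiler in the CI image can find omp.h, a small probe along these lines might help. This is only a sketch: the function name, the probe source, and the compiler names `clang-6.0` and `clang-7` are assumptions, not part of the GROMACS build. It only try-compiles one file with `-fopenmp`, which is a much cruder check than what CMake does.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Tiny C source that needs omp.h (hypothetical probe, not taken from GROMACS).
PROBE_SOURCE = """\
#include <omp.h>
int main(void) { return omp_get_max_threads() > 0 ? 0 : 1; }
"""


def compiler_finds_openmp(compiler: str = "clang") -> bool:
    """Return True if `compiler -fopenmp -c` can compile a file including omp.h."""
    if shutil.which(compiler) is None:
        return False  # the compiler itself is not on PATH
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "probe.c"
        obj = Path(tmp) / "probe.o"
        src.write_text(PROBE_SOURCE)
        # Compile only (-c); a missing omp.h makes this step fail.
        result = subprocess.run(
            [compiler, "-fopenmp", "-c", str(src), "-o", str(obj)],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0


if __name__ == "__main__":
    # Compiler binary names are assumptions about the CI image.
    for cc in ("clang-6.0", "clang-7"):
        print(cc, compiler_finds_openmp(cc))
```

A `False` result for a compiler that is installed would point at a missing OpenMP development package for that compiler version.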

However, the timeouts suggest another problem. 16 steps on an Argon box with 12 atoms, or a box with two SPC water molecules, shouldn't take 4-5 seconds. The mdrun logs indicate that something external is affecting thread affinity, evidently not MPI or OpenMP. Is k8s distributing containers across the cores like we want it to? What does htop suggest when the machine has a decent load?
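
One way to check from inside a container whether an external affinity mask is in place is to compare the CPUs the process is allowed to run on with the CPUs the machine reports. The following is a minimal sketch with a hypothetical helper name; `os.sched_getaffinity` is only available on Linux, and CPU quotas imposed through cgroups would not show up in this mask.

```python
import os


def affinity_report() -> dict:
    """Compare CPUs this process may run on with the CPUs the machine reports."""
    total = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # Linux only: set of CPU ids the current process (pid 0 = self) may use.
        allowed = sorted(os.sched_getaffinity(0))
    else:
        # Fallback for platforms without affinity support: assume no restriction.
        allowed = list(range(total))
    return {
        "total_cpus": total,
        "allowed_cpus": allowed,
        "restricted": len(allowed) < total,
    }


if __name__ == "__main__":
    report = affinity_report()
    print(f"machine reports {report['total_cpus']} CPUs")
    print(f"process may run on {report['allowed_cpus']}")
    print(f"externally restricted: {report['restricted']}")
```

If `restricted` is true inside the CI container, several ranks or threads may be competing for the same few cores, which could explain slow runs on such small systems.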

#5 Updated by Paul Bauer 8 months ago

I can check this later today and will try to reproduce it in a local container.

#6 Updated by Paul Bauer 8 months ago

  • Status changed from Fix uploaded to Resolved

#7 Updated by Paul Bauer 8 months ago

  • Status changed from Resolved to Closed
