Bug #3411

Nightly master release build failure

Added by Paul Bauer 3 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: mdrun
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

The nightly release builds for master are currently failing in the MpiCoordinationTests when run with two ranks, with the following error:

[2020-03-06T04:55:29.440Z] Program:     mdrun-mpi-coordination-test, version 2021
[2020-03-06T04:55:29.440Z] Source file: src/gromacs/domdec/domdec_topology.cpp (line 430)
[2020-03-06T04:55:29.440Z] MPI rank:    0 (out of 2)
[2020-03-06T04:55:29.440Z] 
[2020-03-06T04:55:29.440Z] Fatal error:
[2020-03-06T04:55:29.440Z] 5 of the 5 bonded interactions could not be calculated because some atoms
[2020-03-06T04:55:29.440Z] involved moved further apart than the multi-body cut-off distance (-1 nm) or
[2020-03-06T04:55:29.440Z] the two-body cut-off distance (0.925347 nm), see option -rdd, for pairs and
[2020-03-06T04:55:29.440Z] tabulated bonds also see option -ddcheck
[2020-03-06T04:55:29.440Z] 
[2020-03-06T04:55:29.440Z] For more information and tips for troubleshooting, please check the GROMACS
[2020-03-06T04:55:29.440Z] website at http://www.gromacs.org/Documentation/Errors

History

#1 Updated by Paul Bauer 3 months ago

The build config is this:

/opt/cmake/3.15.1/bin/cmake /home/jenkins/workspace/Release_workflow_master/gromacs-2021-dev -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=g++-5 '-DCMAKE_CXX_LINK_FLAGS=-Wl,-rpath,/opt/gcc/5.4/bin/../lib64 -L/opt/gcc/5.4/bin/../lib64' -DCMAKE_C_COMPILER=gcc-5 -DCMAKE_INSTALL_PREFIX=/home/jenkins/workspace/Release_workflow_master/test-install -DCUDA_TOOLKIT_ROOT_DIR=/opt/cuda_10.0 -DGMXAPI=ON -DGMX_COMPILER_WARNINGS=ON -DGMX_DEFAULT_SUFFIX=OFF -DGMX_GPU=ON -DGMX_HWLOC=AUTO -DGMX_SIMD=None -DGMX_USE_RDTSCP=DETECT -DREGRESSIONTEST_PATH=/home/jenkins/workspace/Release_workflow_master/regressiontests-2021-dev

#2 Updated by Mark Abraham 3 months ago

Sigh, here am I thinking "MpiCoordinationTests what the heck is that? some clown named that... oh it was me"

bisecting

#3 Updated by Paul Bauer 3 months ago

Mark Abraham wrote:

Sigh, here am I thinking "MpiCoordinationTests what the heck is that? some clown named that... oh it was me"

bisecting

This has to be something recent that is only present in master, as the issue is not present in release-2020.
It's not your duty to fix this, but it needs to be addressed at some point :) And usually the clown who can't name things to save his life is me :)

#4 Updated by Mark Abraham 3 months ago

I bisected against master, testing with

(cd build-cmake-clang-release; ninja mdrun-mpi-coordination-test && bin/mdrun-mpi-coordination-test --gtest_filter=PropagatorsWithConstraints/PeriodicActionsTest.PeriodicActionsAgreeWithReference/\* -ntmpi 2)

and identified 9894ff5595ebff3810ab1e1d72c31c09e3d25c4a as the first failing commit. It tends to fail only on case PropagatorsWithConstraints/PeriodicActionsTest.PeriodicActionsAgreeWithReference/6, but I don't know why. Any ideas, Pascal?
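
For anyone wanting to repeat the bisection, a build-and-test step like the one above can in principle be automated with `git bisect run`. The following is only a sketch, not the procedure actually used here: the helper name `bisect_step` and the script name `bisect-step.sh` are hypothetical, and the good/bad endpoints are left for the reader to choose. It relies on the documented `git bisect run` exit-code convention (0 means good, 125 means skip this commit, other codes from 1 to 127 mean bad).

```shell
# Hypothetical helper for use with `git bisect run`.
# $1: command that builds the test binary
# $2: command that runs the failing test
bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # A commit that does not build cannot be judged, so ask git to skip it.
    sh -c "$build_cmd" || return 125
    # Map every test failure (including crashes with codes above 127,
    # which would abort the bisection) to a plain "bad" result.
    sh -c "$test_cmd" || return 1
    return 0
}

# Possible usage, assuming this file is saved as bisect-step.sh and ends with
# a call to bisect_step followed by `exit $?`:
#   git bisect start <bad-commit> <good-commit>
#   git bisect run sh ./bisect-step.sh
#   git bisect reset
```

The build command from the comment above (`cd build-cmake-clang-release && ninja mdrun-mpi-coordination-test`) would be passed as the first argument and the `bin/mdrun-mpi-coordination-test ... -ntmpi 2` invocation as the second. Since the failure is reported to occur only some of the time, a single passing run does not prove a commit good, so the test command may need to be repeated several times per commit.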
