Nightly master release build failure
The nightly release builds for master are currently failing in MpiCoordinationTests with two ranks, with the following error:
[2020-03-06T04:55:29.440Z] Program: mdrun-mpi-coordination-test, version 2021
[2020-03-06T04:55:29.440Z] Source file: src/gromacs/domdec/domdec_topology.cpp (line 430)
[2020-03-06T04:55:29.440Z] MPI rank: 0 (out of 2)
[2020-03-06T04:55:29.440Z]
[2020-03-06T04:55:29.440Z] Fatal error:
[2020-03-06T04:55:29.440Z] 5 of the 5 bonded interactions could not be calculated because some atoms
[2020-03-06T04:55:29.440Z] involved moved further apart than the multi-body cut-off distance (-1 nm) or
[2020-03-06T04:55:29.440Z] the two-body cut-off distance (0.925347 nm), see option -rdd, for pairs and
[2020-03-06T04:55:29.440Z] tabulated bonds also see option -ddcheck
[2020-03-06T04:55:29.440Z]
[2020-03-06T04:55:29.440Z] For more information and tips for troubleshooting, please check the GROMACS
[2020-03-06T04:55:29.440Z] website at http://www.gromacs.org/Documentation/Errors
#1 Updated by Paul Bauer 8 months ago
The build configuration is:
/opt/cmake/3.15.1/bin/cmake /home/jenkins/workspace/Release_workflow_master/gromacs-2021-dev -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_COMPILER=g++-5 '-DCMAKE_CXX_LINK_FLAGS=-Wl,-rpath,/opt/gcc/5.4/bin/../lib64 -L/opt/gcc/5.4/bin/../lib64' -DCMAKE_C_COMPILER=gcc-5 -DCMAKE_INSTALL_PREFIX=/home/jenkins/workspace/Release_workflow_master/test-install -DCUDA_TOOLKIT_ROOT_DIR=/opt/cuda_10.0 -DGMXAPI=ON -DGMX_COMPILER_WARNINGS=ON -DGMX_DEFAULT_SUFFIX=OFF -DGMX_GPU=ON -DGMX_HWLOC=AUTO -DGMX_SIMD=None -DGMX_USE_RDTSCP=DETECT -DREGRESSIONTEST_PATH=/home/jenkins/workspace/Release_workflow_master/regressiontests-2021-dev
#3 Updated by Paul Bauer 8 months ago
Mark Abraham wrote:
Sigh, here am I thinking "MpiCoordinationTests what the heck is that? some clown named that... oh it was me"
This has to be something recent that is only present in master, as the issue does not occur in release-2020.
It's not your duty to fix this, but it needs to be addressed at some point :) And usually the clown who can't name things to save his life is me :)
#4 Updated by Mark Abraham 8 months ago
I bisected master, testing each commit with
(cd build-cmake-clang-release; ninja mdrun-mpi-coordination-test && bin/mdrun-mpi-coordination-test --gtest_filter=PropagatorsWithConstraints/PeriodicActionsTest.PeriodicActionsAgreeWithReference/\* -ntmpi 2)
and identified 9894ff5595ebff3810ab1e1d72c31c09e3d25c4a as the first failing commit. It tends to fail only on case PropagatorsWithConstraints/PeriodicActionsTest.PeriodicActionsAgreeWithReference/6, but I don't know why. Any ideas, Pascal?
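A bisection like this can also be automated with git bisect run. The following is a minimal sketch, not the procedure actually used above. It assumes the build directory (build-cmake-clang-release in the command above) is already configured for Ninja, and the function name bisect_step and file name bisect-step.sh are hypothetical. It relies on the git bisect run convention that exit status 0 marks a commit good, 125 skips it, and other codes from 1 to 127 mark it bad.

```shell
# Hypothetical helper for "git bisect run", wrapping the test command above.
# $1 is an already-configured CMake/Ninja build directory (assumption).
# The body runs in a subshell "( ... )" so the cd does not leak to the caller,
# and "exit" just ends that subshell with the given status.
bisect_step() (
    cd "$1" || exit 125
    # A commit that fails to build cannot be judged: 125 tells git to skip it.
    ninja mdrun-mpi-coordination-test || exit 125
    # Any test failure is normalised to 1 ("bad"). A run killed by a signal
    # (e.g. an abort) reports a status above 127, which would otherwise make
    # git bisect stop the whole bisection instead of marking the commit bad.
    bin/mdrun-mpi-coordination-test \
        --gtest_filter='PropagatorsWithConstraints/PeriodicActionsTest.PeriodicActionsAgreeWithReference/*' \
        -ntmpi 2 || exit 1
    exit 0
)

# Example usage (bad/good commits are placeholders):
#   git bisect start <bad-commit> <good-commit>
#   git bisect run sh -c '. ./bisect-step.sh && bisect_step build-cmake-clang-release'
```

Because the case that fails (/6) does not seem to fail on every run, a single passing run does not prove a commit is good; repeating the test binary a few times inside the helper would make the bisection more robust.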