Project

General

Profile

Task #3205

address Debian experimental issues

Added by Szilárd Páll 21 days ago. Updated about 8 hours ago.

Status:
Fix uploaded
Priority:
High
Assignee:
-
Category:
testing
Target version:
Difficulty:
uncategorized
Close

Description

The results of the beta1 show a couple of failures:
https://buildd.debian.org/status/package.php?p=gromacs&suite=experimental

Some of these have been resolved in beta2, other have (likely) not, in particular:
  • the arm64 build fails:
    13: [ RUN      ] HardwareTopologyTest.NumaCacheSelfconsistency
    13: /<<PKGBUILDDIR>>/src/gromacs/hardware/tests/hardwaretopology.cpp:205: Failure
    13: Expected: (c.size) > (0), actual: 0 vs 0
    13: /<<PKGBUILDDIR>>/src/gromacs/hardware/tests/hardwaretopology.cpp:205: Failure
    13: Expected: (c.size) > (0), actual: 0 vs 0
    13: [  FAILED  ] HardwareTopologyTest.NumaCacheSelfconsistency (9 ms)
    13: [----------] 4 tests from HardwareTopologyTest (35 ms total)
    
  • i386: ListedForcesTest, MathUnitTest fail
  • ppc64el:SimulatorsAreEquivalentDefaultModular/SimulatorComparisonTest.WithinTolerances/3

On most the other hardware there are failures in MdrunTests, FileIOTests, MdrunNonIntegratorTests

make-check.log.gz (36 KB) make-check.log.gz Szilárd Páll, 12/05/2019 12:28 AM

Associated revisions

Revision c5a01a73 (diff)
Added by Mark Abraham 19 days ago

Stop testing that hwloc cache-size detection works properly

This was never a unit test of whether GROMACS HardwareTopology can
detect cache line size, because it requires both that hwloc works
correctly on the execution hardware, and that GROMACS code uses its
results correctly. So we should stop running such tests.

Refs #3205

Change-Id: I4d13e39620335b235f9969e5e09299fb9b41b1a0

Revision d76ad197 (diff)
Added by Mark Abraham 19 days ago

Relaxed listed forces test tolerances

Different compilers on different hardware generate different FP
instructions, which we need to tolerate.

Refs #3205

Change-Id: I965cecd82593dd7a0dd9e34d69937d06cd60735c

Revision bc628c63 (diff)
Added by Mark Abraham 10 days ago

Relax tolerance for bd tests

The implementation is somehow not very reproducible.

Refs #3205

Change-Id: Ibf9de00d6e67b8e2237cc8726f300b2bf8ad2327

Revision 245b4433 (diff)
Added by Test User 4 days ago

Relax test tolerance for cross correlation measure for densities

Some compilers do not perform the optimisations that are necessary to
reach default real tolerance when calculating the cross correlation of
larger three dimensional densities (100x100x100 voxels)

This patch loosens the requirement on the precision of cross correlation
evaluations.

refs #3205

Change-Id: Iad1bb1b17bb6935b2ce2537bcbb6c3902172a6bc

Revision a83b3bdd (diff)
Added by Test User about 22 hours ago

Relax cross-correlation test precision

Relase the floating point precision in cross-correlation evaluations on
large data arrays further. i386 fails here with debian experimental build.

Refs #3205

Change-Id: I86c9ad1804da5be98671ac14a5bc37ea5636aad0

History

#1 Updated by Mark Abraham 19 days ago

Szilárd Páll wrote:

The results of the beta1 show a couple of failures:
https://buildd.debian.org/status/package.php?p=gromacs&suite=experimental

Some of these have been resolved in beta2, other have (likely) not, in particular:

Done in c5a01a73e9095ae3998fc947b7e8bcfac3b57240

[...]
  • i386: ListedForcesTest, MathUnitTest fail

Proposed workarounds at https://gerrit.gromacs.org/c/gromacs/+/14319

  • ppc64el:SimulatorsAreEquivalentDefaultModular/SimulatorComparisonTest.WithinTolerances/3

Done in 536052b4c4ff46c700bde79ab00c022e817ac936

On most the other hardware there are failures in MdrunTests, FileIOTests, MdrunNonIntegratorTests

FileIOTests fails with some mrc reading stuff, e.g. https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=hppa&ver=2020%7Ebeta1-2&stamp=1571308902&raw=0. MdrunTests are failing on DensityFitting tests at https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=powerpc&ver=2020%7Ebeta1-2&stamp=1571361286&raw=0. I can see the latter throwing when we try to make a vector that is too large, which is generally because some size field is uninitialized or reads a garbage large integer. Is there something broken on some kind of weird hardware, Christian?

#2 Updated by Christian Blau 18 days ago

All density related failures seem to stem from the endianess issue I am currently working to fix, also the MdrunTests DensityFitting test because this is reading a reference density.

#3 Updated by Nicholas Breen 11 days ago

beta2 + the three patches Mark listed does a little better across architectures, though it uncovers a new problem on amd64 and x32 - NormalIntegrators/MdrunNoAppendContinuationIsExact.WithinTolerances/10 fails in single precision with a difference of 100 ULPs > allowed tolerance of 32 ULPs. I can't reproduce this one locally.

amd64 log: https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=amd64&ver=2020%7Ebeta2-1&stamp=1574545386&raw=0

The remaining failures look like the ones already identified from beta1, mostly FileIOTests and DensityFitting.

#4 Updated by Mark Abraham 11 days ago

Nicholas Breen wrote:

beta2 + the three patches Mark listed does a little better across architectures, though it uncovers a new problem on amd64 and x32 - NormalIntegrators/MdrunNoAppendContinuationIsExact.WithinTolerances/10 fails in single precision with a difference of 100 ULPs > allowed tolerance of 32 ULPs. I can't reproduce this one locally.

amd64 log: https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=amd64&ver=2020%7Ebeta2-1&stamp=1574545386&raw=0

OK I'll look into that one, but probably we'll increase the tolerance

The remaining failures look like the ones already identified from beta1, mostly FileIOTests and DensityFitting.

Likely resolved by Christian's work now submitted on release-2020 branch.

#5 Updated by Mark Abraham 10 days ago

Mark Abraham wrote:

Nicholas Breen wrote:

beta2 + the three patches Mark listed does a little better across architectures, though it uncovers a new problem on amd64 and x32 - NormalIntegrators/MdrunNoAppendContinuationIsExact.WithinTolerances/10 fails in single precision with a difference of 100 ULPs > allowed tolerance of 32 ULPs. I can't reproduce this one locally.

amd64 log: https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=amd64&ver=2020%7Ebeta2-1&stamp=1574545386&raw=0

OK I'll look into that one, but probably we'll increase the tolerance

Done at https://gerrit.gromacs.org/c/gromacs/+/14479

#6 Updated by Paul Bauer 4 days ago

  • Status changed from New to Fix uploaded
  • Target version changed from 2020-beta3 to 2020-rc1

I think most of those have been addressed now, but we should check again after beta3

#7 Updated by Mark Abraham 4 days ago

I've seen that Nicholas has tested that the bd fixes were effective. Remaining issues e.g. at https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=i386&ver=2020%7Ebeta2-2&stamp=1575010762&raw=0 have probably been fixed by Christian already, but not in the version Nicholas tested.

#8 Updated by Szilárd Páll 1 day ago

Mark Abraham wrote:

Szilárd Páll wrote:

The results of the beta1 show a couple of failures:
https://buildd.debian.org/status/package.php?p=gromacs&suite=experimental

Some of these have been resolved in beta2, other have (likely) not, in particular:

Done in c5a01a73e9095ae3998fc947b7e8bcfac3b57240

[...]
  • i386: ListedForcesTest, MathUnitTest fail

Proposed workarounds at https://gerrit.gromacs.org/c/gromacs/+/14319

Still failing:
https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=i386&ver=2020%7Ebeta3-1&stamp=1575458804&raw=0

#9 Updated by Mark Abraham 1 day ago

Failed a bunch of Angle/ListedForcesTest.Ifunc/* for mysterious reasons and also

14: [ RUN      ] DensitySimilarityTest.CrossCorrelationIsOne
14: /<<PKGBUILDDIR>>/src/gromacs/math/tests/densityfit.cpp:214: Failure
14:   Value of: measure.similarity(comparedDensity.asConstView())
14:     Actual: 0.99999999999703382
14:   Expected: expectedSimilarity
14:   Which is: 1
14: Difference: 2.96618e-12 (26717 double-prec. ULPs, rel. 2.97e-12)
14:  Tolerance: abs. 2.22045e-12, 10000 ULPs
14: [  FAILED  ] DensitySimilarityTest.CrossCorrelationIsOne (74 ms)
14: [ RUN      ] DensitySimilarityTest.CrossCorrelationIsMinusOneWhenAntiCorrelated
14: /<<PKGBUILDDIR>>/src/gromacs/math/tests/densityfit.cpp:234: Failure
14:   Value of: measure.similarity(comparedDensity.asConstView())
14:     Actual: -0.99999999999703382
14:   Expected: expectedSimilarity
14:   Which is: -1
14: Difference: 2.96618e-12 (26717 double-prec. ULPs, rel. 2.97e-12)
14:  Tolerance: 10000 ULPs
14: [  FAILED  ] DensitySimilarityTest.CrossCorrelationIsMinusOneWhenAntiCorrelated (86 ms)

which looks like a tolerance that should be relaxed (only at double?)

#10 Updated by Christian Blau 1 day ago

Yes, tolerance may be relaxed here. Without the proper optimisations this is 100'000 elements multiplied and added, so relaxing ULP tolerance to roughly 100'000 should fix the issue without compromiting the test.

#11 Updated by Mark Abraham 1 day ago

Nicholas, https://gerrit.gromacs.org/#/c/14624 would give us more insight into the Angles failure on 386. It uses a posix specific function, so doesn't pass all our testing, but if possible to run the 386 case again on beta 3, plus this patch, we should know what to do about it! Thanks!

#12 Updated by Szilárd Páll 1 day ago

Mark Abraham wrote:

Nicholas, https://gerrit.gromacs.org/#/c/14624 would give us more insight into the Angles failure on 386. It uses a posix specific function, so doesn't pass all our testing, but if possible to run the 386 case again on beta 3, plus this patch, we should know what to do about it! Thanks!

I could reproduce it in a vm, here is the output (this is with gcc 8.3): https://termbin.com/yb5e

Full make check output uploaded, in my hands it also produces MdrunIsReproduced/MdrunRerunFreeEnergyTest.WithinTolerances failures.

#13 Updated by Nicholas Breen 1 day ago

My local builds give identical results to Szilárd's, except that MdrunRerunFreeEnergyTest.WithinTolerances passed without error.

#14 Updated by Mark Abraham about 23 hours ago

The code for the angles test looks fine to me, so I'm 99% confident this is buggy code generation.

I'm curious whether the problem might be

  • the scaling by DEG2RAD on lines 927 & 928 of listed_forces/bonded.cpp,
  • the += on line 926 (since in this test we only evaluate one interaction, simple assignment should pass the test also),
  • one or other part of the computation on line 451, or
  • that the parameters looked up from the t_iparams union for lines 926-928 are somehow wrong.

if someone with the ability to test locally wants to put in some printf!

#15 Updated by Nicholas Breen about 8 hours ago

Buggy code generation at higher optimization levels does indeed seem to be the culprit.
  • Default compiler options at -O3 -> test fails
  • Compiling at -O2 -> test passes
  • Remaining at -O3, adding printf("index %d\n", i); at line 925 -> test passes

This is with gcc 9.2.1. Hard to nail down the exact problem when even a trivial printf like that avoids the misoptimization!

Also available in: Atom PDF