Project

General

Profile

Task #3205

address Debian experimental issues

Added by Szilárd Páll 8 months ago. Updated 6 months ago.

Status:
Closed
Priority:
High
Assignee:
-
Category:
testing
Target version:
Difficulty:
uncategorized
Close

Description

The results of the beta1 show a couple of failures:
https://buildd.debian.org/status/package.php?p=gromacs&suite=experimental

Some of these have been resolved in beta2, other have (likely) not, in particular:
  • the arm64 build fails:
    13: [ RUN      ] HardwareTopologyTest.NumaCacheSelfconsistency
    13: /<<PKGBUILDDIR>>/src/gromacs/hardware/tests/hardwaretopology.cpp:205: Failure
    13: Expected: (c.size) > (0), actual: 0 vs 0
    13: /<<PKGBUILDDIR>>/src/gromacs/hardware/tests/hardwaretopology.cpp:205: Failure
    13: Expected: (c.size) > (0), actual: 0 vs 0
    13: [  FAILED  ] HardwareTopologyTest.NumaCacheSelfconsistency (9 ms)
    13: [----------] 4 tests from HardwareTopologyTest (35 ms total)
    
  • i386: ListedForcesTest, MathUnitTest fail
  • ppc64el:SimulatorsAreEquivalentDefaultModular/SimulatorComparisonTest.WithinTolerances/3

On most the other hardware there are failures in MdrunTests, FileIOTests, MdrunNonIntegratorTests

make-check.log.gz (36 KB) make-check.log.gz Szilárd Páll, 12/05/2019 12:28 AM
bonded.cpp.volatile.diff (668 Bytes) bonded.cpp.volatile.diff Nicholas Breen, 12/07/2019 01:41 AM

Associated revisions

Revision c5a01a73 (diff)
Added by Mark Abraham 8 months ago

Stop testing that hwloc cache-size detection works properly

This was never a unit test of whether GROMACS HardwareTopology can
detect cache line size, because it requires both that hwloc works
correctly on the execution hardware, and that GROMACS code uses its
results correctly. So we should stop running such tests.

Refs #3205

Change-Id: I4d13e39620335b235f9969e5e09299fb9b41b1a0

Revision d76ad197 (diff)
Added by Mark Abraham 8 months ago

Relaxed listed forces test tolerances

Different compilers on different hardware generate different FP
instructions, which we need to tolerate.

Refs #3205

Change-Id: I965cecd82593dd7a0dd9e34d69937d06cd60735c

Revision bc628c63 (diff)
Added by Mark Abraham 7 months ago

Relax tolerance for bd tests

The implementation is somehow not very reproducible.

Refs #3205

Change-Id: Ibf9de00d6e67b8e2237cc8726f300b2bf8ad2327

Revision 245b4433 (diff)
Added by Test User 7 months ago

Relax test tolerance for cross correlation measure for densities

Some compilers do not perform the optimisations that are necessary to
reach default real tolerance when calculating the cross correlation of
larger three dimensional densities (100x100x100 voxels)

This patch loosens the requirement on the precision of cross correlation
evaluations.

refs #3205

Change-Id: Iad1bb1b17bb6935b2ce2537bcbb6c3902172a6bc

Revision a83b3bdd (diff)
Added by Test User 7 months ago

Relax cross-correlation test precision

Relase the floating point precision in cross-correlation evaluations on
large data arrays further. i386 fails here with debian experimental build.

Refs #3205

Change-Id: I86c9ad1804da5be98671ac14a5bc37ea5636aad0

Revision 93b5f239 (diff)
Added by Mark Abraham 7 months ago

Provide more feedback when listed forces tests fail

Refs #3205

Change-Id: Ic012aeaf8b35af88b78e109acabb6048d593cab3

Revision c13f4653 (diff)
Added by Mark Abraham 7 months ago

Avoid 386 O3 codegen bug

Refs #3205

Change-Id: I2aa5018bde089b11497892dd68a7f9715e69e20e

Revision af698c8f (diff)
Added by Mark Abraham 6 months ago

Avoid 386 O3 codegen bug even more

Refs #3205

Change-Id: I78f3cc03da36e129a842507f9cdd3d170beb8c08

History

#1 Updated by Mark Abraham 8 months ago

Szilárd Páll wrote:

The results of the beta1 show a couple of failures:
https://buildd.debian.org/status/package.php?p=gromacs&suite=experimental

Some of these have been resolved in beta2, other have (likely) not, in particular:

Done in c5a01a73e9095ae3998fc947b7e8bcfac3b57240

[...]
  • i386: ListedForcesTest, MathUnitTest fail

Proposed workarounds at https://gerrit.gromacs.org/c/gromacs/+/14319

  • ppc64el:SimulatorsAreEquivalentDefaultModular/SimulatorComparisonTest.WithinTolerances/3

Done in 536052b4c4ff46c700bde79ab00c022e817ac936

On most the other hardware there are failures in MdrunTests, FileIOTests, MdrunNonIntegratorTests

FileIOTests fails with some mrc reading stuff, e.g. https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=hppa&ver=2020%7Ebeta1-2&stamp=1571308902&raw=0. MdrunTests are failing on DensityFitting tests at https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=powerpc&ver=2020%7Ebeta1-2&stamp=1571361286&raw=0. I can see the latter throwing when we try to make a vector that is too large, which is generally because some size field is uninitialized or reads a garbage large integer. Is there something broken on some kind of weird hardware, Christian?

#2 Updated by Christian Blau 8 months ago

All density related failures seem to stem from the endianess issue I am currently working to fix, also the MdrunTests DensityFitting test because this is reading a reference density.

#3 Updated by Nicholas Breen 7 months ago

beta2 + the three patches Mark listed does a little better across architectures, though it uncovers a new problem on amd64 and x32 - NormalIntegrators/MdrunNoAppendContinuationIsExact.WithinTolerances/10 fails in single precision with a difference of 100 ULPs > allowed tolerance of 32 ULPs. I can't reproduce this one locally.

amd64 log: https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=amd64&ver=2020%7Ebeta2-1&stamp=1574545386&raw=0

The remaining failures look like the ones already identified from beta1, mostly FileIOTests and DensityFitting.

#4 Updated by Mark Abraham 7 months ago

Nicholas Breen wrote:

beta2 + the three patches Mark listed does a little better across architectures, though it uncovers a new problem on amd64 and x32 - NormalIntegrators/MdrunNoAppendContinuationIsExact.WithinTolerances/10 fails in single precision with a difference of 100 ULPs > allowed tolerance of 32 ULPs. I can't reproduce this one locally.

amd64 log: https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=amd64&ver=2020%7Ebeta2-1&stamp=1574545386&raw=0

OK I'll look into that one, but probably we'll increase the tolerance

The remaining failures look like the ones already identified from beta1, mostly FileIOTests and DensityFitting.

Likely resolved by Christian's work now submitted on release-2020 branch.

#5 Updated by Mark Abraham 7 months ago

Mark Abraham wrote:

Nicholas Breen wrote:

beta2 + the three patches Mark listed does a little better across architectures, though it uncovers a new problem on amd64 and x32 - NormalIntegrators/MdrunNoAppendContinuationIsExact.WithinTolerances/10 fails in single precision with a difference of 100 ULPs > allowed tolerance of 32 ULPs. I can't reproduce this one locally.

amd64 log: https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=amd64&ver=2020%7Ebeta2-1&stamp=1574545386&raw=0

OK I'll look into that one, but probably we'll increase the tolerance

Done at https://gerrit.gromacs.org/c/gromacs/+/14479

#6 Updated by Paul Bauer 7 months ago

  • Status changed from New to Fix uploaded
  • Target version changed from 2020-beta3 to 2020-rc1

I think most of those have been addressed now, but we should check again after beta3

#7 Updated by Mark Abraham 7 months ago

I've seen that Nicholas has tested that the bd fixes were effective. Remaining issues e.g. at https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=i386&ver=2020%7Ebeta2-2&stamp=1575010762&raw=0 have probably been fixed by Christian already, but not in the version Nicholas tested.

#8 Updated by Szilárd Páll 7 months ago

Mark Abraham wrote:

Szilárd Páll wrote:

The results of the beta1 show a couple of failures:
https://buildd.debian.org/status/package.php?p=gromacs&suite=experimental

Some of these have been resolved in beta2, other have (likely) not, in particular:

Done in c5a01a73e9095ae3998fc947b7e8bcfac3b57240

[...]
  • i386: ListedForcesTest, MathUnitTest fail

Proposed workarounds at https://gerrit.gromacs.org/c/gromacs/+/14319

Still failing:
https://buildd.debian.org/status/fetch.php?pkg=gromacs&arch=i386&ver=2020%7Ebeta3-1&stamp=1575458804&raw=0

#9 Updated by Mark Abraham 7 months ago

Failed a bunch of Angle/ListedForcesTest.Ifunc/* for mysterious reasons and also

14: [ RUN      ] DensitySimilarityTest.CrossCorrelationIsOne
14: /<<PKGBUILDDIR>>/src/gromacs/math/tests/densityfit.cpp:214: Failure
14:   Value of: measure.similarity(comparedDensity.asConstView())
14:     Actual: 0.99999999999703382
14:   Expected: expectedSimilarity
14:   Which is: 1
14: Difference: 2.96618e-12 (26717 double-prec. ULPs, rel. 2.97e-12)
14:  Tolerance: abs. 2.22045e-12, 10000 ULPs
14: [  FAILED  ] DensitySimilarityTest.CrossCorrelationIsOne (74 ms)
14: [ RUN      ] DensitySimilarityTest.CrossCorrelationIsMinusOneWhenAntiCorrelated
14: /<<PKGBUILDDIR>>/src/gromacs/math/tests/densityfit.cpp:234: Failure
14:   Value of: measure.similarity(comparedDensity.asConstView())
14:     Actual: -0.99999999999703382
14:   Expected: expectedSimilarity
14:   Which is: -1
14: Difference: 2.96618e-12 (26717 double-prec. ULPs, rel. 2.97e-12)
14:  Tolerance: 10000 ULPs
14: [  FAILED  ] DensitySimilarityTest.CrossCorrelationIsMinusOneWhenAntiCorrelated (86 ms)

which looks like a tolerance that should be relaxed (only at double?)

#10 Updated by Christian Blau 7 months ago

Yes, tolerance may be relaxed here. Without the proper optimisations this is 100'000 elements multiplied and added, so relaxing ULP tolerance to roughly 100'000 should fix the issue without compromiting the test.

#11 Updated by Mark Abraham 7 months ago

Nicholas, https://gerrit.gromacs.org/#/c/14624 would give us more insight into the Angles failure on 386. It uses a posix specific function, so doesn't pass all our testing, but if possible to run the 386 case again on beta 3, plus this patch, we should know what to do about it! Thanks!

#12 Updated by Szilárd Páll 7 months ago

Mark Abraham wrote:

Nicholas, https://gerrit.gromacs.org/#/c/14624 would give us more insight into the Angles failure on 386. It uses a posix specific function, so doesn't pass all our testing, but if possible to run the 386 case again on beta 3, plus this patch, we should know what to do about it! Thanks!

I could reproduce it in a vm, here is the output (this is with gcc 8.3): https://termbin.com/yb5e

Full make check output uploaded, in my hands it also produces MdrunIsReproduced/MdrunRerunFreeEnergyTest.WithinTolerances failures.

#13 Updated by Nicholas Breen 7 months ago

My local builds give identical results to Szilárd's, except that MdrunRerunFreeEnergyTest.WithinTolerances passed without error.

#14 Updated by Mark Abraham 7 months ago

The code for the angles test looks fine to me, so I'm 99% confident this is buggy code generation.

I'm curious whether the problem might be

  • the scaling by DEG2RAD on lines 927 & 928 of listed_forces/bonded.cpp,
  • the += on line 926 (since in this test we only evaluate one interaction, simple assignment should pass the test also),
  • one or other part of the computation on line 451, or
  • that the parameters looked up from the t_iparams union for lines 926-928 are somehow wrong.

if someone with the ability to test locally wants to put in some printf!

#15 Updated by Nicholas Breen 7 months ago

Buggy code generation at higher optimization levels does indeed seem to be the culprit.
  • Default compiler options at -O3 -> test fails
  • Compiling at -O2 -> test passes
  • Remaining at -O3, adding printf("index %d\n", i); at line 925 -> test passes

This is with gcc 9.2.1. Hard to nail down the exact problem when even a trivial printf like that avoids the misoptimization!

#16 Updated by Mark Abraham 7 months ago

Nicholas Breen wrote:

Buggy code generation at higher optimization levels does indeed seem to be the culprit.
  • Default compiler options at -O3 -> test fails
  • Compiling at -O2 -> test passes
  • Remaining at -O3, adding printf("index %d\n", i); at line 925 -> test passes

This is with gcc 9.2.1. Hard to nail down the exact problem when even a trivial printf like that avoids the misoptimization!

Thanks. Hopefully https://gerrit.gromacs.org/c/gromacs/+/14658 is an effective work around

#17 Updated by Nicholas Breen 7 months ago

Mark Abraham wrote:

Thanks. Hopefully https://gerrit.gromacs.org/c/gromacs/+/14658 is an effective work around

In my VM, the tests still fail with that patch, but they do pass if it's modified to use -O1 instead of -O2. I can't explain that one yet.

The tests also pass by applying the attached patch to designate one variable as volatile, without the use of the 14658 patch. Proof of concept only - that patch most likely breaks every other template with data-type mismatches. Perhaps it gives a clue about the mistaken code generation?

#18 Updated by Szilárd Páll 7 months ago

I've done some tests in a VM earlier and the GMX_SIMD=None build + 14658 had several failures with multiple compilers (gcc7/8 clang8), among others ListedForcesTests, TableUnitTests, and SimdUnitTests. Not sure if this suggest a fragility of GROMACS non-SIMD builds or several buggy compilers on i386.

Having looked at the beta3 results there seem to be some hppa failures with PME unit tests in SP that are suspicious (all manage only 6 ULP) and the sh4 build seems to fail is it can't find fftw.

#19 Updated by Paul Bauer 7 months ago

  • Status changed from Fix uploaded to Feedback wanted
  • Target version changed from 2020-rc1 to 2020

giving it one more bump to get more feedback

#20 Updated by Nicholas Breen 7 months ago

With -rc1 on i386, I still get the error unless changing the #pragma fix at src/gromacs/listed_forces/bonded.cpp:899 to -O1, as in comment 17.

I haven't yet uploaded it for autobuilding on the other architectures, since at the moment the builds fail for an entirely unrelated documentation build failure. (Not yet diagnosed far enough for a useful bug report or patch, but it looks to be entirely external: fails with doxygen 1.8.16, passes with 1.8.15.)

#21 Updated by Paul Bauer 6 months ago

  • Target version changed from 2020 to 2020.1

I think we should revisit this for 2020.1 to make sure any bugs that are still open for the final 2020 release will be fixed by then.
Thanks for your help in testing things!

#22 Updated by Mark Abraham 6 months ago

Nicholas Breen wrote:

With -rc1 on i386, I still get the error unless changing the #pragma fix at src/gromacs/listed_forces/bonded.cpp:899 to -O1, as in comment 17.

Done in https://gerrit.gromacs.org/c/gromacs/+/15139

#23 Updated by Mark Abraham 6 months ago

  • Target version changed from 2020.1 to 2020

#24 Updated by Mark Abraham 6 months ago

  • Target version changed from 2020 to 2020.1

Sorry, we should definitely revisit once we have more debian feedback

#25 Updated by Nicholas Breen 6 months ago

With the 2020 release, I think this can be closed; builds are successful on all Debian release architectures so far. Still waiting on the two MIPS builds to finish, but those built the betas successfully and haven't generally been a problem.

Three remaining build failures in the less-common ports are not concerns: hppa (PA-RISC) is another ULP tolerance test failure, but it's not worth relaxing again just for that, and SuperH and RISC-V have underlying toolchain problems instead.

#26 Updated by Mark Abraham 6 months ago

Thanks Nicholas, I concur

#27 Updated by Szilárd Páll 6 months ago

  • Status changed from Feedback wanted to Resolved

Thanks for the feedback!

#28 Updated by Christoph Junghans 6 months ago

This fixes #2366 as well.

#29 Updated by Paul Bauer 6 months ago

  • Status changed from Resolved to Closed

Thanks for the feedback, this helped fixing a number of important issues!

Also available in: Atom PDF