Bug #2234

PME solving test missing reference data failures on various configurations

Added by Aleksei Iupinov about 2 years ago. Updated about 2 years ago.

Status: Closed
Priority: Normal
Category: -
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

The same LJ PME solver test fails repeatedly on the same ARM_NEON Jenkins node on the same complex grid cell.

Example:
http://jenkins.gromacs.org/job/Matrix_PostSubmit_master/126/OPTIONS=gcc-4.8%20simd=ARM_NEON%20release%20host=bs_jetson_tk1,label=bs_jetson_tk1/testReport/junit/(root)/DifferentEwaldCoeffLJ_PmeSolveTest/ReproducesOutputs_3/

gcc-4.8 simd=ARM_NEON release host=bs_jetson_tk1,bs_jetson_tk1

DifferentEwaldCoeffLJ/PmeSolveTest.ReproducesOutputs/3 (from DifferentEwaldCoeffLJ_PmeSolveTest)

/home/jenkins/workspace/Matrix_PostSubmit_master/b375e229/gromacs/src/testutils/refdata.cpp:925
Reference data item /ComplexSpaceGrid/Cell 0 4 0 im not found
Google Test trace:
/home/jenkins/workspace/Matrix_PostSubmit_master/b375e229/gromacs/src/gromacs/ewald/tests/pmesolvetest.cpp:128: Testing solving (Lennard-Jones, YZX, without energy/virial) with CPU for PME grid size 9 7 23, Ewald coefficients 2 0.3

Related issues

Related to GROMACS - Bug #2242: PME LJ tests fail with SIMD with an fp exception (Closed)
Related to GROMACS - Bug #2243: Reference SIMD exp doesn't flush to zero for very large negative inputs (Closed)

Associated revisions

Revision c9a72e5d (diff)
Added by Aleksei Iupinov about 2 years ago

Adjust PME LJ solver test input coefficient

The previous value caused one of the unit tests to fail predictably
on ARM_NEON SIMD. The coefficient was so low that it caused a specific
grid value to hover at the GMX_FLOAT_MIN threshold, which is used
to allow using the same test reference data for single/double precision.

Ref #2234

Change-Id: Ia1aa51ead263e82487585abb167c4d080fd813ac
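
To illustrate the failure mode described in this commit message, here is a minimal sketch of how a threshold-based sparse grid output can gain or lose an entry when a value hovers around the threshold. It is not the actual test code: `c_floatMin` is a stand-in for GMX_FLOAT_MIN (assumed here to be the smallest normalized float), and `isStoredInSparseGrid` and `makeSparseGrid` are hypothetical helpers invented for this example.

```cpp
#include <cmath>
#include <limits>
#include <map>
#include <string>

// Stand-in for GMX_FLOAT_MIN (assumption: the smallest normalized float value).
constexpr double c_floatMin = std::numeric_limits<float>::min();

// Hypothetical filter: a grid value becomes an output entry only if its
// magnitude reaches the threshold; anything smaller is treated as zero.
inline bool isStoredInSparseGrid(double value)
{
    return std::fabs(value) >= c_floatMin;
}

// Hypothetical helper: keep only the cells that pass the threshold filter,
// mimicking a sparse grid dump that is compared against reference data.
inline std::map<std::string, double> makeSparseGrid(const std::map<std::string, double>& denseGrid)
{
    std::map<std::string, double> sparseGrid;
    for (const auto& cell : denseGrid)
    {
        if (isStoredInSparseGrid(cell.second))
        {
            sparseGrid.insert(cell);
        }
    }
    return sparseGrid;
}
```

If one platform computes a cell slightly below the threshold and another slightly above it, the two sparse outputs contain different sets of cells, so a comparison against a single reference file would report an item as not found even though the numerical difference is negligible.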

History

#1 Updated by Aleksei Iupinov about 2 years ago

  • Subject changed from small LJ PME test failure on jetson k1 to small LJ PME test failure on jetson tk1

#2 Updated by Gerrit Code Review Bot about 2 years ago

Gerrit received a related patchset '2' for Issue #2234.
Uploader: Aleksei Iupinov
Change-Id: gromacs~master~Ia1aa51ead263e82487585abb167c4d080fd813ac
Gerrit URL: https://gerrit.gromacs.org/6861

#3 Updated by Aleksei Iupinov about 2 years ago

  • Status changed from Accepted to Fix uploaded

#4 Updated by Aleksei Iupinov about 2 years ago

  • Status changed from Fix uploaded to Resolved

#5 Updated by Aleksei Iupinov about 2 years ago

  • Related to Bug #2242: PME LJ tests fail with SIMD with an fp exception added

#6 Updated by Aleksei Iupinov about 2 years ago

  • Status changed from Resolved to Accepted

So the same kind of failure can happen on other configurations as well, as shown in the related issue.
I should revise all the Ewald coefficients used as unit test inputs to sane values corresponding to 1 / (box sizes).

#7 Updated by Mark Abraham about 2 years ago

  • Target version set to 2018

#8 Updated by Szilárd Páll about 2 years ago

  • Status changed from Accepted to Resolved

Can we close this?

#9 Updated by Aleksei Iupinov about 2 years ago

  • Subject changed from small LJ PME test failure on jetson tk1 to PME solving test missing reference data failures on various configurations

You're right, apparently this got fixed by 36311d95 (Improve accuracy of SIMD exp for small args),
at least when tested with Roland's settings - gcc 5.4 on Haswell, -DGMX_SIMD=Reference -DGMX_SIMD_REF_FLOAT_WIDTH=16 -DGMX_SIMD_REF_DOUBLE_WIDTH=8 -DCMAKE_BUILD_TYPE=Debug.
How nice.
Unless people know other configurations in which this still fails, I'll close this in a day.

#10 Updated by Aleksei Iupinov about 2 years ago

  • Related to Bug #2243: Reference SIMD exp doesn't flush to zero for very large negative inputs added

#11 Updated by Aleksei Iupinov about 2 years ago

  • Status changed from Resolved to Closed

And as Berk explained, it's fixed because ldexp now runs in safe mode by default.
To reiterate: if we decide to run ldexp in unsafe mode in PME solve and the issue pops up again, one solution would be to increase all the ewaldCoeff_* input values in src/gromacs/ewald/tests/pmesolvetest.cpp to be much greater than 1.0 / (any of the input box dimensions), and then regenerate the ewald-test reference data. One can verify that the grid values in the reference data would then be much larger than GMX_FLOAT_MIN, so that we would only be testing realistic cases.
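
The criterion above could be checked with something like the following sketch. `ewaldCoeffIsRealistic` and its `safetyFactor` parameter are hypothetical and not part of GROMACS; "much greater" is not quantified in this ticket, so the factor is left to the caller.

```cpp
#include <algorithm>
#include <array>

// Hypothetical helper (not part of GROMACS): returns true if the Ewald
// coefficient exceeds 1 / (box dimension) for every box dimension by at
// least the given safety factor. The smallest dimension gives the largest
// 1 / dimension, so checking against it covers all three.
// Assumes all box dimensions are positive.
inline bool ewaldCoeffIsRealistic(double                       ewaldCoeff,
                                  const std::array<double, 3>& boxDims,
                                  double                       safetyFactor)
{
    const double minDim = *std::min_element(boxDims.begin(), boxDims.end());
    return ewaldCoeff > safetyFactor / minDim;
}
```

For example, with a made-up box of 8.0 x 3.5 x 17.0 and a safety factor of 10, a coefficient of 0.3 would be rejected and a coefficient of 5.0 would be accepted. The box sizes actually used in pmesolvetest.cpp are not given in this ticket.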
