PME solving test missing reference data failures on various configurations
The same LJ PME solver test fails repeatedly on the same ARM_NEON Jenkins node on the same complex grid cell.
gcc-4.8 simd=ARM_NEON release host=bs_jetson_tk1,bs_jetson_tk1
DifferentEwaldCoeffLJ/PmeSolveTest.ReproducesOutputs/3 (from DifferentEwaldCoeffLJ_PmeSolveTest)
/home/jenkins/workspace/Matrix_PostSubmit_master/b375e229/gromacs/src/testutils/refdata.cpp:925
Reference data item /ComplexSpaceGrid/Cell 0 4 0 im not found
Google Test trace:
/home/jenkins/workspace/Matrix_PostSubmit_master/b375e229/gromacs/src/gromacs/ewald/tests/pmesolvetest.cpp:128: Testing solving (Lennard-Jones, YZX, without energy/virial) with CPU for PME grid size 9 7 23, Ewald coefficients 2 0.3
Adjust PME LJ solver test input coefficient
The previous value caused one of the unit tests to fail predictably
on ARM_NEON SIMD. The coefficient was so low that a specific grid
value hovered at the GMX_FLOAT_MIN threshold, which is used so that
the same test reference data can serve single and double precision.
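
A minimal sketch of how a value hovering at the threshold can produce a "Reference data item ... not found" failure. This is not the GROMACS test code: the map-based grid, the function names and the cutoff constant are hypothetical, and the cutoff only stands in for GMX_FLOAT_MIN, assumed here to be the smallest normalized single-precision value.

```cpp
#include <cmath>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Assumption: stands in for GMX_FLOAT_MIN, taken here as the smallest
// normalized single-precision value.
constexpr double c_cutoff = std::numeric_limits<float>::min();

// Sparse grid: cell label -> value (hypothetical layout, not the GROMACS one).
using SparseGrid = std::map<std::string, double>;

// Drops every value whose magnitude does not exceed the cutoff.
SparseGrid keepAboveCutoff(const SparseGrid& grid)
{
    SparseGrid kept;
    for (const auto& entry : grid)
    {
        if (std::fabs(entry.second) > c_cutoff)
        {
            kept.insert(entry);
        }
    }
    return kept;
}

// Lists cells that the test would check but the reference data lacks.
std::vector<std::string> missingInReference(const SparseGrid& computed, const SparseGrid& reference)
{
    std::vector<std::string> missing;
    for (const auto& entry : keepAboveCutoff(computed))
    {
        if (reference.count(entry.first) == 0)
        {
            missing.push_back(entry.first);
        }
    }
    return missing;
}
```

If the reference data was generated on a build where a cell came out just below the cutoff, that cell is absent from the reference. A build whose SIMD math puts the same cell just above the cutoff then looks it up and finds nothing.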
#6 Updated by Aleksei Iupinov almost 2 years ago
- Status changed from Resolved to Accepted
So the same kind of failure can happen on other configurations as well, as shown in the related issue.
I should revise all the unit test input Ewald coefficients to sane values corresponding to 1 / (box sizes).
#9 Updated by Aleksei Iupinov almost 2 years ago
- Subject changed from small LJ PME test failure on jetson tk1 to PME solving test missing reference data failures on various configurations
You're right, apparently this got fixed by 36311d95 (Improve accuracy of SIMD exp for small args),
at least when tested with Roland's settings - gcc 5.4 on Haswell, -DGMX_SIMD=Reference -DGMX_SIMD_REF_FLOAT_WIDTH=16 -DGMX_SIMD_REF_DOUBLE_WIDTH=8 -DCMAKE_BUILD_TYPE=Debug.
Unless people know other configurations in which this still fails, I'll close this in a day.
#11 Updated by Aleksei Iupinov almost 2 years ago
- Status changed from Resolved to Closed
And as Berk explained, it's fixed because ldexp now runs in safe mode by default.
To reiterate: if we decide to run ldexp in unsafe mode in PME solve and the issue pops up again, one solution would be to increase all the ewaldCoeff_* input values in src/gromacs/ewald/tests/pmesolvetest.cpp so that they are much greater than 1.0 / (any of the input box dimensions), and then regenerate the ewald-test reference data. One can verify that the grid values in the reference data are then much larger than GMX_FLOAT_MIN, so that we would only be testing realistic cases.
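
The coefficient criterion above could be checked with something like the following sketch. The function name and the margin parameter are hypothetical, and the default margin of 10 is only an arbitrary reading of "much greater"; nothing here is taken from pmesolvetest.cpp.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Returns true when ewaldCoeff exceeds margin / (smallest box dimension),
// i.e. it is well above 1 / L for every box dimension L.
// The default margin of 10 is an illustrative assumption, not a GROMACS value.
bool ewaldCoeffIsWellAboveInverseBox(double                       ewaldCoeff,
                                     const std::array<double, 3>& boxDimensions,
                                     double                       margin = 10.0)
{
    const double smallest = *std::min_element(boxDimensions.begin(), boxDimensions.end());
    assert(smallest > 0.0); // box dimensions must be positive
    return ewaldCoeff > margin / smallest;
}
```

For a made-up box of 8.0 x 3.5 x 3.5, a coefficient of 0.3 fails this check with the default margin, while 5.0 passes. The actual test boxes are not listed in this issue, so these numbers are for illustration only.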