avx512 double precision simd failure
With gcc 7.1 on dev-purley01 (AVX512) in double precision:
[ RUN ] SimdFloatingpointUtilTest.transposeScatterStoreU3
../src/gromacs/simd/tests/simd_floatingpoint_util.cpp:303: Failure
Value of: mem0_[j]
Actual: 1001.0000000018564
Expected: refmem[j]
Which is: 0.10000000000000223
Difference: 1000.9 (60145924867270031 double-prec. ULPs, rel. 1e+04)
Tolerance: abs. 8.88178e-16, 4 ULPs
../src/gromacs/simd/tests/simd_floatingpoint_util.cpp:303: Failure
Value of: mem0_[j]
Actual: 1002.0000000000223
Expected: refmem[j]
Which is: 0.20000000000000445
Difference: 1001.8 (55651121332905610 double-prec. ULPs, rel. 5.01e+03)
Tolerance: abs. 8.88178e-16, 4 ULPs
../src/gromacs/simd/tests/simd_floatingpoint_util.cpp:303: Failure
Value of: mem0_[j]
Actual: 4.667261465178151e-62
Expected: refmem[j]
Which is: 1.0000000000000222
Difference: 1 (917833604050253824 double-prec. ULPs, rel. 1)
Tolerance: abs. 8.88178e-16, 4 ULPs
../src/gromacs/simd/tests/simd_floatingpoint_util.cpp:303: Failure
Value of: mem0_[j]
Actual: 1010.0000000018612
Expected: refmem[j]
Which is: 1.1000000000000245
...
which causes various other things to fail, including a segfault in the settle unit tests
...
[----------] 24 tests from WithParameters/SettleTest
[ RUN ] WithParameters/SettleTest.SatisfiesConstraints/0
Work around AVX-512 issues in gcc-5.4 and 7.1
Fixes compilation issues with mixed and double precision builds using
AVX-512 SIMD with gcc-5.4 or gcc-7.1. Also tested with gcc-6.3, and
Debug as well as Release builds for all three versions, all of which
now pass the simd unit tests.
Update double-precision test configurations
These changes improve coverage of double precision by using Release
mode more often, particularly with the latest gcc and icc, and by using
128-bit SIMD, all of which have been buggy recently. The other aspects of
the configurations that have been modified have been
non-critical. Where appropriate, brief rationales are recorded. This
resolves an old TODO item in the post-submit matrix.
Fixed a sign mismatch in initializing an OpenCL variable that didn't
need to be initialized.
Noted relevant new TODOs.
#3 Updated by Paul Bauer over 2 years ago
When stepping through the simd function in the debugger, I noticed that the memory at the base address gets overwritten with garbage at this point:
_mm512_i32scatter_pd(base, simdoffset.simdInternal_, v0.simdInternal_, sizeof(double));
but I don't understand enough of it to know why. Hope that helps the more knowledgeable people.
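For readers unfamiliar with the intrinsic: per Intel's documentation, `_mm512_i32scatter_pd(base, vindex, a, scale)` writes each of the eight double lanes of `a` to the byte address `base + vindex[i] * scale`. The scalar sketch below models that documented behaviour only; `scatterReference` and its parameter names are made up for illustration and are not GROMACS code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scalar reference model (hypothetical helper, not GROMACS code) of an
// 8-lane double-precision scatter with 32-bit indices: lane i is written
// to the byte address base + index[i] * scale.
inline void scatterReference(void*                              base,
                             const std::array<std::int32_t, 8>& index,
                             const std::array<double, 8>&       value,
                             int                                scale)
{
    // The hardware instruction only accepts these scale factors.
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    // Do the byte arithmetic on a character pointer, never on void*.
    auto* bytes = static_cast<unsigned char*>(base);
    for (std::size_t lane = 0; lane < index.size(); ++lane)
    {
        const std::ptrdiff_t byteOffset = static_cast<std::ptrdiff_t>(index[lane]) * scale;
        std::memcpy(bytes + byteOffset, &value[lane], sizeof(double));
    }
}
```

With `scale` equal to `sizeof(double)`, as in the call above, the indices count whole doubles, so only the indexed elements of the buffer should change and every other element should be left untouched.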
#4 Updated by Roland Schulz over 2 years ago
I can't reproduce this. Both with GCC 6.1 and 7.3 the simd unit tests pass.
In both cases I do have other failures:
7 - EwaldUnitTests (Failed)
17 - TableUnitTests (Failed)
Both those seem unrelated. Did you add any other flags besides "-DGMX_SIMD=AVX_512 -DGMX_DOUBLE=yes"?
#5 Updated by Paul Bauer over 2 years ago
I used these as my cmake command-line options:
"-DGMX_DOUBLE=yes -DGMX_SIMD=AVX_512 -DCMAKE_C_COMPILER=gcc-6 -DCMAKE_CXX_COMPILER=g++-6"
When building from the latest source I also can no longer reproduce this, strangely enough, and I get the same failures in Ewald and Table tests that seem to be simple precision issues.
#9 Updated by Mark Abraham over 2 years ago
On dev-purley01, I can reproduce this with release-2017 HEAD b01a10543dcec1ea87e409a28821c3f668e04b2b with a gcc 7.1 AVX512 double-precision Debug build, but not a Release build. With gcc 6.4, again Debug had an issue and Release did not. (All gave the table error from #2336, too.)
All the failing configs warn about "warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]" three times in transposeScatterStoreU<3>, which is called from multiple places. Notice that in the failing simd test the scatter has not replaced the contents of mem0 with the correct values. That's clearly a gcc bug: Release works OK and all the pointers we use have a non-void base type, so gcc seems to be doing an incorrect code transformation in Debug builds. Note that gcc permits void-pointer arithmetic as an extension (but I don't think this is related). I will work on a repro case to file a bug.
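As background on the warning itself: standard C++ does not define arithmetic on `void *`, and gcc accepts it only as an extension that treats the pointee size as 1 byte. A minimal sketch of the portable equivalent is below; `advanceBytes` and `elementAt` are hypothetical helpers for illustration and do not appear in the GROMACS sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: advance a type-erased pointer by a byte count.
// Writing `p + n` directly on a void* is what triggers -Wpointer-arith;
// casting to a character type first is well-defined standard C++.
inline void* advanceBytes(void* p, std::ptrdiff_t n)
{
    return static_cast<unsigned char*>(p) + n;
}

// Hypothetical helper: address of element i in an array of doubles that
// is only reachable through a void* base pointer.
inline double* elementAt(void* base, std::ptrdiff_t i)
{
    assert(base != nullptr);
    const std::ptrdiff_t byteOffset = i * static_cast<std::ptrdiff_t>(sizeof(double));
    return static_cast<double*>(advanceBytes(base, byteOffset));
}
```

This only illustrates what the diagnostic refers to; it is not a claim about where the miscompilation comes from.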
#10 Updated by Mark Abraham over 2 years ago
- Status changed from Closed to Blocked, need info
- Affected version changed from 2018-beta1 to 2018-beta2
Still present in 2018-beta2 with gcc 6.4 in Debug, both double and mixed builds.
Reopened so that I have a reminder to report this bug. We also need some further investigation, and perhaps some kind of workaround so that people can't use whatever range of compilers and build configurations is vulnerable to it.