Bug #2325

avx512 double precision simd failure

Added by Mark Abraham almost 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: core library
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

With gcc 7.1 on dev-purley01 (AVX512) in double precision:

[ RUN      ] SimdFloatingpointUtilTest.transposeScatterStoreU3
../src/gromacs/simd/tests/simd_floatingpoint_util.cpp:303: Failure
  Value of: mem0_[j]
    Actual: 1001.0000000018564
  Expected: refmem[j]
  Which is: 0.10000000000000223
Difference: 1000.9 (60145924867270031 double-prec. ULPs, rel. 1e+04)
 Tolerance: abs. 8.88178e-16, 4 ULPs
../src/gromacs/simd/tests/simd_floatingpoint_util.cpp:303: Failure
  Value of: mem0_[j]
    Actual: 1002.0000000000223
  Expected: refmem[j]
  Which is: 0.20000000000000445
Difference: 1001.8 (55651121332905610 double-prec. ULPs, rel. 5.01e+03)
 Tolerance: abs. 8.88178e-16, 4 ULPs
../src/gromacs/simd/tests/simd_floatingpoint_util.cpp:303: Failure
  Value of: mem0_[j]
    Actual: 4.667261465178151e-62
  Expected: refmem[j]
  Which is: 1.0000000000000222
Difference: 1 (917833604050253824 double-prec. ULPs, rel. 1)
 Tolerance: abs. 8.88178e-16, 4 ULPs
../src/gromacs/simd/tests/simd_floatingpoint_util.cpp:303: Failure
  Value of: mem0_[j]
    Actual: 1010.0000000018612
  Expected: refmem[j]
  Which is: 1.1000000000000245
...

which causes various other tests to fail, including a segfault in the settle unit tests:

...
[----------] 24 tests from WithParameters/SettleTest
[ RUN      ] WithParameters/SettleTest.SatisfiesConstraints/0
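For readers unfamiliar with the "double-prec. ULPs" figures in the test output above, the following is a minimal sketch of one common way to count the representable doubles between two values. It is not the GROMACS test code; the names `orderedBits` and `ulpDistance` are hypothetical, and the sketch assumes IEEE 754 binary64 doubles.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::int64_t), "sketch assumes 64-bit doubles");

// Hypothetical helper: map a double's bit pattern to an integer that
// increases monotonically with the floating-point value.
std::int64_t orderedBits(double x)
{
    std::int64_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    // Negative doubles have the sign bit set; mirror them below zero.
    return bits < 0 ? std::numeric_limits<std::int64_t>::min() - bits : bits;
}

// Hypothetical helper: number of representable doubles between a and b.
std::uint64_t ulpDistance(double a, double b)
{
    assert(!std::isnan(a) && !std::isnan(b));
    const std::int64_t ia = orderedBits(a);
    const std::int64_t ib = orderedBits(b);
    // Unsigned subtraction avoids signed overflow for distant values.
    return ia > ib ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
                   : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}
```

With a measure like this, a tolerance of 4 ULPs accepts only results within four representable values of the reference, so differences in the range of 10^16 ULPs or more indicate that the wrong data was stored rather than a rounding problem.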

Associated revisions

Revision 22213935 (diff)
Added by Erik Lindahl almost 2 years ago

Work around AVX-512 issues in gcc-5.4 and 7.1

Fixes compilation issues with mixed and double precision builds using
AVX-512 SIMD with gcc-5.4 or gcc-7.1. Also tested with gcc-6.3, and
Debug as well as Release builds for all three versions, all of which
now pass the simd unit tests.

Fixes #2325.

Change-Id: I59c3ae0467b51412d1ebbb5b57a248534288a5db

Revision 80dd3f5b (diff)
Added by Mark Abraham almost 2 years ago

Update double-precision test configurations

These changes improve coverage of double precision, using more release
mode, particularly with latest gcc and icc, and using 128-bit SIMD,
which have been cases that were buggy recently. The other aspects of
the configurations that have been modified have been
non-critical. Where appropriate, brief rationales are recorded. This
resolves an old TODO item in the post-submit matrix.

Fixed a sign mismatch in initializing an OpenCL variable that didn't
need to be initialized.

Noted relevant new TODOs.

Refs #2300, #2325, #2326, #2334, #2335, #2336, #2337, #2338

Change-Id: I131fa1a6776d1e7809799c3f931a1fc8100fcdc9

History

#1 Updated by Mark Abraham almost 2 years ago

  • Description updated (diff)

#2 Updated by Paul Bauer almost 2 years ago

Reproduced with gcc-6 and double precision with GMX_SIMD=AVX_512. I am looking into it in case I find something obvious.

#3 Updated by Paul Bauer almost 2 years ago

When stepping through the SIMD function in the debugger, I noticed that the memory at the base address is overwritten with garbage at this point:

_mm512_i32scatter_pd(base,   simdoffset.simdInternal_, v0.simdInternal_, sizeof(double));

but I don't understand it well enough to know why. I hope that helps the more knowledgeable people.
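As background for the call above, here is a minimal scalar sketch of what an 8-lane double-precision scatter with 32-bit indices is documented to do: lane i is stored at `base` plus `index[i] * scale` bytes. The function name `scatterReference` is hypothetical and this is not GROMACS code; it only models the intended memory effect, which may help when comparing against what the debugger shows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of an 8-lane scatter of doubles.
// Each lane is written to base + index[lane] * scale, measured in bytes.
void scatterReference(void* base, const std::int32_t (&index)[8],
                      const double (&value)[8], int scale)
{
    // The hardware instruction only accepts these scale factors.
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    unsigned char* bytes = static_cast<unsigned char*>(base);
    for (int lane = 0; lane < 8; ++lane)
    {
        const std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(index[lane]) * scale;
        // memcpy avoids alignment and aliasing assumptions in this model.
        std::memcpy(bytes + offset, &value[lane], sizeof(double));
    }
}
```

With `scale` equal to `sizeof(double)` and indices 0, 3, 6, ..., the eight values land in every third element of a double array, and all other elements should remain untouched.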

#4 Updated by Roland Schulz almost 2 years ago

I can't reproduce this. Both with GCC 6.1 and 7.3 the simd unit tests pass.
In both cases I have failures:
7 - EwaldUnitTests (Failed)
17 - TableUnitTests (Failed)
Both those seem unrelated. Did you add any other flags besides "-DGMX_SIMD=AVX_512 -DGMX_DOUBLE=yes"?

#5 Updated by Paul Bauer almost 2 years ago

These are the cmake command-line options I used:

"-DGMX_DOUBLE=yes -DGMX_SIMD=AVX_512 -DCMAKE_C_COMPILER=gcc-6 -DCMAKE_CXX_COMPILER=g++-6"

When building from the latest source I also can no longer reproduce this, strangely enough, and I get the same failures in Ewald and Table tests that seem to be simple precision issues.

#6 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #2325.
Uploader: Mark Abraham
Change-Id: gromacs~release-2018~I131fa1a6776d1e7809799c3f931a1fc8100fcdc9
Gerrit URL: https://gerrit.gromacs.org/7303

#7 Updated by Erik Lindahl almost 2 years ago

  • Status changed from New to Rejected

Feel free to reopen if it can be reproduced.

#8 Updated by Erik Lindahl almost 2 years ago

  • Status changed from Rejected to Closed

#9 Updated by Mark Abraham almost 2 years ago

On dev-purley01, I can reproduce this with release-2017 HEAD b01a10543dcec1ea87e409a28821c3f668e04b2b with a gcc 7.1 AVX512 double-precision Debug build, but not a Release build. With gcc 6.4, again Debug had an issue and Release did not. (All gave the table error from #2336, too.)

All the failing configs warn about "warning: pointer of type ‘void *’ used in arithmetic [-Wpointer-arith]" three times in transposeScatterStoreU<3> called from multiple places. Notice that in the failing SIMD test the scatter did not replace the contents of mem0 with the correct values. That's clearly a gcc bug (Release works OK and all the pointers we use have a non-void base type, so gcc seems to be doing an incorrect code transformation). Note that gcc permits void-pointer arithmetic as an extension (but I don't think this is related). I will work on a repro case to file a bug.
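To illustrate what the -Wpointer-arith warning refers to: the GNU extension treats the size of void as 1, so adding n to a void pointer advances n bytes, whereas adding n to a double pointer advances n * sizeof(double) bytes. The sketch below shows only that difference, using a char pointer as a portable stand-in for void-pointer arithmetic. The function names are hypothetical, and nothing in this thread establishes that this difference is the cause of the failure.

```cpp
#include <cassert>
#include <cstddef>

// Bytes covered by "p + n" when p keeps its double* type.
// The caller must ensure p + n stays within the same array.
std::ptrdiff_t typedAdvanceBytes(double* p, std::ptrdiff_t n)
{
    assert(n >= 0);
    return reinterpret_cast<char*>(p + n) - reinterpret_cast<char*>(p);
}

// Bytes covered by "p + n" when p is first converted to a byte pointer,
// which mirrors how the GNU extension treats arithmetic on void*.
std::ptrdiff_t byteAdvanceBytes(double* p, std::ptrdiff_t n)
{
    assert(n >= 0);
    char* raw = reinterpret_cast<char*>(p);
    return (raw + n) - raw;
}
```

For an offset of 3 elements, the typed version moves 3 * sizeof(double) bytes while the byte-wise version moves 3 bytes, so a base address computed the second way would point into the middle of the first element.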

#10 Updated by Mark Abraham almost 2 years ago

  • Status changed from Closed to Blocked, need info
  • Affected version changed from 2018-beta1 to 2018-beta2

Still present in 2018-beta2 with gcc 6.4 in Debug, both double and mixed builds.

Reopened so that I have a reminder to report this bug. We also need some further investigation and perhaps some kind of workaround so that people can't use whatever range of compilers and build configurations are vulnerable to it.

#11 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #2325.
Uploader: Erik Lindahl
Change-Id: gromacs~release-2018~I59c3ae0467b51412d1ebbb5b57a248534288a5db
Gerrit URL: https://gerrit.gromacs.org/7336

#12 Updated by Erik Lindahl almost 2 years ago

I think I managed to work around it.

#13 Updated by Erik Lindahl almost 2 years ago

  • Status changed from Blocked, need info to Fix uploaded

#14 Updated by Erik Lindahl almost 2 years ago

  • Status changed from Fix uploaded to Resolved

#15 Updated by Erik Lindahl almost 2 years ago

  • Status changed from Resolved to Closed
