Project

General

Profile

Bug #2421

EwaldUnitTests and SimdUnitTests fail on ppc64le with gcc-8.0.1

Added by Christoph Junghans 12 months ago. Updated about 2 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
core library
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

/builddir/build/BUILD/gromacs-2018/src/testutils/refdata.cpp:925: Failure
   In item: /Forces/[0]/X
    Actual: -11.769416809082031
 Reference: -10.129964828491211
Difference: 1.63945 (1719090 single-prec. ULPs, rel. 0.162)
 Tolerance: abs. 1.43051e-06, 12 ULPs
Google Test trace:
/builddir/build/BUILD/gromacs-2018/src/gromacs/ewald/tests/pmegathertest.cpp:413: Testing force gathering with CPU for PME grid size 16 12 14, order 4, 1 atoms, without reduction

and
/builddir/build/BUILD/gromacs-2018/src/gromacs/simd/tests/bootstrap_loadstore.cpp:112: Failure
Value of: pCopyDst[i]
  Actual: 5
Expected: pCopySrc[i]
Which is: 6
SIMD load or store not moving data correctly for element 0

Found here: https://koji.fedoraproject.org/koji/taskinfo?taskID=25294220 (detailed build.log for ppe64le attached)
build.log (35.2 MB) build.log Christoph Junghans, 02/25/2018 02:54 AM
build.log (35.1 MB) build.log Christoph Junghans, 03/20/2018 06:33 AM
build.log.txt (35.6 MB) build.log.txt Christoph Junghans, 06/06/2018 02:03 AM
build.log.txt (35.6 MB) build.log.txt Christoph Junghans, 06/07/2018 02:40 PM
marks-build-log.txt (1.48 MB) marks-build-log.txt Mark Abraham, 06/12/2018 10:27 PM
tmpi-double.txt (6.83 MB) tmpi-double.txt Mark Abraham, 06/12/2018 10:27 PM

Associated revisions

Revision e6932745 (diff)
Added by Mark Abraham 11 months ago

Fix VSX SIMD with gcc 8

gcc 8 apparently generates different code for the former GROMACS
code, which seems buggy. See
https://bugzilla.redhat.com/show_bug.cgi?id=1556989#c3
and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84907

Fixes #2421

Change-Id: I31492cd582b785cdfb42e8b999a165a7339ce4be

Revision f1712c37 (diff)
Added by Mark Abraham 11 months ago

More gcc-8 fixes for POWER

Fixes #2421

Change-Id: I43c13df3a217d0f2154dca0ef215efd9cd27474a

Revision 9ae6059d (diff)
Added by Mark Abraham 8 months ago

More fixes to suit gcc 8 for double build

gcc 7 supports the same syntax, but only gcc 8 requires it.

xl code path untested, because we have no access to a working compiler

Fixes #2421

Change-Id: I8f89af4b066be68e07a286a9fa45b8ded3c925f3

History

#1 Updated by Christoph Junghans 12 months ago

I think it is problem with gcc-8 as the same build worked fine on Fedora27 with gcc-7.2.1 (https://koji.fedoraproject.org/koji/buildinfo?buildID=1019225)

#2 Updated by Christoph Junghans 12 months ago

And this is with GMX_SIMD=IBM_VSX.

#3 Updated by Mark Abraham 12 months ago

gcc 8.0.1 is still pre-release. I don't have access to it anywhere, since my access is all to production login nodes. So I suggest we file a regression report with gcc.

#4 Updated by Christoph Junghans 12 months ago

You could do:

$ docker pull fedora:rawhide
$ docker run -it fedora:rawhide /bin/bash

#5 Updated by Mark Abraham 12 months ago

Christoph Junghans wrote:

You could do:
[...]

Yes, but that won't work on any machine capable of executing VSX instructions.

#6 Updated by Erik Lindahl 12 months ago

This appears to be bugs in the very basic load/store operations. Unfortunately I've seen those before, and although it might be possible to work around some of them, it indicates severe compiler bugs.

#7 Updated by Christoph Junghans 11 months ago

GCC upstream thinks this is a bug in Gromacs and proposed a patch here: https://bugzilla.redhat.com/show_bug.cgi?id=1556989#c3

--- gromacs-2018/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_float.h.jj    2017-12-11 13:44:54.000000000 +0100
+++ gromacs-2018/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_float.h    2018-03-16 17:02:45.915186989 +0100
@@ -121,14 +121,14 @@ static inline SimdFloat gmx_simdcall
 simdLoadU(const float *m, SimdFloatTag = {})
 {
     return {
-               *reinterpret_cast<const __vector float *>(m)
+               vec_xl(0, m)
     };
 }

 static inline void gmx_simdcall
 storeU(float *m, SimdFloat a)
 {
-    *reinterpret_cast<__vector float *>(m) = a.simdInternal_;
+    vec_xst(a.simdInternal_, 0, m);
 }

 static inline SimdFloat gmx_simdcall
@@ -157,14 +157,14 @@ static inline SimdFInt32 gmx_simdcall
 simdLoadU(const std::int32_t *m, SimdFInt32Tag)
 {
     return {
-               *reinterpret_cast<const __vector int *>(m)
+               vec_xl(0, m)
     };
 }

 static inline void gmx_simdcall
 storeU(std::int32_t * m, SimdFInt32 a)
 {
-    *reinterpret_cast<__vector int *>(m) = a.simdInternal_;
+    vec_xst(a.simdInternal_, 0, m);
 }

 static inline SimdFInt32 gmx_simdcall
--- gromacs-2018/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_double.h.jj    2017-12-11 13:44:54.000000000 +0100
+++ gromacs-2018/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_double.h    2018-03-16 17:05:08.248153217 +0100
@@ -121,14 +121,14 @@ static inline SimdDouble gmx_simdcall
 simdLoadU(const double *m, SimdDoubleTag = {})
 {
     return {
-               *reinterpret_cast<const __vector double *>(m)
+               vec_xl(0, m)
     };
 }

 static inline void gmx_simdcall
 storeU(double *m, SimdDouble a)
 {
-    *reinterpret_cast<__vector double *>(m) = a.simdInternal_;
+    vec_xst(a.simdInternal_, 0, m);
 }

 static inline SimdDouble gmx_simdcall

#9 Updated by Gerrit Code Review Bot 11 months ago

Gerrit received a related patchset '1' for Issue #2421.
Uploader: Mark Abraham ()
Change-Id: gromacs~release-2018~I31492cd582b785cdfb42e8b999a165a7339ce4be
Gerrit URL: https://gerrit.gromacs.org/7688

#10 Updated by Mark Abraham 11 months ago

With and without that patch, gcc 7.2 in release mode is fine. gcc 6.3 without that patch is fine. But with it:

FAILED: src/gromacs/simd/tests/CMakeFiles/simd-test.dir/bootstrap_loadstore.cpp.o 
/gpfs/software/opt/gcc/6.3.0/bin/g++  -DGMX_DOUBLE=0 -DGTEST_HAS_PTHREAD=1 -DHAVE_CONFIG_H -DTEST_DATA_PATH=\"src/gromacs/simd/tests\" -DTEST_TEMP_PATH=\"/gpfs/homeb/padc/padc010/git/master/build-cmake-gcc-6.3-release/src/gromacs/simd/tests/Testing/Temporary\" -isystem ../src/external/gmock-1.7.0/gtest/include -isystem ../src/external/gmock-1.7.0/include -Isrc -isystem ../src/external/thread_mpi/include -I../src -isystem /gpfs/homeb/padc/padc010/progs/include -mcpu=power8 -mpower8-vector -mpower8-fusion -mdirect-move  -mvsx    -std=c++11  -Wundef -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wmissing-declarations -Wall  -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast  -Wno-array-bounds    -Wno-unused-variable -MD -MT src/gromacs/simd/tests/CMakeFiles/simd-test.dir/bootstrap_loadstore.cpp.o -MF src/gromacs/simd/tests/CMakeFiles/simd-test.dir/bootstrap_loadstore.cpp.o.d -o src/gromacs/simd/tests/CMakeFiles/simd-test.dir/bootstrap_loadstore.cpp.o -c ../src/gromacs/simd/tests/bootstrap_loadstore.cpp
../src/gromacs/simd/tests/bootstrap_loadstore.cpp: In member function ‘virtual void gmx::test::{anonymous}::SimdBootstrapTest_loadUI_Test::TestBody()’:
../src/gromacs/simd/tests/bootstrap_loadstore.cpp:183:1: error: unrecognizable insn:
 }
 ^
(insn 66 65 69 3 (set (reg:V4SI 270)
        (vec_select:V4SI (vec_select:V4SI (mem:V4SI (reg:DI 271 [ ivtmp.541 ]) [0  S16 A8])
                (parallel [
                        (const_int 3 [0x3])
                        (const_int 2 [0x2])
                        (const_int 1 [0x1])
                        (const_int 0 [0])
                    ]))
            (parallel:V4SI [
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                ]))) ../src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_float.h:161 -1
     (expr_list:REG_DEAD (reg:DI 271 [ ivtmp.541 ])
        (expr_list:REG_EQUAL (vec_select:V4SI (vec_select:V4SI (mem:V4SI (reg:DI 234 [ ivtmp.541 ]) [0  S16 A8])
                    (parallel [
                            (const_int 3 [0x3])
                            (const_int 2 [0x2])
                            (const_int 1 [0x1])
                            (const_int 0 [0])
                        ]))
                (parallel:V4SI [
                        (const_int 2 [0x2])
                        (const_int 3 [0x3])
                        (const_int 0 [0])
                        (const_int 1 [0x1])
                    ]))
            (nil))))
../src/gromacs/simd/tests/bootstrap_loadstore.cpp:183:1: internal compiler error: in extract_insn, at recog.c:2287
0x107c87f3 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
    ../../gcc-6.3.0/gcc/rtl-error.c:108
0x107c8857 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
    ../../gcc-6.3.0/gcc/rtl-error.c:116
0x1078f447 extract_insn(rtx_insn*)
    ../../gcc-6.3.0/gcc/recog.c:2287
0x1067607b scan_one_insn
    ../../gcc-6.3.0/gcc/ira-costs.c:1431
0x1067607b process_bb_for_costs
    ../../gcc-6.3.0/gcc/ira-costs.c:1592
0x10677727 find_costs_and_classes
    ../../gcc-6.3.0/gcc/ira-costs.c:1699
0x10679117 ira_set_pseudo_classes(bool, _IO_FILE*)
    ../../gcc-6.3.0/gcc/ira-costs.c:2239
0x1069f7cf move_loop_invariants()
    ../../gcc-6.3.0/gcc/loop-invariant.c:2254
0x106999cb execute
    ../../gcc-6.3.0/gcc/loop-init.c:523
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

#11 Updated by Erik Lindahl 11 months ago

Based on the IBM feedback, it's only version 8 that will get the ABI change.

I would suggest that we simply add a check for the gcc version and use the old construct for version < 7.

Cheers,

Erik

#12 Updated by Gerrit Code Review Bot 11 months ago

Gerrit received a related patchset '1' for Issue #2421.
Uploader: Mark Abraham ()
Change-Id: gromacs~master~I31492cd582b785cdfb42e8b999a165a7339ce4be
Gerrit URL: https://gerrit.gromacs.org/7690

#13 Updated by Mark Abraham 11 months ago

  • Status changed from New to Resolved

#14 Updated by Mark Abraham 11 months ago

  • Target version set to 2018.1

#17 Updated by Gerrit Code Review Bot 11 months ago

Gerrit received a related patchset '1' for Issue #2421.
Uploader: Mark Abraham ()
Change-Id: gromacs~release-2018~I43c13df3a217d0f2154dca0ef215efd9cd27474a
Gerrit URL: https://gerrit.gromacs.org/7696

#18 Updated by Mark Abraham 11 months ago

  • Status changed from Resolved to Fix uploaded

#19 Updated by Mark Abraham 11 months ago

Double precision needs some work, too

#20 Updated by Mark Abraham 11 months ago

I think I have double fixed, but will finish it tomorrow

#21 Updated by Gerrit Code Review Bot 11 months ago

Gerrit received a related patchset '1' for Issue #2421.
Uploader: Mark Abraham ()
Change-Id: gromacs~release-2018~I8f89af4b066be68e07a286a9fa45b8ded3c925f3
Gerrit URL: https://gerrit.gromacs.org/7710

#22 Updated by Christoph Junghans 11 months ago

I patched 92d6e21 in, but it is still failing: https://koji.fedoraproject.org/koji/taskinfo?taskID=25859230

#23 Updated by Mark Abraham 11 months ago

  • Target version changed from 2018.1 to 2018.2

#24 Updated by Mark Abraham 11 months ago

will look further, but am away for ~1 week

#25 Updated by Mark Abraham 11 months ago

  • Status changed from Fix uploaded to Resolved

#26 Updated by Christoph Junghans 11 months ago

Don't we need the 2nd patch set, too?

#27 Updated by Mark Abraham 11 months ago

  • Status changed from Resolved to Fix uploaded

probably, that last post is done by a bot

#28 Updated by Mark Abraham 9 months ago

  • Category changed from testing to core library
  • Status changed from Fix uploaded to Feedback wanted

Christoph With gcc 8.1.0 on POWER9, I can no longer reproduce any issues with @make check (mixed and double; release and debug). Can you resubmit the fedora build and confirm? Thus I assume there is no further need of any fix.

#29 Updated by Mark Abraham 9 months ago

Mark Abraham wrote:

Christoph With gcc 8.1.0 on POWER9, I can no longer reproduce any issues with @make check (mixed and double; release and debug). Can you resubmit the fedora build and confirm? Thus I assume there is no further need of any fix.

Ignore this, I can reproduce the problem. I was mistakently checking the branch including the fix.

#30 Updated by Christoph Junghans 9 months ago

Still persists with gcc-8.1.1, log attached

#31 Updated by Mark Abraham 9 months ago

Christoph Junghans wrote:

Still persists with gcc-8.1.1, log attached

OK, tests pass for me with gcc 8.1.0 on JURON with https://gerrit.gromacs.org/#/c/7710/3 - can you try that for us please Christoph?

#32 Updated by Christoph Junghans 9 months ago

Even patching ba13293 in didn't help.

#33 Updated by Christoph Junghans 8 months ago

Mark, I got a confused with all the patch flying around, I added https://gerrit.gromacs.org/#/c/7710/ (ba13293), do I need https://gerrit.gromacs.org/#/c/7688/ (e693274) as well?

#34 Updated by Mark Abraham 8 months ago

Christoph Junghans wrote:

Mark, I got a confused with all the patch flying around, I added https://gerrit.gromacs.org/#/c/7710/ (ba13293), do I need https://gerrit.gromacs.org/#/c/7688/ (e693274) as well?

No, the latter is the grandparent patch of the former - see git log

#35 Updated by Mark Abraham 8 months ago

I'll upload some comparable cmake+make logs shortly

#36 Updated by Mark Abraham 8 months ago

tmpi-double.txt is cut from Christoph's build.log.txt from comment 32. The only point of interest I can see in these build log diffs is that cmake host SIMD detection looks a bit different. So maybe both our observations are valid on the hardware+software stack we're testing. So I guess we have to defer resolving this issue until we have more information from somewhere.

#37 Updated by Mark Abraham 8 months ago

  • Status changed from Feedback wanted to Resolved

#38 Updated by Paul Bauer 6 months ago

Hej, would you consider this one now as fully resolved, or should we still keep it open for the future?

#39 Updated by Christoph Junghans 6 months ago

I think the issue still persists on Fedora, but honestly I lost track, which patches got merged already.

#40 Updated by Paul Bauer 6 months ago

  • Target version changed from 2018.3 to future

ok, I'll move this to future then, if there are no strong opinions against having it still open if more bugs are found

#41 Updated by Mark Abraham about 2 months ago

  • Status changed from Resolved to Closed

Open it when we have an issue. Otherwise we have hundreds of issues whose status is unknown!

Also available in: Atom PDF