Task #2102

decide future of xlc+power support

Added by James Ostrander over 2 years ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
build system
Target version:
Difficulty:
simple

Description

Hi, I am trying to compile GROMACS with xlc/xlC in the simplest way I can.

From the build directory:
cmake3 -DCMAKE_C_COMPILER=xlc -DCMAKE_CXX_COMPILER=xlC ..

Output:
[u0017592@sys-85310 build]$ cmake3 -DCMAKE_C_COMPILER=xlc -DCMAKE_CXX_COMPILER=xlC ..
-- No compatible CUDA toolkit found (v5.0+), disabling native GPU acceleration
CMake Error at cmake/gmxManageSimd.cmake:90 (message):
Cannot find IBM VSX compiler flag. Use a newer compiler, or disable SIMD
support (slower).
Call Stack (most recent call first):
cmake/gmxManageSimd.cmake:414 (gmx_give_fatal_error_when_simd_support_not_found)
CMakeLists.txt:665 (gmx_manage_simd)

CMakeError.log & CMakeOutput.log attached.

My xlc version is 13.1.5.
Many of the flags checked for in the CMakeError.log are no longer supported.

xl compiler reference 13.1.5: http://www-01.ibm.com/support/docview.wss?uid=swg27048883&aid=7
xl compiler reference 13.1: http://www-01.ibm.com/support/docview.wss?uid=swg27041925&aid=1

  • -mabi=altivec: Not in any xlc doc; it exists in gcc. Perhaps previous versions did not treat it as an error. "-qaltivec" enables compiler support for vector data types and operators, and affects VSX/VMX load & store functions. It might be functionally equivalent to -maltivec, which is supported for gcc compatibility.
  • -mvsx: not in any doc. See above - can this be replaced by -qaltivec (or -maltivec)?
  • -qsuppress: Exists in 13.1 but not 13.1.5
  • -qhalt=e: The flag exists, but "w" is the only valid option in 13.1.5. In 13.1 more values were allowed, e.g. e, s

The invalid options produce warnings rather than errors, so it may not be necessary to fail the build when they are raised while compiling with xlc/xlC.

CMakeError.txt (18.9 KB) CMakeError.txt James Ostrander, 01/26/2017 07:42 PM
CMakeOutput.txt (76.2 KB) CMakeOutput.txt James Ostrander, 01/26/2017 07:42 PM

Related issues

Related to GROMACS - Bug #2103: xlc/xlC: shellfc.cpp: empty array initializer "expression not supported"Closed

Associated revisions

Revision deb27c17 (diff)
Added by Erik Lindahl over 2 years ago

Work around false xlc-13.1.5 bug in SIMD test

atan2(0,0) should return 0.0, which the Gromacs simd implementation
does. However, since at least one compiler produces -nan for the
standard library version it's better to compare with the known
correct value rather than calling std::atan2(0,0).

Refs #2102.

Change-Id: I60449e9f16fb1ab79486927a3e9993da0cce937f

Revision f05743b0 (diff)
Added by Mark Abraham over 2 years ago

Fixes for xl compilers 13.1.5 on Power8

Applied Erik's recent fix also for the atan2SinglePrecisionAccuracy
test.

Refs #2102

Change-Id: I230dac8084be2d0693cb616b5a5951b0ae4b71a6

Revision a590e9e1 (diff)
Added by Paul Bauer almost 2 years ago

Add information regarding xlc compiler

Added information concerning that the xlc compiler
is neither supported nor tested.

Refs #2102

Change-Id: I1963a2fdaa6e27f4d9521c28088fc1c1f7eabe97

History

#1 Updated by Szilárd Páll over 2 years ago

  • Status changed from New to Accepted

Our build system seems to
i) lack robust xlc support (due to the lack of hw/sw access and lack of maintainers)
ii) have no chance of keeping up with changes in compiler flags, given the changes you describe between patch versions.

Given that we have no access and little incentive (until xlc proves to be superior to gcc), I'm not sure we can commit many resources to this issue. However, contributions would be welcome and we can certainly provide guidance!

If you'd like to test/contribute the xlc flags, consider adding the general flags not directly related to SIMD support in prepare_power_vsx_toolchain(); the SIMD flags required/useful for VMX are tested just after that.

#2 Updated by Mark Abraham over 2 years ago

In the last fortnight, I finally got access to such a compiler (on JURON at Juelich). I can reproduce something like James' report. (Previously the only interesting platform for xlc was BlueGene, and that was locked on the xlc 12 major version.)

I assume that xlc is likely slower than gcc on any platform where both it and gcc are supported, and I'm sure gcc was the only compiler used when Erik implemented the SIMD layer for VMX and VSX. But I can look into making it work with xlc, at least.

#3 Updated by Mark Abraham over 2 years ago

GROMACS deliberately checks for a range of flags that might activate SIMD, so that multiple compilers can be supported with minimal fuss on our end. So we expect a bunch of the tests to fail in most cases.

The unrecognized suppression flags don't seem to be a serious problem. They were working around bugs in previous versions of the compiler...

The xlC 13.1.5 C++ compiler tolerates -qhalt=e with only a warning, so the GROMACS check for -qarch=auto -qaltivec succeeds there. The matching C compiler does not tolerate -qhalt=e, which seems like a behaviour worth reporting, if someone knows where to file that. We no longer require the C toolchain to support SIMD on Power machines, so the GROMACS build system can be hacked to get around that (remove the check related to SIMD_${GMX_SIMD}_C_flags on line 421 of cmake/gmxManageSimd.cmake).

The -qhalt=e is generated by CMake (even for its master branch), so until someone proposes a fix for that, xlc 13.1.5 will not be usable for any C application that uses CMake. However, I agree with Szilard that such changes in point release versions make attempting to support this compiler seem more trouble than it is worth.

There are other bugs in our implementation of our SIMD layer for xlc, so I will attempt to fix those and see what may be a reasonable way forward.

#4 Updated by Mark Abraham over 2 years ago

A large bunch of our unit tests fail even after the above-mentioned fixes, so xlc 13 isn't supported on POWER* until we have time to get to the bottom of the issues (and I'm making no promises; this compiler is not a high priority for us).

I suggest trying gcc on these platforms - we'll likely fix any issue with that quickly.

#5 Updated by Mark Abraham over 2 years ago

  • Related to Bug #2103: xlc/xlC: shellfc.cpp: empty array initializer "expression not supported" added

#6 Updated by James Ostrander over 2 years ago

Would you accept pull requests for fixes for this group of compilers? You mentioned that there was further work to do even after making the above changes. Are there discrete chunks that work could be broken into so that I could assign someone else to work on those tasks?

#8 Updated by Gerrit Code Review Bot over 2 years ago

Gerrit received a related patchset '1' for Issue #2102.
Uploader: Mark Abraham ()
Change-Id: gromacs~release-2016~I230dac8084be2d0693cb616b5a5951b0ae4b71a6
Gerrit URL: https://gerrit.gromacs.org/6444

#9 Updated by Mark Abraham over 2 years ago

James Ostrander wrote:

Would you accept pull requests for fixes for this group of compilers?

Very likely, yes.

You mentioned that there was further work to do even after making the above changes. Are there discrete chunks that work could be broken into so that I could assign someone else to work on those tasks?

I've got most things in working shape. There are some TODOs in the patch for our 2016 branch that I've just uploaded.

The test binaries tend to segfault from a double free after completing the tests, which is irritating but not actually a problem.

#10 Updated by Erik Lindahl over 2 years ago

For the record, we actually did test xlC extensively when developing SIMD code - both on big and little-endian machines.

However, that was >1 year ago, and compared to gcc the xlC compiler is not stellar. There were several SIMD-related bugs, and performance was clearly lower than with gcc. Even the IBM people I interacted with seemed to prioritize gcc at least as much (if not more) when it came to VSX and general optimization for Power8.

Previous versions of xlc also had an issue where our unit tests failed because there was a sign error in the reference value for tan(0) provided by the compiler.

#11 Updated by Erik Lindahl over 2 years ago

Update, mostly for reference. When compiling a trivial program (which properly includes cmath) that prints tan(0) we get the following with gcc-4.8.5 and xlc-13.1.5:

gcc -O2    0
gcc -O3    0
xlc -O2    0
xlc -O3   -0

According to the tan() man page, the correct value is 0.

#12 Updated by Mark Abraham over 2 years ago

Output for the atan2 tests:

[juronb1 r2016 ((0e79182...))] $ (cd build-cmake-xlc-release; bin/simd-test --gtest_filter=\*atan\*)
Note: Google Test filter = *atan*
[==========] Running 8 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from SimdMathTest
[ RUN      ] SimdMathTest.atan
[       OK ] SimdMathTest.atan (0 ms)
[ RUN      ] SimdMathTest.atan2
/gpfs/homeb/padc/padc010/git/r2016/src/gromacs/simd/tests/simd_math.cpp:443: Failure
Failing comparison between setSimdRealFrom1R(std::atan2(0.0, 0.0)) and atan2(setSimdRealFrom3R(0.0, 0.0, 0.0), setZero())
Requested abs tolerance: 0
Requested ulp tolerance: 16
(And values should not differ in sign unless within abs tolerance.)
Reference values: { -nan, -nan, -nan, -nan }
SIMD values:      { 0, 0, 0, 0 }
Abs. difference:  { nan, nan, nan, nan }
Ulp difference:   { 4194304, 4194304, 4194304, 4194304 }

[  FAILED  ] SimdMathTest.atan2 (0 ms)
[ RUN      ] SimdMathTest.atanSingleAccuracy
[       OK ] SimdMathTest.atanSingleAccuracy (1 ms)
[ RUN      ] SimdMathTest.atan2SingleAccuracy
/gpfs/homeb/padc/padc010/git/r2016/src/gromacs/simd/tests/simd_math.cpp:695: Failure
Failing comparison between setSimdRealFrom1R(std::atan2(0.0, 0.0)) and atan2SingleAccuracy(setSimdRealFrom3R(0.0, 0.0, 0.0), setZero())
Requested abs tolerance: 0
Requested ulp tolerance: 16
(And values should not differ in sign unless within abs tolerance.)
Reference values: { -nan, -nan, -nan, -nan }
SIMD values:      { 0, 0, 0, 0 }
Abs. difference:  { nan, nan, nan, nan }
Ulp difference:   { 4194304, 4194304, 4194304, 4194304 }

[  FAILED  ] SimdMathTest.atan2SingleAccuracy (0 ms)
[----------] 4 tests from SimdMathTest (1 ms total)

[----------] 4 tests from SimdScalarMathTest
[ RUN      ] SimdScalarMathTest.atan
[       OK ] SimdScalarMathTest.atan (0 ms)
[ RUN      ] SimdScalarMathTest.atan2
[       OK ] SimdScalarMathTest.atan2 (0 ms)
[ RUN      ] SimdScalarMathTest.atanSingleAccuracy
[       OK ] SimdScalarMathTest.atanSingleAccuracy (0 ms)
[ RUN      ] SimdScalarMathTest.atan2SingleAccuracy
[       OK ] SimdScalarMathTest.atan2SingleAccuracy (0 ms)
[----------] 4 tests from SimdScalarMathTest (0 ms total)

[----------] Global test environment tear-down
[==========] 8 tests from 2 test cases ran. (2 ms total)
[  PASSED  ] 6 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] SimdMathTest.atan2
[  FAILED  ] SimdMathTest.atan2SingleAccuracy

 2 FAILED TESTS

The reference value comes from std::atan2(+0, +0), and should be +0, but perhaps is getting turned into -nan by a buggy implementation of std::atan2. Regardless, our SIMD result is correct, so we should probably disable this test for this compiler (family).

#13 Updated by Mark Abraham over 2 years ago

There are further issues trying to use pthreads mutexes for thread MPI, e.g. the following (with or without CUDA in the build, and also in release mode)

$ (cd build-cmake-xlc-gpu-debug; ninja && gdb bin/mdrun-test)
[1/30] Generating git version information
GNU gdb (GDB) 7.11.1.20160801-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
and "show warranty" for details.
This GDB was configured as "powerpc64le-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bin/mdrun-test...done.
(gdb) run
[==========] Running 25 tests from 9 test cases.
[----------] Global test environment set-up.
[----------] 6 tests from BondedInteractionsTest
[ RUN      ] BondedInteractionsTest.NormalBondWorks

NOTE 1 [file /gpfs/homeb/padc/padc010/git/r2016/build-cmake-xlc-gpu-debug/src/programs/mdrun/tests/Testing/Temporary/BondedInteractionsTest_NormalBondWorks_input.mdp, line 1]:
  /gpfs/homeb/padc/padc010/git/r2016/build-cmake-xlc-gpu-debug/src/programs/mdrun/tests/Testing/Temporary/BondedInteractionsTest_NormalBondWorks_input.mdp did not specify a value for the .mdp option "cutoff-scheme". Probably it
  was first intended for use with GROMACS before 4.6. In 4.6, the Verlet
  scheme was introduced, but the group scheme was still the default. The
  default is now the Verlet scheme, so you will observe different behaviour.

Program received signal SIGSEGV, Segmentation fault.
__GI___pthread_mutex_lock (mutex=0x80) at ../nptl/pthread_mutex_lock.c:68
68    ../nptl/pthread_mutex_lock.c: No such file or directory.
(gdb) bt
#0  __GI___pthread_mutex_lock (mutex=0x80) at ../nptl/pthread_mutex_lock.c:68
#1  0x00003fffb62d1fd0 in tMPI_Thread_mutex_lock ()
   from /gpfs/homeb/padc/padc010/git/r2016/build-cmake-xlc-gpu-debug/lib/libgromacs.so.2
#2  0x00003fffb58f702c in gmx_fio_open ()
   from /gpfs/homeb/padc/padc010/git/r2016/build-cmake-xlc-gpu-debug/lib/libgromacs.so.2
#3  0x00003fffb58f75c0 in gmx_fio_fopen ()
   from /gpfs/homeb/padc/padc010/git/r2016/build-cmake-xlc-gpu-debug/lib/libgromacs.so.2
#4  0x00003fffb591709c in write_inpfile(char const*, int, t_inpfile*, int, warninp*) ()
   from /gpfs/homeb/padc/padc010/git/r2016/build-cmake-xlc-gpu-debug/lib/libgromacs.so.2
#5  0x00003fffb5c25c24 in get_ir(char const*, char const*, t_inputrec*, t_gromppopts*, warninp*) ()
   from /gpfs/homeb/padc/padc010/git/r2016/build-cmake-xlc-gpu-debug/lib/libgromacs.so.2
#6  0x00003fffb5be73c0 in gmx_grompp(int, char**) ()
   from /gpfs/homeb/padc/padc010/git/r2016/build-cmake-xlc-gpu-debug/lib/libgromacs.so.2
#7  0x0000000010052ae4 in gmx::test::SimulationRunner::callGromppOnThisRank(gmx::test::CommandLine const&) ()
#8  0x0000000010052c58 in gmx::test::SimulationRunner::callGrompp() ()
#9  0x0000000010073fec in gmx::test::BondedInteractionsTest_NormalBondWorks_Test::TestBody() ()
#10 0x00000000100f8dc0 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ()
#11 0x00000000100f8b80 in testing::Test::Run() ()
#12 0x00000000100fa8f0 in testing::TestInfo::Run() ()
#13 0x00000000100fc1a8 in testing::TestCase::Run() ()
#14 0x0000000010129ec4 in testing::internal::UnitTestImpl::RunAllTests ()
#15 0x0000000010107bac in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ()
#16 0x00000000101078e4 in testing::UnitTest::Run() ()
#17 0x00000000100983c8 in main ()

Given that the code works on POWER with gcc, and on every other platform we know of that supports pthreads, it seems likely there is an issue related to xlc. Any ideas, James? Can you reproduce this?

#14 Updated by James Ostrander over 2 years ago

Thanks for the updates, Mark. I was able to reproduce that issue. Here is my output without gdb (I tried release mode)

[u0017592@sys-85390 build-mpi]$ ./bin/mdrun-test
[==========] Running 24 tests from 9 test cases.
[----------] Global test environment set-up.
[----------] 6 tests from BondedInteractionsTest
[ RUN      ] BondedInteractionsTest.NormalBondWorks

NOTE 1 [file /home/u0017592/projects/gromacs/build-mpi/src/programs/mdrun/tests/Testing/Temporary/BondedInteractionsTest_NormalBondWorks_input.mdp, line 1]:
  /home/u0017592/projects/gromacs/build-mpi/src/programs/mdrun/tests/Testing/Temporary/BondedInteractionsTest_NormalBondWorks_input.mdp did not specify a value for the .mdp option "cutoff-scheme". Probably it
  was first intended for use with GROMACS before 4.6. In 4.6, the Verlet
  scheme was introduced, but the group scheme was still the default. The
  default is now the Verlet scheme, so you will observe different behaviour.

[sys-85390:01731] *** Process received signal ***
[sys-85390:01731] Signal: Segmentation fault (11)
[sys-85390:01731] Signal code: Address not mapped (1)
[sys-85390:01731] Failing at address: 0x90
[sys-85390:01731] [ 0] [0x3fffac180478]
[sys-85390:01731] [ 1] [0x3ffff304a800]
[sys-85390:01731] [ 2] /home/u0017592/projects/gromacs/build-mpi/lib/libgromacs_mpi.so.2(tMPI_Thread_mutex_lock+0x50)[0x3fffabd70c90]
[sys-85390:01731] [ 3] /home/u0017592/projects/gromacs/build-mpi/lib/libgromacs_mpi.so.2(gmx_fio_open+0x1cc)[0x3fffab396fec]
[sys-85390:01731] [ 4] /home/u0017592/projects/gromacs/build-mpi/lib/libgromacs_mpi.so.2(gmx_fio_fopen+0x20)[0x3fffab397580]
[sys-85390:01731] [ 5] /home/u0017592/projects/gromacs/build-mpi/lib/libgromacs_mpi.so.2(_Z13write_inpfilePKciP9t_inpfileiP7warninp+0x3dc)[0x3fffab3b73dc]
[sys-85390:01731] [ 6] /home/u0017592/projects/gromacs/build-mpi/lib/libgromacs_mpi.so.2(_Z6get_irPKcS0_P10t_inputrecP12t_gromppoptsP7warninp+0x358c)[0x3fffab6cca0c]
[sys-85390:01731] [ 7] /home/u0017592/projects/gromacs/build-mpi/lib/libgromacs_mpi.so.2(_Z10gmx_gromppiPPc+0x40c)[0x3fffab68ddfc]
[sys-85390:01731] [ 8] ./bin/mdrun-test(_ZN3gmx4test16SimulationRunner20callGromppOnThisRankERKNS0_11CommandLineE+0x154)[0x10073f90]
[sys-85390:01731] [ 9] ./bin/mdrun-test(_ZN3gmx4test16SimulationRunner10callGromppEv+0xa8)[0x100741dc]
[sys-85390:01731] [10] ./bin/mdrun-test(_ZN3gmx4test43BondedInteractionsTest_NormalBondWorks_Test8TestBodyEv+0x74)[0x1004b37c]
[sys-85390:01731] [11] ./bin/mdrun-test(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0xa0)[0x100f2470]
[sys-85390:01731] [12] ./bin/mdrun-test(_ZN7testing4Test3RunEv+0x19c)[0x100f21ac]
[sys-85390:01731] [13] ./bin/mdrun-test(_ZN7testing8TestInfo3RunEv+0x314)[0x100f3cf4]
[sys-85390:01731] [14] ./bin/mdrun-test(_ZN7testing8TestCase3RunEv+0x198)[0x100f5108]
[sys-85390:01731] [15] ./bin/mdrun-test[0x10120d68]
[sys-85390:01731] [16] ./bin/mdrun-test(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x9c)[0x1010045c]
[sys-85390:01731] [17] ./bin/mdrun-test(_ZN7testing8UnitTest3RunEv+0x98)[0x10100198]
[sys-85390:01731] [17] ./bin/mdrun-test(_ZN7testing8UnitTest3RunEv+0x98)[0x10100198]
[sys-85390:01731] [18] ./bin/mdrun-test(main+0x50)[0x1006ee48]
[sys-85390:01731] [19] /lib64/libc.so.6(+0x24700)[0x3fffaa3f4700]
[sys-85390:01731] [20] /lib64/libc.so.6(__libc_start_main+0xc4)[0x3fffaa3f48f4]
[sys-85390:01731] *** End of error message ***
Segmentation fault

I configured with:

[u0017592@sys-85390 build-mpi]$ cmake3 -j 2 -DREGRESSIONTEST_DOWNLOAD=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DGMX_MPI=ON ..

#15 Updated by Mark Abraham over 2 years ago

OK thanks, James. If you know where to lodge an effective bug report, then it should be easy for us to isolate repro cases for these pthread_mutex_lock and tan (and maybe atan2) bugs. Do you?

#16 Updated by James Ostrander over 2 years ago

If you can add your repro cases here, I'll gladly see to it that they're forwarded along to the XL team.

#17 Updated by Gerrit Code Review Bot over 2 years ago

Gerrit received a related patchset '1' for Issue #2102.
Uploader: Erik Lindahl ()
Change-Id: gromacs~master~I60449e9f16fb1ab79486927a3e9993da0cce937f
Gerrit URL: https://gerrit.gromacs.org/6465

#18 Updated by Gerrit Code Review Bot over 2 years ago

Gerrit received a related patchset '1' for Issue #2102.
Uploader: Erik Lindahl ()
Change-Id: gromacs~release-2016~I60449e9f16fb1ab79486927a3e9993da0cce937f
Gerrit URL: https://gerrit.gromacs.org/6466

#19 Updated by Mark Abraham over 2 years ago

James, I've uploaded further fixes at https://gerrit.gromacs.org/#/c/6444/4. Feedback on whether and where they build and pass tests for you would be welcome - the toolchains on JURON don't give me the impression they've been configured correctly (or I'm not using them as intended). We're interested in the outcome on xlc and gcc, with and without gpu, Release and Debug mode. I've tried a few combinations and everything seems to build now.

#20 Updated by James Ostrander over 2 years ago

I'm in the process of testing those changes now. I get the following when compiling with GCC+MPI (same result Release vs. Debug build type):

[u0017592@sys-85390 build-release-gcc]$ make
Scanning dependencies of target git-version-info
[  0%] Generating git version information
[  0%] Built target git-version-info
[  0%] Generating baseversion-gen.c
Scanning dependencies of target libgromacs
[  0%] Building CXX object src/gromacs/CMakeFiles/libgromacs.dir/listed-forces/bonded.cpp.o
In file included from /home/u0017592/projects/gromacs/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx.h:46:0,
                 from /home/u0017592/projects/gromacs/src/gromacs/simd/simd.h:123,
                 from /home/u0017592/projects/gromacs/src/gromacs/pbcutil/pbc-simd.h:51,
                 from /home/u0017592/projects/gromacs/src/gromacs/listed-forces/bonded.cpp:65:
/home/u0017592/projects/gromacs/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_double.h: In function ‘void gmx::store(int32_t*, gmx::SimdDInt32)’:
/home/u0017592/projects/gromacs/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_double.h:157:42: sorry, unimplemented: unexpected AST of kind compound_literal_expr
     m[0] = vec_extract(x.simdInternal_, 0);
                                          ^
/home/u0017592/projects/gromacs/src/gromacs/simd/impl_ibm_vsx/impl_ibm_vsx_simd_double.h:157: confused by earlier errors, bailing out
Preprocessed source stored into /tmp/ccXvhzoO.out file, please attach this to your bugreport.
make[2]: *** [src/gromacs/CMakeFiles/libgromacs.dir/listed-forces/bonded.cpp.o] Error 1
make[1]: *** [src/gromacs/CMakeFiles/libgromacs.dir/all] Error 2
make: *** [all] Error 2

I won't be able to test GPU yet, since the VMs I'm currently working from don't have hardware acceleration.

#21 Updated by James Ostrander over 2 years ago

With XLC+MPI in release mode, it failed to generate shell completions:

[100%] Building CXX object src/programs/CMakeFiles/gmx.dir/legacymodules.cpp.o
warning: 1540-5203 Unrecognized value "e" specified with option "halt".
1 warning generated.
[100%] Linking CXX executable ../../bin/gmx_mpi
[100%] Built target gmx
Scanning dependencies of target completion
[100%] Generating command-line completions for programs
[sys-85390:02528] *** Process received signal ***
[sys-85390:02528] Signal: Segmentation fault (11)
[sys-85390:02528] Signal code: Address not mapped (1)
[sys-85390:02528] Failing at address: 0x500000014
[sys-85390:02528] [ 0] [0x3fff86180478]
[sys-85390:02528] [ 1] /home/u0017592/projects/gromacs/build-release/lib/libgromacs_mpi.so.2(_ZN3gmx7CpuInfo16s_vendorStrings_E+0x0)[0x3fff8617fa10]
[sys-85390:02528] [ 2] /home/u0017592/projects/gromacs/build-release/lib/libgromacs_mpi.so.2(tMPI_Thread_mutex_destroy+0x28)[0x3fff85d86c28]
[sys-85390:02528] [ 3] /home/u0017592/projects/gromacs/build-release/lib/libgromacs_mpi.so.2(_ZN4tMPI5mutexD2Ev+0x4c)[0x3fff85138f7c]
[sys-85390:02528] [ 4] /home/u0017592/projects/gromacs/build-release/lib/libgromacs_mpi.so.2(+0xe44000)[0x3fff85d74000]
[sys-85390:02528] [ 5] /lib64/libc.so.6(__cxa_finalize+0x10c)[0x3fff84453adc]
[sys-85390:02528] [ 6] /home/u0017592/projects/gromacs/build-release/lib/libgromacs_mpi.so.2(+0x1bf470)[0x3fff850ef470]
[sys-85390:02528] [ 7] /lib64/ld64.so.2(+0x164c8)[0x3fff861b64c8]
[sys-85390:02528] [ 8] /lib64/libc.so.6(+0x435a4)[0x3fff844535a4]
[sys-85390:02528] [ 9] /lib64/libc.so.6(exit+0x24)[0x3fff844535f4]
[sys-85390:02528] [10] /lib64/libc.so.6(+0x24708)[0x3fff84434708]
[sys-85390:02528] [11] /lib64/libc.so.6(__libc_start_main+0xc4)[0x3fff844348f4]
[sys-85390:02528] *** End of error message ***
Failed to generate shell completions, will build GROMACS without. Set GMX_BUILD_HELP=OFF if you want to skip this notification and warnings during installation.
[100%] Built target completion
Scanning dependencies of target mdrun_test_objlib
[100%] Building CXX object src/programs/mdrun/tests/CMakeFiles/mdrun_test_objlib.dir/mdruncomparisonfixture.cpp.o
warning: 1540-5203 Unrecognized value "e" specified with option "halt".
1 warning generated.
[100%] Building CXX object src/programs/mdrun/tests/CMakeFiles/mdrun_test_objlib.dir/moduletest.cpp.o
warning: 1540-5203 Unrecognized value "e" specified with option "halt".
1 warning generated.
[100%] Building CXX object src/programs/mdrun/tests/CMakeFiles/mdrun_test_objlib.dir/terminationhelper.cpp.o
warning: 1540-5203 Unrecognized value "e" specified with option "halt".
1 warning generated.
[100%] Built target mdrun_test_objlib

#22 Updated by James Ostrander over 2 years ago

The GCC error I mentioned in #20 occurs regardless of whether MPI is used.

#23 Updated by Mark Abraham over 2 years ago

James Ostrander wrote:

I'm in the process of testing those changes now. I get the following when compiling with GCC+MPI (same result Release vs. Debug build type):

Yes, I get that also (in a non-MPI build, but that fact doesn't matter for this) for gcc 4.8.5, but gcc 5.3.1 in at9.0 is fine. From our git history, it looks like the testing on Power8 was done with gcc 4.9, so if you're also using 4.8.5 then we can just require 4.9 and move on.

#24 Updated by Mark Abraham over 2 years ago

James Ostrander wrote:

With XLC+MPI in release mode, it failed to generate shell completions:

[...]

Yes, this is running one of the GROMACS binaries to generate user-facing conveniences that are guaranteed to match what the code can do. If you had a deeper backtrace, I expect it would look similar to those of comments 13 and 14. GROMACS requires that standard pthread mutexes work. Clearly they can be made to work, else you couldn't have a conformant C++11 xlC compiler with std::mutex, but something about the toolchain setup isn't working correctly yet.

#25 Updated by James Ostrander over 2 years ago

You're right, I'm running gcc 4.8.5, which is the latest rpm for RHEL 7.3. I'll try on Ubuntu ppc64le, where I have a more up to date gcc available.

RE: The pthread issue, is there a simple and isolated way to reproduce the problem, or should I send that along to the XL team with instructions to reproduce it with a GROMACS build?

#26 Updated by Mark Abraham over 2 years ago

James Ostrander wrote:

You're right, I'm running gcc 4.8.5, which is the latest rpm for RHEL 7.3. I'll try on Ubuntu ppc64le, where I have a more up to date gcc available.

OK, I'll patch release 2016 to require gcc 4.9.

RE: The pthread issue, is there a simple and isolated way to reproduce the problem, or should I send that along to the XL team with instructions to reproduce it with a GROMACS build?

I still suspect it is a toolchain configuration issue, e.g. the xlc binary is getting the pthread implementation from at9, with obvious potential for disaster. I have a request open at JURON to explore this, to which other IBM people have contributed (e.g. https://trac.version.fz-juelich.de/hbp-pcp/ticket/30 if that's publicly visible). We have a cmake-time compile test for pthreads functionality which must have passed for both of us. But it does not check that linking works, which suggests that the problem does not arise until linking a multi-source binary. I'll see what they say and consider making a repro case. But really this should be IBM's job to make work, even in pre-production access :-)

#27 Updated by James Ostrander over 2 years ago

On the subject of XLC causing printf(tan(0)) to print -0 at -O3: -O3 in XLC enables -qnostrict, which apparently relaxes the floating-point restrictions enough to change the sign of the zero.

#28 Updated by Mark Abraham over 2 years ago

Nobody knows how to get a reliable xlc toolchain working at the only site where I have access to xlc+power, so for now that combination is not supported.

#29 Updated by Mark Abraham almost 2 years ago

  • Tracker changed from Bug to Task
  • Subject changed from xlc/xlC: cmake fails due to invalid flags to decide future of xlc+power support
  • Target version set to 2018
  • Affected version deleted (git master)

Decide how to document that xlc is not tested / not supported on POWER architectures in GROMACS 2018.

My suggestion is to say it is unsupported, recommend gcc, and leave it open to anybody who wants to make it work to show that the performance of xlc over e.g. gcc is worth the work.

#30 Updated by Gerrit Code Review Bot almost 2 years ago

Gerrit received a related patchset '1' for Issue #2102.
Uploader: Paul Bauer ()
Change-Id: gromacs~release-2018~I1963a2fdaa6e27f4d9521c28088fc1c1f7eabe97
Gerrit URL: https://gerrit.gromacs.org/7310

#31 Updated by Paul Bauer almost 2 years ago

As this one has been added now, should the issue be closed, or left open in case someone wants to step up and improve the support in the future?

#32 Updated by Paul Bauer almost 2 years ago

Addressed in changeset I1963a2fdaa6e27f4d9521c28088fc1c1f7eabe97

#33 Updated by Mark Abraham almost 2 years ago

  • Status changed from Accepted to Closed

Closed for now - if there's reason for someone to do the work, then we're still open to supporting xlc in future.

#34 Updated by Mark Abraham over 1 year ago

Note that various gcc+POWER fixes have happened in release-2018 branch, but we don't have any access to a working xlc compiler at the moment.
