regressiontests/freeenergy coulandvdwsequential_vdw failing on Power8
FAILED. Check checkforce.out (2 errors) file(s) in coulandvdwsequential_vdw for coulandvdwsequential_vdw
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.
$ bin/gmx -version
:-) GROMACS - gmx, 2019-beta3-dev-20181108-d536de3 (-:
GROMACS is written by:
Emile Apol Rossen Apostolov Paul Bauer Herman J.C. Berendsen Par Bjelkmar Christian Blau Viacheslav Bolnykh Kevin Boyd Aldert van Buuren Rudi van Drunen Anton Feenstra Gerrit Groenhof Anca Hamuraru Vincent Hindriksen M. Eric Irrgang Aleksei Iupinov Christoph Junghans Joe Jordan Dimitrios Karkoulis Peter Kasson Jiri Kraus Carsten Kutzner Per Larsson Justin A. Lemkul Viveca Lindahl Magnus Lundborg Erik Marklund Pascal Merz Pieter Meulenhoff Teemu Murtola Szilard Pall Sander Pronk Roland Schulz Michael Shirts Alexey Shvetsov Alfons Sijbers Peter Tieleman Teemu Virolainen Christian Wennberg Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2018, The GROMACS development team at Uppsala University, Stockholm University and the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
GROMACS: gmx, version 2019-beta3-dev-20181108-d536de3
Executable: /home/pszilard/gromacs-19/build_p8_gcc7_fftw337/bin/gmx
Data prefix: /home/pszilard/gromacs-19 (source tree)
Working dir: /home/pszilard/gromacs-19/build_p8_gcc7_fftw337
Command line: gmx -version
GROMACS version: 2019-beta3-dev-20181108-d536de3
GIT SHA1 hash: d536de3b5125b79d4222768e356c4914e0758d5a
Precision: single
Memory model: 64 bit
MPI library: thread_mpi
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: disabled
SIMD instructions: IBM_VSX
FFT library: fftw-3.3.8
RDTSCP usage: disabled
TNG support: enabled
Hwloc support: hwloc-1.11.8
Tracing support: disabled
C compiler: /home/pszilard/programs/gcc/7.3/bin/gcc GNU 7.3.0
C compiler flags: -mcpu=power8 -mpower8-vector -mpower8-fusion -mdirect-move -mvsx -Werror=format-overflow -Wundef -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
C++ compiler: /home/pszilard/programs/gcc/7.3/bin/g++ GNU 7.3.0
C++ compiler flags: -mcpu=power8 -mpower8-vector -mpower8-fusion -mdirect-move -mvsx -std=c++11 -Wformat-overflow -Wundef -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wmissing-declarations -Wall -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
#1 Updated by Szilárd Páll 2 months ago
Possibly related to this or #2747, with gcc 8:
FAILED. Check checkpot.out (2 errors), checkforce.out (1869 errors) file(s) in coulandvdwsequential_coul for coulandvdwsequential_coul
FAILED. Check checkpot.out (11 errors), checkforce.out (404 errors) file(s) in expanded for expanded
$ cat tests/regressiontests-release-2019-51d7202/freeenergy/coulandvdwsequential_coul/checkpot.out
comparing energy file ./reference_s.edr and ener.edr
There are 51 and 52 terms in the energy files
enm (- - Conserved En.)
There are 10 terms to compare in the energy files
Coul. recip. step 40: -3352.61, step 40: 163097
Potential step 40: -30117.5, step 40: 136332
Files read successfully
$ cat tests/regressiontests-release-2019-51d7202/freeenergy/expanded/checkpot.out
comparing energy file ./reference_s.edr and ener.edr
There are 40 terms in the energy files
There are 11 terms to compare in the energy files
Coul. recip. step 0: 321.338, step 0: 336.118
Coul. recip. step 1: 325.129, step 1: 325.502
Coul. recip. step 2: 325.259, step 2: 325.845
Coul. recip. step 3: 322.506, step 3: 323.083
Coul. recip. step 4: 318.435, step 4: 318.805
Coul. recip. step 6: 313.25, step 6: 312.932
Coul. recip. step 7: 314.874, step 7: 314.292
Coul. recip. step 8: 320.291, step 8: 319.605
Coul. recip. step 9: 329.26, step 9: 328.631
Coul. recip. step 10: 340.449, step 10: 339.993
Coul. recip. step 68: 321.619, step 68: 321.962
Files read successfully
#2 Updated by Szilárd Páll 2 months ago
Update: several free energy tests fail intermittently with gcc 8 too, even with GMX_SIMD=None and GMX_FFT_LIBRARY=fftpack, but this time some bonded energy terms do not match:
$ cat tests/regressiontests-release-2019-51d7202/freeenergy/coulandvdwtogether/checkpot.out
comparing energy file ./reference_s.edr and ener.edr
There are 49 terms in the energy files
There are 10 terms to compare in the energy files
Angle step 16: 8.28424, step 16: 8.33762
Angle step 17: 7.8723, step 17: 7.92783
Angle step 18: 7.31895, step 18: 7.37155
Files read successfully
$ less tests/regressiontests-release-2019-51d7202/freeenergy/coulandvdwsequential_coul/checkpot.out
comparing energy file ./reference_s.edr and ener.edr
There are 51 and 52 terms in the energy files
enm (- - Conserved En.)
There are 10 terms to compare in the energy files
Coul. recip. step 12: -3356.1, step 12: 46807.7
Potential step 12: -30089.6, step 12: 20074.2
Coul. recip. step 20: -3350.21, step 20: -3345.8
Coul. recip. step 21: -3349.77, step 21: -3344.29
Coul. recip. step 22: -3349.32, step 22: -3343.03
Potential step 22: -30046.6, step 22: -30014.8
Coul. recip. step 23: -3348.88, step 23: -3342.08
Potential step 23: -30034.9, step 23: -30001.2
Coul. recip. step 24: -3348.49, step 24: -3341.54
Potential step 24: -30025.1, step 24: -29991.2
Coul. recip. step 25: -3348.19, step 25: -3341.42
Potential step 25: -30018.2, step 25: -29985.7
Coul. recip. step 26: -3348.01, step 26: -3341.73
Coul. recip. step 27: -3348, step 27: -3342.43
Coul. recip. step 28: -3348.15, step 28: -3343.46
Coul. recip. step 29: -3348.47, step 29: -3344.65
Coul. recip. step 37: -3351.75, step 37: -3347.99
Coul. recip. step 38: -3351.97, step 38: -3347.57
Ryckaert-Bell. step 39: 4.96147, step 39: 4.91132
Coul. recip. step 39: -3352.24, step 39: -3347.31
Ryckaert-Bell. step 40: 5.04869, step 40: 4.99534
Coul. recip. step 40: -3352.61, step 40: -3347.35
Potential step 40: -30117.5, step 40: -30086.2
Files read successfully
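To make it easier to tell the gross blow-ups (such as Coul. recip. at step 12) from the small drifts in the output above, here is a minimal Python sketch that parses mismatch lines of the form shown in checkpot.out and ranks them by relative deviation. It is not part of GROMACS or the regressiontests; the function names and the 10% cutoff are hypothetical choices for illustration, and it assumes one mismatch entry per line.

```python
import re

# One mismatch entry, in the form seen in the checkpot.out excerpts:
#   "<term> step <n>: <reference value>, step <n>: <computed value>"
LINE = re.compile(
    r"^(?P<term>.+?)\s+step\s+(?P<step>\d+):\s*(?P<ref>[^,\s]+),"
    r"\s*step\s+\d+:\s*(?P<new>\S+)$"
)

# A few entries copied from the coulandvdwsequential_coul output above.
SAMPLE = """\
Coul. recip. step 12: -3356.1, step 12: 46807.7
Potential step 12: -30089.6, step 12: 20074.2
Coul. recip. step 20: -3350.21, step 20: -3345.8
Ryckaert-Bell. step 39: 4.96147, step 39: 4.91132
Files read successfully
"""


def parse_mismatches(text):
    """Return (term, step, reference, computed) for each mismatch line."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:  # header and footer lines do not match and are skipped
            rows.append((m.group("term"), int(m.group("step")),
                         float(m.group("ref")), float(m.group("new"))))
    return rows


def relative_deviation(ref, new):
    """|new - ref| scaled by the larger magnitude; 0.0 if both are zero."""
    denom = max(abs(ref), abs(new))
    return abs(new - ref) / denom if denom else 0.0


def split_by_deviation(rows, cutoff=0.1):
    """Split rows into (gross, small) using a hypothetical 10% cutoff."""
    gross = [r for r in rows if relative_deviation(r[2], r[3]) > cutoff]
    small = [r for r in rows if relative_deviation(r[2], r[3]) <= cutoff]
    return gross, small


rows = parse_mismatches(SAMPLE)
gross, small = split_by_deviation(rows)
for term, step, ref, new in gross:
    print(f"gross: {term} step {step}: {ref} vs {new}")
for term, step, ref, new in small:
    print(f"small: {term} step {step}: {ref} vs {new}")
```

The cutoff only separates the two failure patterns for reading purposes; it says nothing about the tolerances the regression tests themselves apply.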
#6 Updated by Szilárd Páll 15 days ago
Mark Abraham wrote:
I suggest we stop supporting power 8. There's essentially zero HPC usage, so it just isn't a priority.
Setting a target so that we make a decision about the support offered.
None of this is an effort to support Power8, just as our testing on other non-mainstream platforms isn't support for those. By the same reasoning, neither ARMv7, nor even ARMv8, nor anything 32-bit, nor any Intel CPU older than Sandy Bridge (or Ivy Bridge), let alone Windows, should be a priority.
Portability does not mean that the code can in theory be ported (and that if it happens not to work we claim it is unsupported), but that it actually does work across different platforms that meet the common requirements for the codebase to compile and function correctly. We're using a vanilla GNU toolchain on a vanilla ppc64 Linux distribution, so nothing custom or vendor-specific is involved that would point to an effort beyond ensuring portability; hence explicit Power8 platform support is not a concern here, I think.
Of course, if such observations do not reproduce in other cases, we can flag this as a "known issue" and consider it solved.
PS: ORNL and other US labs do use Power8 GPU clusters for some testing; e.g., AFAIK, without a live project affiliation only Summitdev (Power S822LC, that is Power8 + P100) is available, even to ORNL employees.