Bug #2734
regressiontests/kernel core dumps on ppc64le
Description
GROMACS: gmx mdrun, version 2018.3 (double precision)
Executable: /builddir/build/BUILD/gromacs-2018.3/serial_d/bin/gmx_d
Data prefix: /builddir/build/BUILD/gromacs-2018.3 (source tree)
Working dir: /builddir/build/BUILD/gromacs-2018.3/serial_d/tests
Command line:
  gmx_d mdrun -h
Thanx for Using GROMACS - Have a Nice Day
sh: line 1: 833 Aborted (core dumped) gmx_d mdrun -nb cpu -notunepme > mdrun.out 2>&1
Abnormal return value for ' gmx_d mdrun -nb cpu -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in nb_kernel_ElecEwSh_VdwLJSh_GeomW4P1 for nb_kernel_ElecEwSh_VdwLJSh_GeomW4P1
1 out of 142 kernel tests FAILED
Details here: https://koji.fedoraproject.org/koji/taskinfo?taskID=30691834
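When a long build log has to be scanned for failures like the one above, the failing test names can be pulled out mechanically. The following is only a minimal sketch: the helper names `failed_tests` and `summary` are hypothetical, and the patterns assume the harness prints its failure and summary lines exactly as quoted in this report.

```python
import re

# Assumed format, copied from the output quoted in this report:
#   FAILED. Check mdrun.out, md.log file(s) in <dir> for <test>
#   <n> out of <total> <suite> tests FAILED
FAILED_RE = re.compile(r"FAILED\. Check .*? in (\S+) for (\S+)")
SUMMARY_RE = re.compile(r"(\d+) out of (\d+) (\w+) tests FAILED")


def failed_tests(log_text):
    """Return the test names from every 'FAILED. Check ...' line."""
    return [match.group(2) for match in FAILED_RE.finditer(log_text)]


def summary(log_text):
    """Return (failed, total, suite) from the first summary line, or None."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


if __name__ == "__main__":
    sample = (
        "FAILED. Check mdrun.out, md.log file(s) in "
        "nb_kernel_ElecEwSh_VdwLJSh_GeomW4P1 for "
        "nb_kernel_ElecEwSh_VdwLJSh_GeomW4P1\n"
        "1 out of 142 kernel tests FAILED\n"
    )
    print(failed_tests(sample))
    print(summary(sample))
```

The same functions can be applied to the text of a saved koji build log to list which kernel tests aborted in a given build.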
Related issues
Associated revisions
History
#1 Updated by Paul Bauer about 2 years ago
I added some information for the build, could you try to run the failing test on its own to see where it crashes? Thanks!
Compiler: gcc-8.2.1
BLAS: openblas
LAPACK: openblas
SIMD: None
Double: ON
fftw: 3.3.8
#2 Updated by Christoph Junghans about 2 years ago
Interestingly, it works with GMX_SIMD=IBM_VSX on ppc64le.
As this is inside a non-interactive rpm build, what exactly do I need to run?
#3 Updated by Paul Bauer about 2 years ago
Very interesting, indeed.
Best would be running only the failing test in a memory checker, but I don't think this is possible if you can't go physically on the machine. Will check with people here if someone can try to reproduce this.
#4 Updated by Christoph Junghans about 2 years ago
Yeah, no interactive mode, sorry!
#5 Updated by Dominik Mierzejewski about 2 years ago
Paul Bauer wrote:
Very interesting, indeed.
Best would be running only the failing test in a memory checker, but I don't think this is possible if you can't go physically on the machine. Will check with people here if someone can try to reproduce this.
I have a ppc64le VM if anyone wants to debug this hands-on. Just send me your public ssh key.
#6 Updated by Paul Bauer about 2 years ago
I tried reproducing this on the VM that Dominik helpfully provided, with the current head of release-2018, using the same cmake instructions.
Running in valgrind shows some invalid reads when running the code, but it didn't crash for me so far.
#7 Updated by Paul Bauer about 2 years ago
Ok, tried more things but can't get the test to crash. The invalid reads were because I didn't load the correct libgromacs for each build, and they don't show up when done correctly. This was now again with the current head of release-2018.
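A mismatch between a gmx binary and the libgromacs it picks up at run time can be checked before running under valgrind. The sketch below assumes a Linux system where `ldd` is available; the function names and the library file names used in the example are hypothetical and will differ per build.

```python
import subprocess


def resolved_libs(ldd_output, needle="libgromacs"):
    """Map each library whose ldd line contains `needle` to its resolved path.

    Libraries that ldd reports as 'not found' map to None.
    """
    libs = {}
    for line in ldd_output.splitlines():
        if needle not in line or "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        parts = rest.split()
        # A resolved line looks like '<path> (0x...)'; an unresolved one
        # reads 'not found'.
        path = parts[0] if parts and parts[0] != "not" else None
        libs[name.strip()] = path
    return libs


def libs_for_binary(binary):
    """Run ldd on `binary` and return its libgromacs resolution."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return resolved_libs(result.stdout)


if __name__ == "__main__":
    # Hypothetical ldd output, for illustration only.
    sample = (
        "\tlibgromacs_d.so.3 => /opt/build_a/lib64/libgromacs_d.so.3 (0x0000)\n"
        "\tlibm.so.6 => /lib64/libm.so.6 (0x0000)\n"
    )
    print(resolved_libs(sample))
```

If the reported path points into a different build tree than the binary under test, the environment (for example `LD_LIBRARY_PATH`) is probably still set up for another build.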
#8 Updated by Paul Bauer about 2 years ago
- Status changed from New to Feedback wanted
- Target version changed from 2018.4 to 2019
I'll retarget this to 2019, because I was unable to reproduce the issue on a similar VM with the build configuration used during the package build.
#9 Updated by Szilárd Páll about 2 years ago
I can't repro crashes on Power8 either, but I did produce a bunch of failing regressiontests, see #2746, #2747. There may be something here that sometimes causes only wrong results and occasionally crashes too.
@Christoph: have the runs been repeated on the Fedora system, and do you see incorrect results in some cases?
#10 Updated by Christoph Junghans about 2 years ago
I was just trying to build the rpm package and this issue came up in the `%check` block. Maybe Dominik has another idea.
#11 Updated by Paul Bauer about 2 years ago
- Target version changed from 2019 to 2020
This is very likely to be postponed, because it is not clear what the actual issue is.
#12 Updated by Mark Abraham about 2 years ago
- Related to Bug #2746: regressiontests/freeenergy coulandvdwsequential_vdw failing on Power8 added
#13 Updated by Mark Abraham about 2 years ago
- Related to Bug #2747: nb_kernel_ElecEwSw_VdwBhamSw_GeomW4W4 regressiontest failing on Power8 added
#14 Updated by Gerrit Code Review Bot almost 2 years ago
Gerrit received a related patchset '1' for Issue #2734.
Uploader: Szilárd Páll (pall.szilard@gmail.com)
Change-Id: gromacs~release-2019~I56f50e54db47f4fe30c42488f4c4f79ac474518a
Gerrit URL: https://gerrit.gromacs.org/9104
#15 Updated by Gerrit Code Review Bot almost 2 years ago
Gerrit received a related patchset '1' for Issue #2734.
Uploader: Szilárd Páll (pall.szilard@gmail.com)
Change-Id: gromacs~release-2018~I56f50e54db47f4fe30c42488f4c4f79ac474518a
Gerrit URL: https://gerrit.gromacs.org/9105
#16 Updated by Mark Abraham almost 2 years ago
- Status changed from Feedback wanted to Fix uploaded
- Target version changed from 2020 to 2019.1
#17 Updated by Szilárd Páll almost 2 years ago
- Status changed from Fix uploaded to Feedback wanted
@Christoph: can you check if the change uploaded fixes the failing tests?
#18 Updated by Christoph Junghans almost 2 years ago
Can I test this in 2019.1? The rpm already has too many patches in it.
#19 Updated by Mark Abraham almost 2 years ago
Christoph Junghans wrote:
Can I test this in 2019.1? The rpm already has too many patches in it.
Sure, that sounds great.
#20 Updated by Szilárd Páll almost 2 years ago
- Status changed from Feedback wanted to Resolved
Applied in changeset 4a7281ef0020d0ff454608a3ee98b9984a6ac11e.
#21 Updated by Paul Bauer almost 2 years ago
- Status changed from Resolved to Closed
#22 Updated by Szilárd Páll almost 2 years ago
Not really ready to close until we get feedback whether the issue is solved, but I guess leaving it on "Feedback wanted" will mean it remains a release blocker?
#23 Updated by Mark Abraham almost 2 years ago
- Status changed from Closed to Feedback wanted
- Target version changed from 2019.1 to 2019.2
Good idea. Postponed
#24 Updated by Szilárd Páll almost 2 years ago
- Status changed from Feedback wanted to Resolved
Applied in changeset 1ce795fe5693e9790eb7c33896f47d4953867127.
#25 Updated by Paul Bauer almost 2 years ago
- Status changed from Resolved to Closed
#26 Updated by Christoph Junghans over 1 year ago
- File build.log added
- Status changed from Closed to In Progress
- Target version changed from 2019.2 to future
- Affected version changed from 2018.3 to 2019.3
It is back in 2019.3:
GROMACS: gmx mdrun, version 2019.3
Executable: /builddir/build/BUILD/gromacs-2019.3/serial/bin/gmx
Data prefix: /builddir/build/BUILD/gromacs-2019.3 (source tree)
Working dir: /builddir/build/BUILD/gromacs-2019.3/serial/tests
Command line:
  gmx mdrun -h
Thanx for Using GROMACS - Have a Nice Day
sh: line 1: 16588 Aborted (core dumped) gmx mdrun -nb cpu -notunepme > mdrun.out 2>&1
Abnormal return value for ' gmx mdrun -nb cpu -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in nb_kernel_ElecEwSw_VdwBhamSw_GeomW4W4 for nb_kernel_ElecEwSw_VdwBhamSw_GeomW4W4
1 out of 142 kernel tests FAILED
See https://koji.fedoraproject.org/koji/taskinfo?taskID=35545387 and attached build.log
#27 Updated by Szilárd Páll over 1 year ago
- Related to Task #3057: re-enable fusion on Power8/9 added
#28 Updated by Christoph Junghans over 1 year ago
- Related to Bug #3116: regressiontests/freeenergy core dumps on ppc64le added
Disable instruction fusion on Power8
The -mpower8-fusion flag seems to be the source of incorrect code; not
confirmed, but likely a codegen issue that also affects Power9 with the
similar flag used.
Fixes #2747 #2746 #2734
Change-Id: I56f50e54db47f4fe30c42488f4c4f79ac474518a
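Since the change above concerns a single compiler flag, a packager can check whether a given build still passes it by scanning the compile lines of a build log. This is a rough sketch, not part of the fix: the helper names are hypothetical, and it assumes the log contains the full compiler command lines (for example from a verbose make). It only reports whether the flag is mentioned; the negated form `-mno-power8-fusion` is a different token and is not counted.

```python
import shlex

# The flag named in the commit message above.
FUSION_FLAG = "-mpower8-fusion"


def tokens(line):
    """Split a log line like a shell would, falling back to whitespace."""
    try:
        return shlex.split(line)
    except ValueError:
        # Unbalanced quotes in a log line; a plain split is good enough here.
        return line.split()


def has_flag(command_line, flag=FUSION_FLAG):
    """Return True if `flag` appears as its own argument in the command."""
    return flag in tokens(command_line)


def lines_with_flag(log_text, flag=FUSION_FLAG):
    """Return (line_number, line) for every log line that passes `flag`."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if has_flag(line, flag)
    ]


if __name__ == "__main__":
    # Hypothetical compile lines, for illustration only.
    log = (
        "gcc -O3 -mcpu=power8 -mpower8-fusion -c kernel.c\n"
        "gcc -O3 -mcpu=power8 -mno-power8-fusion -c other.c\n"
    )
    for number, line in lines_with_flag(log):
        print(number, line)
```

An empty result for a build that still crashes would suggest the remaining failures have a cause other than this flag, which is worth knowing before reopening the discussion.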