Bug #2734

regressiontests/kernel core dumps on ppc64le

Added by Christoph Junghans 15 days ago. Updated 11 days ago.

Status: Feedback wanted
Priority: Normal
Assignee:
Category: testing
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

GROMACS:      gmx mdrun, version 2018.3 (double precision)
Executable:   /builddir/build/BUILD/gromacs-2018.3/serial_d/bin/gmx_d
Data prefix:  /builddir/build/BUILD/gromacs-2018.3 (source tree)
Working dir:  /builddir/build/BUILD/gromacs-2018.3/serial_d/tests
Command line:
  gmx_d mdrun -h
Thanx for Using GROMACS - Have a Nice Day
sh: line 1:   833 Aborted                 (core dumped) gmx_d mdrun -nb cpu -notunepme > mdrun.out 2>&1
Abnormal return value for ' gmx_d mdrun    -nb cpu   -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in nb_kernel_ElecEwSh_VdwLJSh_GeomW4P1 for nb_kernel_ElecEwSh_VdwLJSh_GeomW4P1
1 out of 142 kernel tests FAILED

Details here: https://koji.fedoraproject.org/koji/taskinfo?taskID=30691834

History

#1 Updated by Paul Bauer 13 days ago

I added some information for the build, could you try to run the failing test on its own to see where it crashes? Thanks!

Compiler: gcc-8.2.1
BLAS: openblas
LAPACK: openblas
SIMD: None
Double: ON
fftw: 3.3.8

#2 Updated by Christoph Junghans 13 days ago

Interestingly, it works with GMX_SIMD=IBM_VSX on ppc64le.

As this is inside a non-interactive rpm build, what exactly do I need to run?

#3 Updated by Paul Bauer 13 days ago

Very interesting, indeed.
Best would be running only the failing test in a memory checker, but I don't think this is possible if you can't physically access the machine. I will check with people here whether someone can try to reproduce this.
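A minimal sketch of how that could be done without interactive access, e.g. from the rpm `%check` block. The helper names `failed_test_dirs` and `rerun_under_valgrind` are hypothetical, and the assumption that each kernel test directory contains `grompp.mdp`, `conf.gro` and `topol.top` has not been checked against the regressiontests tarball. The mdrun options are the ones from the failing command in the description.

```sh
#!/bin/sh
# Hypothetical helper: print the directory of every failed kernel test found in
# captured gmxtest output, based on lines of the form
#   FAILED. Check mdrun.out, md.log file(s) in <dir> for <dir>
failed_test_dirs() {
    sed -n 's/^FAILED\. Check .* file(s) in \([^ ]*\) for .*/\1/p' "$1"
}

# Hypothetical helper: rebuild the run input for one test directory and rerun
# mdrun under valgrind so that memory errors end up in the (non-interactive) build log.
rerun_under_valgrind() {
    dir=$1
    (
        cd "$dir" || exit 1
        # Assumption: input file names; adjust to whatever the test directory contains.
        gmx_d grompp -f grompp.mdp -c conf.gro -p topol.top -o topol.tpr &&
        valgrind --error-exitcode=99 gmx_d mdrun -nb cpu -notunepme -s topol.tpr
    )
}
```

Usage, assuming the gmxtest output was saved to a (hypothetical) file `gmxtest.log`: `for d in $(failed_test_dirs gmxtest.log); do rerun_under_valgrind "$d"; done`. Running valgrind only on the failing directories keeps the slowdown limited to the one test that aborted.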

#4 Updated by Christoph Junghans 13 days ago

Yeah, no interactive mode, sorry!

#5 Updated by Dominik Mierzejewski 13 days ago

Paul Bauer wrote:

Very interesting, indeed.
Best would be running only the failing test in a memory checker, but I don't think this is possible if you can't physically access the machine. I will check with people here whether someone can try to reproduce this.

I have a ppc64le VM if anyone wants to debug this hands-on. Just send me your public ssh key.

#6 Updated by Paul Bauer 12 days ago

I tried reproducing this on the VM that Dominik helpfully provided, with the current head of release-2018, using the same cmake instructions.
Running under valgrind shows some invalid reads, but it hasn't crashed for me so far.

#7 Updated by Paul Bauer 12 days ago

OK, I tried more things but can't get the test to crash. The invalid reads were because I didn't load the correct libgromacs for each build; they don't show up when this is done correctly. This was again with the current head of release-2018.

#8 Updated by Paul Bauer 12 days ago

  • Status changed from New to Feedback wanted
  • Target version changed from 2018.4 to 2019

I'll retarget this to 2019, because I was unable to reproduce the issue on a similar VM with the build configuration used during the package build.

#9 Updated by Szilárd Páll 11 days ago

I can't repro crashes on Power8 either, but I did produce a bunch of failing regressiontests, see #2746, #2747. There may be something here that sometimes causes only wrong results and occasionally crashes too.

@Christoph: have the runs been repeated in the Fedora build system, and do you see incorrect results in some cases?

#10 Updated by Christoph Junghans 11 days ago

I was just trying to build the rpm package and this issue came up in the `%check` block. Maybe Dominik has another idea.
