regressiontests/kernel core dumps on ppc64le
GROMACS: gmx mdrun, version 2018.3 (double precision)
Executable: /builddir/build/BUILD/gromacs-2018.3/serial_d/bin/gmx_d
Data prefix: /builddir/build/BUILD/gromacs-2018.3 (source tree)
Working dir: /builddir/build/BUILD/gromacs-2018.3/serial_d/tests
Command line:
  gmx_d mdrun -h

Thanx for Using GROMACS - Have a Nice Day

sh: line 1: 833 Aborted (core dumped) gmx_d mdrun -nb cpu -notunepme > mdrun.out 2>&1
Abnormal return value for ' gmx_d mdrun -nb cpu -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in nb_kernel_ElecEwSh_VdwLJSh_GeomW4P1 for nb_kernel_ElecEwSh_VdwLJSh_GeomW4P1
1 out of 142 kernel tests FAILED
#5 Updated by Dominik Mierzejewski 11 months ago
Paul Bauer wrote:
> Very interesting, indeed.
> The best option would be to run only the failing test under a memory checker, but I don't think this is possible if you can't physically access the machine. I will check with people here whether someone can try to reproduce this.
I have a ppc64le VM if anyone wants to debug this hands-on. Just send me your public SSH key.
#9 Updated by Szilárd Páll 11 months ago
I can't reproduce the crashes on POWER8 either, but I did get a number of failing regression tests; see #2746 and #2747. There may be a single underlying problem that usually produces only wrong results but occasionally crashes too.
@Christoph: have the runs been repeated on the Fedora build system? Do you see incorrect results in some cases?
#26 Updated by Christoph Junghans 3 months ago
- File build.log added
- Status changed from Closed to In Progress
- Target version changed from 2019.2 to future
- Affected version changed from 2018.3 to 2019.3
The crash is back in 2019.3:
GROMACS: gmx mdrun, version 2019.3
Executable: /builddir/build/BUILD/gromacs-2019.3/serial/bin/gmx
Data prefix: /builddir/build/BUILD/gromacs-2019.3 (source tree)
Working dir: /builddir/build/BUILD/gromacs-2019.3/serial/tests
Command line:
  gmx mdrun -h

Thanx for Using GROMACS - Have a Nice Day

sh: line 1: 16588 Aborted (core dumped) gmx mdrun -nb cpu -notunepme > mdrun.out 2>&1
Abnormal return value for ' gmx mdrun -nb cpu -notunepme >mdrun.out 2>&1' was -1
FAILED. Check mdrun.out, md.log file(s) in nb_kernel_ElecEwSw_VdwBhamSw_GeomW4W4 for nb_kernel_ElecEwSw_VdwBhamSw_GeomW4W4
1 out of 142 kernel tests FAILED
See https://koji.fedoraproject.org/koji/taskinfo?taskID=35545387 and the attached build.log.