Bug #2093

possible unstable listedforces-test BondedTest.IfuncBondsPbcNo

Added by Mark Abraham almost 4 years ago. Updated almost 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: testing
Target version: -
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

A numerical exception was seen recently on Jenkins for a change that only touched unrelated parts of the build system (at
http://jenkins.gromacs.org/job/Gromacs_Gerrit_master_nrwpo/2297/OPTIONS=gcc-5.2%20openmp%20opencl%20simd=avx_128_fma%20amdappsdk-3.0%20host=bs_nix-amd_gpu,label=bs_nix-amd_gpu/testReport/junit/(root)/CTest/ListedForcesTest/)

Note that all other builds were unaffected.

CTest.ListedForcesTest

Failing for the past 1 build (Since Unstable#2297 )
Took 0.12 sec.
Error Message

NUMERICAL
Standard Output

[==========] Running 15 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 15 tests from BondedTest
[ RUN      ] BondedTest.BondAnglePbcNone
[       OK ] BondedTest.BondAnglePbcNone (1 ms)
[ RUN      ] BondedTest.BondAnglePbcXy
[       OK ] BondedTest.BondAnglePbcXy (0 ms)
[ RUN      ] BondedTest.BondAnglePbcXyz
[       OK ] BondedTest.BondAnglePbcXyz (0 ms)
[ RUN      ] BondedTest.DihedralAnglePbcNone
[       OK ] BondedTest.DihedralAnglePbcNone (0 ms)
[ RUN      ] BondedTest.DihedralAnglePbcXy
[       OK ] BondedTest.DihedralAnglePbcXy (1 ms)
[ RUN      ] BondedTest.DihedarlAnglePbcXyz
[       OK ] BondedTest.DihedarlAnglePbcXyz (0 ms)
[ RUN      ] BondedTest.IfuncBondsPbcNo

Associated revisions

Revision dfa03965 (diff)
Added by Aleksei Iupinov over 3 years ago

Add initialization to dvdlambda in bonded forces tests

Should fix #2093.

Change-Id: I7c99200443bd13c26757615d3101da1c52c8859d

History

#1 Updated by Mark Abraham almost 4 years ago

  • Description updated (diff)

#2 Updated by Aleksei Iupinov over 3 years ago

Same output, same test case, same test, same node. Went away after re-triggering.

http://jenkins.gromacs.org/job/Gromacs_Gerrit_master_nrwpo/2516/OPTIONS=gcc-5.2%20openmp%20opencl%20simd=avx_128_fma%20amdappsdk-3.0%20host=bs_nix-amd_gpu,label=bs_nix-amd_gpu/testReport/(root)/CTest/ListedForcesTest/

Just looking at the code, each test has a mostly uninitialized union t_iparams iparams, but memsetting it to garbage didn't break anything on my machine.
The test might be hitting the real issue :-)

#3 Updated by Aleksei Iupinov over 3 years ago

Rather intrigued by this, but still can't find anything, neither can valgrind with various initialization tracing flags.
dvdlambda is declared at src/gromacs/listed-forces/tests/bonded.cpp:131 without initialization, and then is added to within ifunc() (the value being added is 0); not enough to cause an exception with some invalid starting value, I suppose?
Worth noting that that first ifunc() test failed both times.
Mark, wouldn't it be helpful to pass --gtest_shuffle to all the GTest binaries to randomize the execution order within test cases?
Would help to narrow down habitats of such elusive bugs.
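
For reference, the accumulate-into-an-output-argument pattern described in this comment can be sketched as below. This is only an illustration under stated assumptions, not the GROMACS code: addZeroContribution and resultIsNan are hypothetical names, and the kernel is reduced to the single addition that matters here. The point is that the result of such a kernel depends on whatever value the caller put in the variable beforehand.

```cpp
#include <cassert> // only needed for the assert-based usage shown below
#include <cmath>
#include <limits>

// Hypothetical stand-in for a kernel that accumulates into an output
// argument. As in the case described above, the contribution is 0.
void addZeroContribution(double* dvdlambda)
{
    *dvdlambda += 0.0;
}

// Runs the kernel on a caller-chosen starting value and reports
// whether the accumulated result is NaN.
bool resultIsNan(double start)
{
    double dvdlambda = start; // explicit initialization by the caller
    addZeroContribution(&dvdlambda);
    return std::isnan(dvdlambda);
}
```

Used as `assert(!resultIsNan(0.0));` and `assert(resultIsNan(std::numeric_limits<double>::quiet_NaN()));`, this shows that a finite start stays finite while a NaN start stays NaN, so the caller has to initialize the accumulator for the result to be meaningful.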

#4 Updated by Mark Abraham over 3 years ago

I spammed a million repeats of the listedforces-test binary built on this node and had no failures.

Good try, but dvdlambda isn't used unless the inputs activate the free-energy code pathways, which doesn't happen for any of these inputs.

The use of the fields of t_iparams is controlled by the ftype, and the required fields look good to me.

If we randomize, and the same individual test fails each time, then we know that there is an error in that test, or how it uses the infrastructure.
If we randomize, and the first of a group of tests fails each time, then we know that there is an error in that test, or how it uses the infrastructure.
Where is there a gain?

#5 Updated by Aleksei Iupinov over 3 years ago

I've also tried memsetting pbc to 0xFF...

By shuffling we would only learn whether the failure might be related specifically to the (epbc == epbcNONE) code path (or even more weirdly, to specific valid values in iparams.harmonic which differ between the tests).

#7 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '5' for Issue #2093.
Uploader: Aleksei Iupinov
Change-Id: gromacs~master~Idf8814804f04be21ef59db801030006c5b319651
Gerrit URL: https://gerrit.gromacs.org/6447

#8 Updated by Aleksei Iupinov over 3 years ago

So that backtrace bit (bonds+0x1f6) allowed me to brush up my superficial GDB knowledge :-)
That is indeed the lines 384-388 of bonded.cpp where the uninitialized dvdlambda gets added to.
Submitted up there is the test change where dvdlambda starts at sNaN.
Hopefully it throws the same exception on Jenkins, at least it does on my machine.

#9 Updated by Mark Abraham over 3 years ago

Aleksei Iupinov wrote:

So that backtrace bit (bonds+0x1f6) allowed me to brush up my superficial GDB knowledge :-)
That is indeed the lines 384-388 of bonded.cpp where the uninitialized dvdlambda gets added to.
Submitted up there is the test change where dvdlambda starts at sNaN.
Hopefully it throws the same exception on Jenkins, at least it does on my machine.

Ah, perhaps that explains #2112 also - perhaps pairs are setting dvdl[efptCOUL] similarly inappropriately

#10 Updated by Aleksei Iupinov over 3 years ago

Here comes the real patch.
(Just adding things to a signaling NaN can throw exceptions, how cool is that for debugging!)
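
A minimal sketch of that effect, using only the standard <cfenv> and <limits> facilities, might look like the following. The function name accumulationRaisesInvalid is hypothetical and the code is not taken from bonded.cpp. It assumes an IEEE 754 platform where signaling NaNs exist and are not quieted when stored (for example x86-64 with SSE arithmetic), and it uses volatile variables so the compiler performs the addition at run time.

```cpp
#include <cassert> // only needed for the assert-based usage shown below
#include <cfenv>
#include <limits>

// Adds a zero contribution to an accumulator and reports whether the
// addition raised the FE_INVALID floating-point exception flag.
// The accumulator starts either at a signaling NaN or at 0.
bool accumulationRaisesInvalid(bool startAtSignalingNan)
{
    // volatile keeps the compiler from folding the arithmetic away.
    volatile double dvdlambda = startAtSignalingNan
            ? std::numeric_limits<double>::signaling_NaN()
            : 0.0;
    volatile double contribution = 0.0;

    std::feclearexcept(FE_ALL_EXCEPT);
    // Written out instead of += because compound assignment on a
    // volatile object is deprecated in C++20.
    dvdlambda = dvdlambda + contribution;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

With this, `assert(accumulationRaisesInvalid(true));` and `assert(!accumulationRaisesInvalid(false));` are expected to hold on such a platform. The sketch only inspects the exception flag. If floating-point exceptions are trapped (for example with the glibc extension feenableexcept), the same addition would deliver SIGFPE instead; whether the failing Jenkins configuration did that is an assumption here.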

#11 Updated by Gerrit Code Review Bot over 3 years ago

Gerrit received a related patchset '1' for Issue #2093.
Uploader: Aleksei Iupinov
Change-Id: gromacs~master~I7c99200443bd13c26757615d3101da1c52c8859d
Gerrit URL: https://gerrit.gromacs.org/6464

#12 Updated by Erik Lindahl almost 3 years ago

  • Status changed from New to Closed

No comments for 10 months since Aleksei uploaded the change that should fix it, so I'll assume it did the job and close this.
