Project

General

Profile

Bug #2862

Division by zero in restrained dihedrals

Added by David van der Spoel 7 months ago. Updated 6 months ago.

Status:
In Progress
Priority:
Normal
Assignee:
-
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

The following code in restcbt.cpp, line 154 can lead to division by zero (variable sine_phi_sq is set to zero).

/*    It is possible that cosine_phi is slightly bigger than 1.0 due to round-off errors. */
if (sine_phi_sq < 0.0) {
sine_phi_sq = 0.0;
}
< snip >
/*      Computation of the prefactor - common term for all forces */
*prefactor_phi = -(k_torsion) * delta_cosine * norm_phi * term_phi_phi0 / (sine_phi_sq * sine_phi_sq);

Associated revisions

Revision dfa21149 (diff)
Added by David van der Spoel 7 months ago

Precision fix for rescbt code.

The compute_restangles function was not sufficiently
precise. In fact it return completely different
results in single and double precision. By making part
of the function work in double the issue is fixed.

New tests added.

Part of #2795
Fixes #2862

Change-Id: Iad7f2bc45be996ba3e16358aab838c5427b157b8

History

#1 Updated by Mark Abraham 7 months ago

Nico, should we improve something here?

#2 Updated by Berk Hess 7 months ago

Unrelated to the division by zero, but the factor (sine_theta_sq)^2 in the denominator of prefactor looks suspicious to me. I would expect a factor sin^2, not a factor sin^4.

#3 Updated by Berk Hess 7 months ago

Maybe the sin^4 factor could be correct, but that is very difficult to verify without the definitions of any of the variables in the code.

The potential diverges at theta=0, so you should simply never put in such a configuration!

#4 Updated by David van der Spoel 7 months ago

Here is an issue related to the same code, where single and double produce completely different results.

https://gerrit.gromacs.org/#/c/9128/

#5 Updated by David van der Spoel 7 months ago

For sure the code is not sufficiently robust. Maybe some variables need to be made double precision.

#6 Updated by Gerrit Code Review Bot 7 months ago

Gerrit received a related patchset '1' for Issue #2862.
Uploader: David van der Spoel ()
Change-Id: gromacs~master~Iad7f2bc45be996ba3e16358aab838c5427b157b8
Gerrit URL: https://gerrit.gromacs.org/9148

#7 Updated by David van der Spoel 7 months ago

  • Status changed from New to Resolved

#8 Updated by Erik Lindahl 7 months ago

  • Status changed from Resolved to In Progress

I'm reopening this again so we do not close it, but fix it properly.

While moving to double might have fixed one particular case, that is never a proper solution to make numerically sensitive operatios robust - it just means the error will be rarer. To fix it, we have to understand

  • Is it allowed for the dihedrals to have values in the range where this can happen? (likely yes, even if it is rare/stupid). If the answer to that is "no", we should add tests to catch the bad configurations.
  • How do we fix the equations so we can handle the case where theta approaches 0?
  • Given those fixes, the obvious test is to confirm the code works fine in single precision

For the record, doing some calculations in double is really horrible from a performance p-o-v since it forces single-to-double-to-single conversions that are very slow.

#9 Updated by Nicolae goga 6 months ago

Friends,

Sorry again for the late reply. Code was done according to the physics described in the cited article. So that the formula with cosinus at power 4 should be fine.

Nicu

#10 Updated by Mark Abraham 6 months ago

Nicolae goga wrote:

Sorry again for the late reply. Code was done according to the physics described in the cited article. So that the formula with cosinus at power 4 should be fine.

Nicu, what happens as theta approaches zero? How should we implement this stably?

#11 Updated by Mark Abraham 6 months ago

Berk Hess wrote:

Maybe the sin^4 factor could be correct, but that is very difficult to verify without the definitions of any of the variables in the code.

Also, Nicu can you please have a look at the code and propose some better comments? If the core team can't maintain code because we can't understand it, then we will have to remove it.

#12 Updated by Nicolae goga 6 months ago

Friends,

I was discussing with Monica Bulacu, the main author of the theory. The if part with
if (sine_phi_sq < 0.0) {
sine_phi_sq = 0.0;
}

should be removed. Citing Monica, this will never happen and it was an error of translating the code from Fortrand (her code) to Gromacs.

I hope that this helps

Keep in touch
Nicu

#13 Updated by Erik Lindahl 6 months ago

I had a closer look at this code, and it appears to be written in a very fragile way.

  • First, sin^2(x) should NEVER be calculated as 1.0-cos^2(x) if there is a risk x is anywhere close to zero, since it will lead to a drastic loss-of-precision. The proper solution is rather to calculate sin(x) directly from a cross product instead.
  • Second, if "Code was done according to the physics described in the cited article", that might be a sign the equations are also bad and do not properly take into account what happens if the value approaches 0 (which I would assume means linear configurations). Even if that is not expected to happen in a stable restrained system, it might definitely happen for an initial system we want to restrain.
  • As Mark asks, the code has to be implemented in a way where both the potential and derivative are stable as the value approaches zero.

Notably, this appears to be the case for all three routines. David's earlier fix of using double precision will not fix the problem - it just makes it slightly more rare, and created "reproducibility" in the sense that we get the same errors in single and double precision builds (since both use double).

#14 Updated by Erik Lindahl 6 months ago

PS Nicu: I very much agree with Mark this code urgently needs proper documentation, being made stable, and proper unit tests.

Also available in: Atom PDF