Project

General

Profile

Bug #2164

SIMD sqrt in double-precision build does not work correctly

Added by Mark Abraham over 2 years ago. Updated almost 2 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
core library
Target version:
Affected version - extra info:
likely all versions since the new SIMD layer came about
Affected version:
Difficulty:
uncategorized
Close

Description

For a double precision build, SIMD sqrt(x) for a double x is implemented by x * rsqrt(x). rsqrt(x) is implemented via truncation to single, which produces inf at some point if x is suitably small (presumably x < smallest double that is representable in float).

code in simd_math.h is

static inline SimdDouble gmx_simdcall
sqrt(SimdDouble x)
{
    return x * maskzInvsqrt(x, setZero() < x);
}

If we're happy with the SIMD library producing the value zero for the sqrt of such small values of x, then we can fix it by instead clamping the range to "less than some small positive value of x" rather than zero (not sure offhand what value is best).

Found by Carsten's essentialdynamics linfix regressiontest in double on avx_256. This has a single dihedral that coincidentally produces a nearly zero angle on step 5.


Related issues

Related to GROMACS - Task #2134: assess whether Jenkins is testing multi-rank runs appropriatelyClosed
Related to GROMACS - Feature #2163: Use better input/output value choices for SIMD testsClosed

Associated revisions

Revision e34ead15 (diff)
Added by Erik Lindahl almost 2 years ago

More SIMD math argument checking, added unsafe options

This change adds more argument checking and safeguards
for sqrt, exp2, and exp-related SIMD math functions, and
properly documents allowed values. These functions now
have an (optional) template parameter that makes it possible
to avoid the checks where it is important to save every cycle,
and the developer is certain that this usage is fine. For
now we only use the unsafe versions in the nonbonded kernels.
The SIMD function test code has also been extended with options
to allow denormals to be considered zero.

Fixes #2164.
Refs #2163.

Change-Id: I93ddadf74dd0fa013f61cf27fd1993f11cde28bc

Revision 6d32275c (diff)
Added by Berk Hess almost 2 years ago

Avoid inf in SIMD double sqrt()

Arguments >0 and <float_min to double precision SIMD sqrt()
would produce inf on many SIMD architectures. Now sqrt() will
return 0 for arguments in this range, which is not fully correct,
but should be unproblematic.
Updated the tests to check for this range and to produce output
that checks all double precision mantissa bits.

Fixes #2164.
Refs #2163.

Change-Id: Ic6d2c6d4102d602703b40e7e8bcc1974a7283f7c

History

#1 Updated by Mark Abraham over 2 years ago

  • Related to Task #2134: assess whether Jenkins is testing multi-rank runs appropriately added

#2 Updated by Mark Abraham over 2 years ago

  • Related to Feature #2163: Use better input/output value choices for SIMD tests added

#3 Updated by Berk Hess over 2 years ago

The clamping you suggest is not nice, as you often take the sqrt of a square, you get issues at values of float_min (1e-38) instead of double_min (1e-308). Still one could wonder if there are cases where it matters that one gets 0 instead of 1e-38.
A somewhat better solution would be to clamp the value passed to (inv)sqrt to float_min or to have a different maskRsqrt function that return sqrt(float_max) when masked. I don't know how many extra instructions that would take.

An even simpler solution is to use double rsqrt. I have done timings, but from the Intel SIMD intrinsics guide it would seem that the two conversions + single rsqrt already take more time than a double rsqrt on Haswell. On Skylake double rsqrt is as fast as single, so there we should surely use the double rsqrt.

#4 Updated by Berk Hess over 2 years ago

Ignore the last paragraph in my previous comment. There is no rsqrt in double on x86.

#5 Updated by Berk Hess over 2 years ago

Also ignore my other suggestions (I need more coffee, I think).

The clamping to zero for float_min will switch to zero for return values smaller than 1e-19, which is relatively early, but I don't see cases where it would really matter. But masking invsqrt with sqrt(float_max) doesn't help accuracy much.
So clamping for input to sqrt <float_min is certainly better than what we do now (and not much more expensive). I don't think there are better solutions that are not at least double the cost.

#6 Updated by Gerrit Code Review Bot over 2 years ago

Gerrit received a related patchset '1' for Issue #2164.
Uploader: Berk Hess ()
Change-Id: gromacs~release-2016~Ic6d2c6d4102d602703b40e7e8bcc1974a7283f7c
Gerrit URL: https://gerrit.gromacs.org/6599

#7 Updated by Erik Lindahl over 2 years ago

Actually, there is a double version of it, but not until Skylake with AVX512 - in that case we already use it.

However, the problem isn't at all in sqrt(). That routine only works with double, and all comparisons are correct. The main issue occurs with the double-to-float-to-double conversions inside the rsqrt() call of invsqrt() - and this routine is in the critical path.

Over the last few years I think we've changed attitude to say that performance is irrelevant unless we can guarantee that routines always provide correct results.

The proper way to fix it is to clamp values to single precision before calling either rsqrt or rcp. Since the result from that call will still be valid to within the bits specified in the lookup table, the N-R iterations will provide a correct result.

#8 Updated by Erik Lindahl over 2 years ago

Actually, even that won't help, since the clamping means we don't get enough valid bits in the lookup. I need to think a bit about the math here, but this won't be a "free lunch" fix.

#9 Updated by Erik Lindahl over 2 years ago

Actually, as far as I can see, all SIMD architectures except x86 have the lookup work right (i.e., it will provide correct results with double precision), and even on x86 it works from AVX512.

Based on that, the way to fix it is that we will need to add code to the rsqrt() and rcp() SIMD calls for SSE & AVX so the double precision lookups provide correct results.

#10 Updated by Erik Lindahl over 2 years ago

I've had a look to see what compilers generate internally, and unfortunately there doesn't appear to be any easy fix. We might have to solve it my providing both normal and "fast" versions of sqrt/invsqrt.

#11 Updated by Mark Abraham over 2 years ago

It's quite reasonable to say we clamp either sqrt or rsqrt to zero when the value is super small. We could choose not to use that rsqrt implementation in normal kernels. We just can't return inf when a dihedral has nearly zero angle, or someone produced coordinates that almost overlap. Presumably the inf arises from UB when the truncation to single precision cannot succeed?

#12 Updated by Gerrit Code Review Bot about 2 years ago

Gerrit received a related patchset '1' for Issue #2164.
Uploader: Erik Lindahl ()
Change-Id: gromacs~master~I93ddadf74dd0fa013f61cf27fd1993f11cde28bc
Gerrit URL: https://gerrit.gromacs.org/6661

#13 Updated by Szilárd Páll about 2 years ago

  • Status changed from New to In Progress

#14 Updated by Erik Lindahl about 2 years ago

  • Status changed from In Progress to Fix uploaded
  • Target version changed from 2016.4 to 2018

Changing the target to 2017, since this requires fairly large changes both to code and the specifications for what SIMD values we allow in double precision builds.

Since this doesn't seem to have hurt any actual simulations, it seems like a reasonable balance to focus on future versions instead.

#15 Updated by Erik Lindahl almost 2 years ago

  • Status changed from Fix uploaded to Resolved

#17 Updated by Aleksei Iupinov almost 2 years ago

  • Status changed from Resolved to Closed

Also available in: Atom PDF