Project

General

Profile

Bug #947

PowerPC invsqrt is never used

Added by Teemu Murtola about 7 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
build system
Target version:
Affected version - extra info:
4.5.*
Affected version:
Difficulty:
uncategorized
Close

Description

Both autoconf and CMake have an option to build with PowerPC hardware 1/sqrt, and both set a variable named GMX_POWERPC_INVSQRT. However, vec.h (the only place where these invsqrt defines are used), expects a GMX_POWERPC_SQRT. Also, there is logic in vec.h to use one extra iteration for the calculation, but there is no logic in CMake or autoconf files to set this.

This seems to have been broken since fc26c87a in 2009, which renamed the define in configure.ac. This change also removed the extra iteration from the calculation, which was earlier the default.

Either the PowerPC-specific invsqrt should be removed, or the #ifdef in vec.h corrected. Also, I think that the logic for the extra iteration should be either removed from vec.h or added to the build system.

Associated revisions

Revision c0ccb9cd (diff)
Added by Teemu Murtola over 6 years ago

Remove non-functional PowerPC invsqrt.

Also remove remaining Power6-related CMake code now that there is no
Power6 CPU acceleration option.

Fixes #947, see that issue for justification.

Change-Id: I6466f3ca7297f96757097119ee7b10e7e467bfe7

History

#1 Updated by Erik Lindahl about 7 years ago

We should remove it. Partly because PowerPC isn't a very important platform anymore, but the most important reason is that there should never be any config.h-dependent #ifdefs in header files - a user can easily include a public file in their own code, but if they don't set the correct defines they will get buggy results!

#2 Updated by Teemu Murtola about 7 years ago

In the top-level CMakeLists.txt, it is also enabled for BlueGene and Power6. Is this correct/performance-relevant or not? I'm all for removing unnecessary code, just wanting to be sure.

It was actually in the process of cleaning up those config.h dependencies from headers when I came across this. There are still some left, although the majority was cleaned up a few years back. This particular one actually doesn't break anything even if not set, since the user simply gets a default gmx_invsqrt() that behaves as expected. And GMX_SOFTWARE_INVSQRT is still left in the same header.

An alternative for removing these is to install a minimal version of config.h (there is already a skeleton one, gmx_header_config.h), which is included from all headers that actually depend on compile-time defines. Of course, it's better to avoid them in the first place, but I think it's better to first clean it up, and then work to remove anything unnecessary from gmx_header_config.h. There are probably cases where a gmx_header_config.h approach is more flexible and more robust than trying to detect stuff independently in headers.

#3 Updated by Erik Lindahl about 7 years ago

For the kernels we can actually do it, since they are controlled entirely at compile-time. We should look into and possibly update that later - right now I'm a bit too busy with the x86 kernels.

We should work on removing gmx_header_config.h completely - it is quite dangerous if the user places the gromacs installation on a shared NFS volume with separate bin/lib directories; then the headers will randomly correspond to whatever the last installed architecture was, not to mention if compiler flags alter some behaviors. The only safe solution is to have headers that don't depend on specific Gromacs compile-time definitions.

#4 Updated by Teemu Murtola over 6 years ago

  • Status changed from New to In Progress
  • Assignee changed from Erik Lindahl to Teemu Murtola
  • Target version set to 4.6

Removal of the code is here: https://gerrit.gromacs.org/#/c/1972/

Probably the best way to stimulate discussion, or just get things done...

#5 Updated by Erik Lindahl over 6 years ago

Let's remove it for now, and when/if we get access to a power6/power7 machine we can check whether it's worth adding back.

#6 Updated by Mark Abraham over 6 years ago

PowerPC is a whole architecture, like x86. It's not synonymous with Power6 or Power7 or anything else. Choosing not to have accelerated kernels for Power6/7 doesn't mean we should rip out everything related. gmx_invsqrt() is called by the generic C group, verlet and GB kernels, and constraint and vsite code. So this is not corner-case functionality. Those kernels are the only ones available on PowerPC, which includes BlueGene, which is actually not an unimportant platform! Since even accelerated BlueGene kernels are blue-sky projects until I get some time, why wouldn't we want to leave enabled a GROMACS feature to support hardware features of all PowerPC chips that have been around for decades and useful in multiple aspects of GROMACS for years?

The stuff about extra N-R iterations was added by Erik a few years ago. AFAIK there's no PowerPC chips that actually want that extra N-R iteration, but I'd have to do some homework to be sure of that one.

Things were set up correctly for BlueGene (software invsqrt off because powerpc sqrt was on) but the patch we've now merged breaks that (only costs performance). So something must be done.

Clearly there are some issues above with the implementation of accelerated invsqrt code that need addressing. Also,
1) gmx_software_invsqrt does not state the conditions under which it is accurate, and we can't test its accuracy because we might be cross compiling, so it needs some documentation in the code
2) the required intrinsic for gmx_powerpc_invsqrt should be detected with a compiler test (in fact, __frsqrte is probably only available from xlc), and if present should be enabled by default, because it will always out-perform the table lookup.
3) there's some value in a test target that computes gmx_invsqrt over the relevant range and observes that the accuracy is acceptable
4) we should not use the name invsqrt, because it is not an inverse square root; it's a reciprocal square root

I think the correct implementation is for CMake to test for the powerpc intrinsic, use it if found, and fall back on the table lookup version. Currently vec.h falls back on some possible library functions if software invsqrt is deselected in CMake. Do we know if those are likely to be faster than our table lookup version anywhere, Erik? If so, then we should have the compiler test for them before falling back to the table lookup.

Currently, the practice of having the gmx_software_invsqrt definition in the header file is driving the "conditional header file" problem, because at some point we couldn't trust linkers to inline the code if it was in another compilation unit. Currently, we've thrown out gmx_powerpc_invsqrt partly because it contributed to this problem, but that's not worth doing unless we throw out gmx_software_invsqrt as well, and I don't think throwing either out is reasonable. I'm in favour of moving the reciprocal square root code to a real compilation unit and trusting the linker. We lose only on an architecture where the function can't be inlined, and the loss in kernel performance is limited to cases where we do not provide accelerated kernels.

#7 Updated by Mark Abraham over 6 years ago

Teemu Murtola wrote:

Both autoconf and CMake have an option to build with PowerPC hardware 1/sqrt, and both set a variable named GMX_POWERPC_INVSQRT. However, vec.h (the only place where these invsqrt defines are used), expects a GMX_POWERPC_SQRT. Also, there is logic in vec.h to use one extra iteration for the calculation, but there is no logic in CMake or autoconf files to set this.

This seems to have been broken since fc26c87a in 2009, which renamed the define in configure.ac. This change also removed the extra iteration from the calculation, which was earlier the default.

Either the PowerPC-specific invsqrt should be removed, or the #ifdef in vec.h corrected. Also, I think that the logic for the extra iteration should be either removed from vec.h or added to the build system.

Right, there was a mismatch such that powerpc reciprocal square root acceleration wasn't used in the main GROMACS repo, so my statement about it having been useful was inaccurate - it should have been useful.

I remember fixing this years ago in the bluegene-kernel-upgrade branch (which I created because I missed the 4.5 feature freeze, and didn't think of it again for the 4.6 feature list, but it's useless now anyway), but clearly I didn't recognize at the time that my fix should have been back-ported for all PowerPC.

#8 Updated by Erik Lindahl over 6 years ago

Although it would be nice to have long term, I agree with Teemu's reason for throwing it out. The code hasn't been tested for years, and the very latest XLC documentation still claims that the RSQRTE instruction is only guaranteed to provide 6 bits of accuracy. Right now I have no idea if the routine provided accurate enough results unless GMX_POWERPC_SQRT=2 was defined.

For any modern architecture that provides hardware instructions to perform sqrt-related functionality I would assume it is better to use those than our own table version. Instructions such as rsqrf(x) are mainly hints to the compiler, i.e. that it is sufficient with single precision and that we really want the reciprocal square root. It is likely time to move away from our homegrown table version except for embedded platforms. When it comes to headers, gmx_invsqrt(x) is our single most performance-critical function (NOT limited to kernels - it is used for all bonded functions, constraints, etc.), and if it is moved to a separate compilation unit it will failed to be inlined on all compilers that do not support link-time optimization & inlining.

I don't mind adding back a proven-to-work square root for powerpc, but then we are talking about days before it has to be done. When it comes to adding more CMake detection, regressiontests and changing the nomenclature it's too late for release-4.6.

#9 Updated by Teemu Murtola over 6 years ago

Well, at least it seems that I finally managed to prompt some discussion for this. ;) I actually don't care that much whether the code is included or not, but since the only comment for the past seven months was for removing it, I simply took that path to propose solution to this issue and not leave it hanging. I serves no purpose to have non-functional code around, and only adds maintenance cost and possible user confusion (like in this case, when the cmake option had no effect whatsoever).

The issue with conditional compilation in headers has very little to do with this issue (except that I found the problem while cleaning config.h related to that). There are multiple solutions to that problem; the easiest is to simply declare gmx_invsqrt() in a header that is not installed, and use that internally. I don't think we want to be in the business of providing a generic, low-level hardware-optimized math library, and maintain that as part of the public interface of the Gromacs library...

#10 Updated by Teemu Murtola over 6 years ago

  • Status changed from In Progress to Feedback wanted

#11 Updated by Erik Lindahl over 6 years ago

  • Status changed from Feedback wanted to Closed

Since we now have access to a BG/Q and Mark will test the current actual (instead of formally documented) precision of the inverse square root lookups I think we should move the continued discussion there and focus on things we should implement for what architectures. Since the old code is properly removed thanks to Teemu's efforts I'm closing this one for now!

#12 Updated by Mark Abraham over 6 years ago

OK. Created a home for the ongoing issue at #1111

Also available in: Atom PDF