Project

General

Profile

Task #1907

keeping compile- and run-time CPU/arch detection consistent

Added by Szilárd Páll almost 4 years ago. Updated over 3 years ago.

Status:
Accepted
Priority:
Low
Assignee:
-
Category:
build system
Target version:
-
Difficulty:
uncategorized
Close

Description

With the current master code the CPU architecture detection is not performed with the same code both at compile- and runtime. This can easily lead to inconsistencies especially as the detection code becomes more complex and tries to cover more architectures.

We should consider whether a separate simplistic CMake-based CPU arch detection is realistic to keep up-to-date and in sync with runtime detection.


Related issues

Related to GROMACS - Bug #1906: incorrect CPU detection on AMD Jaguar/Puma architecturesClosed
Related to GROMACS - Feature #2072: target hardware description is silent about the resultClosed

History

#1 Updated by Szilárd Páll almost 4 years ago

  • Related to Bug #1906: incorrect CPU detection on AMD Jaguar/Puma architectures added

#2 Updated by Erik Lindahl over 3 years ago

It's fundamentally a chicken-and-egg problem that I don't think can be solved.
First, in release 5.1 the detection is performed with exactly the same code, but apparently we still got an error there.

The problem is that we need to do the SIMD detection very early in the build configuration since the required flags/settings can have huge influence on all other tests. By definition, this means that all the other tests haven't been run yet. In particular for things that aren't x86 we will do a way better job with detection of features once we know exactly what we are running.

At some point it's a matter of time and effort, and IMHO it simply isn't worth months of effort to rewrite things into e.g. first doing pre-detection, then having a separate large CMake configuration to build a separate full detection module with the same settings as Gromacs, and finally running the real Gromacs configuration.

I think it's far easier to fix the incorrect corner case bugs we do find during testing.

#3 Updated by Szilárd Páll over 3 years ago

Erik Lindahl wrote:

It's fundamentally a chicken-and-egg problem that I don't think can be solved.
First, in release 5.1 the detection is performed with exactly the same code, but apparently we still got an error there.

As I mentioned above, the error is in the assumptions/decisions the code makes based on the decision (i.e. AMD => AVX_128_FMA) rather than in the detection.

More generally, it is of course reasonable to implement a "best effort" automation and solve issues later, so I'm not saying that this AMD assumption should have never even occurred.

However, I do see an issue with an inherently inconsistency-prone detection code and it's not obvious (to me) why is this a chicken-egg problem. IIRC I proposed using the same code for both compile- and runtime detection back in the 4.6 era exactly to avoid inconsistencies. What makes it impossible in the new code to avoid a (crippled) re-implementation of the SIMD target picking? Is it C++ flags and related complications that would cause trouble? What about hwloc?

#4 Updated by Erik Lindahl over 3 years ago

Well, if you're not happy with the present solution that tries to use the same code when we can, but different paths when we have to, here's what you need to write:

First, deciding what SIMD acceleration to use is a feature that should be located to the SIMD module (to where we have now moved it), in particular so the rest of the does not make assumptions or interpretations about the internals of the SIMD module. That code in turn needs to call the low-level CPU detection code in the hardware module to match hardware capabilities with implemented software. In other words: Your detection would need to compile these two modules.

Using ifdefs around code (as we do now) and occasionally having different paths is not an option if you absolutely want the very same code to be executed.

Thus, the "only" thing required is to make sure that both these modules compile

1) For every single SIMD architecture
2) For every single compiler
3) For every single OS

at the very start of the CMake file. You can't rely on any CMake detection at all (since we need to know e.g. the SIMD architecture when setting the right compiler flags).
And, finally one shouldn't forget that the resulting code should be reasonably simple and easy to read.

If anybody wants to waste lots of time for something that is not likely to work anyway (IMHO), go right ahead - I'll be happy to have a look at the patch.

#5 Updated by Szilárd Páll over 3 years ago

OK. Still not clear why is better to have neatly organized modules and C++ class hierarchies than consistent detection that does not result in picking the wrong SIMD at compile-time and consequently crashing at run-time. If the runtime detection is not compiled in a way that it can always run and possibly warn (rather than throwing an illegal instruction), I see questionable benefit to a complicated (though C++d) detection iso the same simple detection as the compile-time one. Sure it will catch most compiled_SIMD < runtime_SIMD cases if the user made a lucky suboptimal pick or the CMake code had a lucky bug, but is that worth the hassle (esp. in the light of SIMD diversification e.g. the AVX 512 family)?

I raised a practical issue, but as the discussion does not seem to be on technical merits of it, I'd rather just close this issue. I can likely pick the right SIMD arch anyway :)

#6 Updated by Szilárd Páll over 3 years ago

  • Status changed from New to Rejected

#7 Updated by Mark Abraham over 3 years ago

  • Status changed from Rejected to Accepted
  • Priority changed from Normal to Low

If there was a way to have cmake's try_run take two source files, then I think it would be a simple matter of linking the code in hardware/cpuinfo.cpp and simd/support.cpp so that the detection runs, and gets passed to the SIMD module to see its decision. That would eliminate some duplication of logic and prevent inconsistencies (which are admittedly unlikely, so this would be a rainy Sunday afternoon job, at least until AVX512 flavours start proliferating in practice...)

Those source files looks pretty light on dependencies, so should work well with some extra ifdef GMX_CPUINFO_STANDALONE. But I don't think there's any way to get it done at cmake time unless we arrange to concatenate those two files (with cmake's file() command), or use two preprocessor includes into a third file, before passing the resulting single file to try_run. A cmake external_project could do the job less crudely, but I'm not sure there's a way to get the result back.

#8 Updated by Szilárd Páll over 3 years ago

Mark Abraham wrote:

That would eliminate some duplication of logic and prevent inconsistencies (which are admittedly unlikely, so this would be a rainy Sunday afternoon job, at least until AVX512 flavours start proliferating in practice...)

In any case, IMO it's a matter of when not if it is worth it, new architectures are coming up.

Those source files looks pretty light on dependencies, so should work well with some extra ifdef GMX_CPUINFO_STANDALONE. But I don't think there's any way to get it done at cmake time unless we arrange to concatenate those two files (with cmake's file() command), or use two preprocessor includes into a third file, before passing the resulting single file to try_run. A cmake external_project could do the job less crudely, but I'm not sure there's a way to get the result back.

Both preprocessing or simple concatenation sound like workable options!

#9 Updated by Erik Lindahl over 3 years ago

If there was a way to have cmake's try_run take two source files, then I think it would be a simple matter of linking the code in hardware/cpuinfo.cpp and simd/support.cpp

No, that won't fix it. Note how we might do a very simple detection in cpuinfo.cpp at CMake time (possibly even just checking /proc/cpuinfo), while we enable more advanced stuff once we have all the correct build options for the hardware.

#10 Updated by Mark Abraham over 3 years ago

  • Target version deleted (2016)

FWIW I disagree with Erik's last, but probably I have to try to write code to show it :-)

#11 Updated by Szilárd Páll about 3 years ago

  • Related to Feature #2072: target hardware description is silent about the result added

Also available in: Atom PDF