Bug #3249

CMake: bad tests for AVX-512 on AMD

Added by Anton Shterenlikht about 1 year ago. Updated 11 months ago.

Status: In Progress


Building on AMD Rome for AMD Rome
with the Cray clang-based C and C++ compilers, CMake reports:

-- Performing Test C_mavx512f_mfma_FLAG_ACCEPTED
-- Performing Test C_mavx512f_mfma_FLAG_ACCEPTED - Success
-- Performing Test C_mavx512f_mfma_COMPILE_WORKS
-- Performing Test C_mavx512f_mfma_COMPILE_WORKS - Success

-- Performing Test CXX_mavx512f_mfma_FLAG_ACCEPTED
-- Performing Test CXX_mavx512f_mfma_FLAG_ACCEPTED - Success
-- Performing Test CXX_mavx512f_mfma_COMPILE_WORKS
-- Performing Test CXX_mavx512f_mfma_COMPILE_WORKS - Success

These tests are not sufficient
and cause problems at build time:
binaries will segfault, as AVX-512
is not supported on AMD Rome.
The tests only check whether
the compiler accepts the flags,
which it does, just for a different target.

The workaround is to use


But I think the avx512 tests in

should be tightened for AMD,
e.g. just by testing whether
the compiled tests actually run.
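A run-based test would have to execute something like the probe below on the build host. This is a minimal sketch, assuming a GCC- or Clang-compatible compiler (the Cray CCE compilers here are clang-based); `host_supports_avx512f` is a hypothetical name, and `__builtin_cpu_supports` is a compiler builtin, not a GROMACS or CMake API.

```c
#include <stdbool.h>

/* Hypothetical probe: reports whether the CPU running this code
 * advertises AVX-512F. Relies on the GCC/Clang builtins
 * __builtin_cpu_init and __builtin_cpu_supports (x86 only). */
bool host_supports_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();                       /* populate the CPU model data */
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;                               /* non-x86 hosts have no AVX-512 */
#endif
}
```

A try-run program would wrap this in a `main` that returns `host_supports_avx512f() ? 0 : 1`. Note that such a check only describes the machine that runs CMake; the thread later points out that the frontend node may differ from the compute nodes.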

Attachment: stdout (2.09 MB), added by Anton Shterenlikht, 01/06/2020 04:12 PM


#1 Updated by Erik Lindahl about 1 year ago

  • Assignee set to Erik Lindahl

#2 Updated by Erik Lindahl about 1 year ago

  • Status changed from New to Blocked, need info
  • Priority changed from Normal to Low

Those tests are quite intentional; we only check the flags to be able to compile a single file with AVX-512 support, and the routine in that file will never be called at runtime unless the AVX512F CPUID flag is set.
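The pattern described here (AVX-512 code compiled in, but only reached after a CPUID check) can be sketched as below. This is an illustration under stated assumptions, not the GROMACS code: GROMACS uses a separate translation unit compiled with AVX-512 flags, whereas this single-file sketch substitutes the GCC/Clang `target("avx512f")` function attribute. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic kernel: built with the default (non-AVX-512) flags. */
static int64_t sum_generic(const int32_t *x, size_t n)
{
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Stand-in for the separate AVX-512 file: the compiler may emit
 * AVX-512 instructions inside this one function only. */
__attribute__((target("avx512f")))
static int64_t sum_avx512(const int32_t *x, size_t n)
{
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
#endif

typedef int64_t (*sum_fn)(const int32_t *, size_t);

/* Choose a kernel once, based on the CPUID flags of the running CPU.
 * On a CPU without AVX512F, sum_avx512 is never called. */
sum_fn select_sum_kernel(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return sum_avx512;
#endif
    return sum_generic;
}
```

The scheme is only safe if the compiler confines AVX-512 instructions to code that is reached after the check.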

I just tested the latest version in the release-2019 branch, and it configures, compiles, and runs just fine on first-generation EPYC/Ryzen.

If you actually do get a segfault, please

- Use the latest version (2019.5)
- Include full configuration output/logs
- If the segfault happens at runtime, include the command line that caused it, and ideally the output of a debugger showing the stack trace.

#3 Updated by Anton Shterenlikht about 1 year ago

Did you test with the Cray compiler?
If so, which version?

If you use GCC, are you perhaps relying on function multi-versioning?

The 2019.5 build still fails for me with the Cray CCE/10 compiler
in identifyavx512fmaunits
because the compiler is invoked with:

"-target-cpu" "znver2" "-target-feature" "+avx2" "-target-feature" "+avx512f"

I'll check with my managers what files I can share.

#4 Updated by Erik Lindahl about 1 year ago

No, we don't have any Cray compiler in our default CI/testing environment, but if Cray wants to donate a cluster I'm sure we could talk :-)

There should not be any need for multi-versioning, since no symbol in that object file is ever called on a non-AVX512 host.

First, if that still happens, we need to see how CMake was invoked, to check that it did not detect it as an AVX512 host by mistake.
Second, we need to see whether it failed at build time or when running a normal Gromacs job.
Third, we'd like to see the head of the log file to make sure the configuration was sane, and double-check what source and libraries were used.
Fourth, we'd like to see which files use which compiler options, since the general/Rome-specific flags are separate from the ones that should be used for the single AVX512-unit file.
Fifth, a debug trace helps us see where it fails. "identifyavx512fmaunits" is both a CMake test, a translation unit, and a function that in turn calls other functions. The stack trace would instantly tell whether it's a routine being called incorrectly at runtime (possibly due to a configuration error) or whether it's something subtle about a hidden initialisation routine being generated by the compiler.

We have very specific reasons for the AVX512 test in particular. It's there because the optimal SIMD set is different depending on whether the CPU has 1 or 2 AVX512 units, and Intel did not provide any way to check that without timing. Since that means we will use AVX2 on some AVX512-capable hardware, we still want to be able to check at runtime to make sure the frontend node wasn't low-end while the actual compute nodes have 2 AVX512 units.
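The decision described above can be illustrated with a small helper. This is a hypothetical sketch: the enum values are named after the `AVX2_256` and `AVX_512` SIMD levels GROMACS offers, but the function is not GROMACS code, and the number of 512-bit FMA units is taken as an input, since measuring it (which, per the comment above, requires timing) is not shown.

```c
#include <stdbool.h>

/* Hypothetical SIMD levels, named after GROMACS's GMX_SIMD choices. */
typedef enum { SIMD_OLDER, SIMD_AVX2_256, SIMD_AVX_512 } simd_level;

/* AVX-512 only pays off with two 512-bit FMA units; with one unit,
 * AVX2 is preferred even on AVX-512-capable hardware. */
simd_level choose_simd_level(bool has_avx2, bool has_avx512f,
                             int avx512_fma_units)
{
    if (has_avx512f && avx512_fma_units >= 2)
        return SIMD_AVX_512;
    if (has_avx2)
        return SIMD_AVX2_256;
    return SIMD_OLDER; /* placeholder for pre-AVX2 instruction sets */
}
```

On a CPU such as Rome, which has AVX2 but no AVX-512F, this returns `SIMD_AVX2_256` regardless of the unit count.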

For now this is not an issue on EPYC, but

1) we prefer to avoid unnecessary separate code paths
2) if it is an issue with the Cray compiler, it can be an issue on Intel CPUs too
3) we don't know what SIMD units AMD might add in the future

So, it's much more efficient to see all the information rather than guesses about what the error is ;-)



#5 Updated by Anton Shterenlikht about 1 year ago

cmake was invoked with

FLAGS="-O3 -Rpass=.* -fsave-loopmark"
cmake \
    -DGMX_BLAS_USER=$CRAY_LIBSCI_PREFIX_DIR/lib/libsci_cray_mpi.a \
    -DMPIEXEC=`which srun` \
    -DREGRESSIONTEST_PATH=../../regressiontests-${version} \
    ../../gromacs-${version}/

Attached is stdout with cmake and make output.

#6 Updated by Erik Lindahl about 1 year ago

  • Status changed from Blocked, need info to In Progress

Thanks a lot Anton, I'll try to have a look at this as early as tomorrow!

#7 Updated by Anton Shterenlikht 11 months ago

Any progress?

#8 Updated by Paul Bauer 11 months ago

Erik, can you give any more information here?

#9 Updated by Erik Lindahl 11 months ago

The detection in CMake appears to be just fine. We correctly set it to AVX, and then enable the single separate file that does runtime checks for AVX-512.

I can only imagine two causes:

1. Something goes wrong with the runtime detection, so we call the avx-512 routine by mistake. This is less likely, since we haven't seen it elsewhere.

2. The Cray compiler inserts some extra code using avx-512 instructions in the object file that is always executed, even if we don’t actually call the avx-512 routine. This is my main suspicion :-)

Can you run the code in a debugger and show the stack trace where it fails?

#10 Updated by Erik Lindahl 11 months ago

PS: I'm working on a potential fix that will use a wrapper file compiled without AVX-512 enabled to decide whether the tests should be run, and then a separate low-level file that uses actual AVX-512 instructions. More later today.
