Bug #3249

cmake bad tests for avx512 on AMD

Added by Anton Shterenlikht 2 months ago. Updated 3 days ago.

Status: In Progress
Priority: Low
Assignee:
Category: -
Target version: -
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

Building on AMD Rome for AMD Rome,
with Cray clang-based C and C++ compilers:

-- Performing Test C_mavx512f_mfma_FLAG_ACCEPTED
-- Performing Test C_mavx512f_mfma_FLAG_ACCEPTED - Success
-- Performing Test C_mavx512f_mfma_COMPILE_WORKS
-- Performing Test C_mavx512f_mfma_COMPILE_WORKS - Success

-- Performing Test CXX_mavx512f_mfma_FLAG_ACCEPTED
-- Performing Test CXX_mavx512f_mfma_FLAG_ACCEPTED - Success
-- Performing Test CXX_mavx512f_mfma_COMPILE_WORKS
-- Performing Test CXX_mavx512f_mfma_COMPILE_WORKS - Success

These tests are not sufficient
and cause problems at build time:
binaries segfault because AVX-512
is not supported on AMD Rome.
The tests only check whether
the compiler supports the flags,
which it does, just for a different target.

The workaround is to use

-DGMX_ENABLE_AVX512_TESTS=OFF

But I think the avx512 tests in

https://github.com/gromacs/gromacs/blob/master/cmake/gmxManageSimd.cmake

should be tightened for AMD,
e.g. just by testing whether
the compiled tests actually run.
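A minimal sketch of the "does it actually run" check suggested above, written as a POSIX shell function. The function name `probe_avx512_runs` and the probe program are hypothetical and are not part of GROMACS or gmxManageSimd.cmake. It compiles a tiny AVX-512F program with the same `-mavx512f -mfma` flags the CMake tests use, then executes it; on a host without AVX-512, such as Rome, the run step is expected to die (typically with an illegal-instruction signal), so the function returns nonzero even though compilation succeeded.

```shell
# Hypothetical helper, not part of GROMACS: returns 0 only if an AVX-512F
# probe both compiles with -mavx512f -mfma AND runs on the current host.
probe_avx512_runs() {
    compiler=${1:-cc}           # C compiler to test, e.g. the Cray "cc" wrapper
    dir=$(mktemp -d) || return 1

    # Tiny program using AVX-512F intrinsics: 8 lanes of 1*1+1 sum to 16.
    cat > "$dir/probe.c" <<'EOF'
#include <immintrin.h>
int main(void)
{
    __m512d x = _mm512_set1_pd(1.0);
    x = _mm512_fmadd_pd(x, x, x);
    return _mm512_reduce_add_pd(x) == 16.0 ? 0 : 1;
}
EOF

    # The compile step is what the existing CMake tests already check;
    # the extra run step fails (e.g. SIGILL) on hardware without AVX-512.
    if "$compiler" -mavx512f -mfma "$dir/probe.c" -o "$dir/probe" 2>/dev/null &&
       "$dir/probe"; then
        rc=0
    else
        rc=1
    fi
    rm -rf "$dir"
    return "$rc"
}
```

For example, `probe_avx512_runs cc || avx512_tests=OFF` before calling cmake, then passing `-DGMX_ENABLE_AVX512_TESTS=${avx512_tests:-ON}`. A run test like this only describes the machine it executes on (the build host), and it cannot run at all when cross-compiling, so it says nothing about compute nodes that differ from the build node.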

Attachment: stdout (2.09 MB), added by Anton Shterenlikht, 01/06/2020 04:12 PM

History

#1 Updated by Erik Lindahl about 2 months ago

  • Assignee set to Erik Lindahl

#2 Updated by Erik Lindahl about 2 months ago

  • Status changed from New to Blocked, need info
  • Priority changed from Normal to Low

Those tests are quite intentional; we only check the flags to be able to compile a single file with AVX-512 support, and the routine in that file will never be called at runtime unless the AVX512F CPUID flag is set.
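On Linux, the kernel lists the CPU feature flags it detected on the `flags` lines of /proc/cpuinfo, so a quick way to confirm whether a given node reports AVX512F is a check like the sketch below. The function name `cpu_reports_flag` is hypothetical, and this is only a shell-level view of the same information; the runtime guard described above happens inside GROMACS, not by reading this file.

```shell
# Hypothetical helper: succeed if the given CPU feature flag appears as a whole
# word on a "flags" line of a cpuinfo-style file (default: /proc/cpuinfo, Linux only).
cpu_reports_flag() {
    flag=$1
    cpuinfo=${2:-/proc/cpuinfo}
    # -w keeps "avx512f" from matching longer names such as "avx512fp16";
    # a missing file is treated as "flag not reported".
    grep -E '^flags[[:space:]]*:' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}
```

Running `cpu_reports_flag avx512f && echo yes || echo no` on a login node and on a compute node is a cheap way to see whether the two actually differ.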

I just tested the latest version in the release-2019 branch, and it configures, compiles, and runs just fine on first-generation EPYC/Ryzen.

If you actually do get a segfault, please

- Use the latest version (2019.5)
- Include full configuration output/logs
- If the segfault happens at runtime, include the command line that caused it, and ideally the output or a debugger session showing the stack trace.

#3 Updated by Anton Shterenlikht about 2 months ago

Did you test with Cray?
If so - what version?

If you use GCC, are you perhaps relying on function multi-versioning?
(https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/Function-Multiversioning.html)

2019.5 build still fails for me with the Cray CCE/10 compiler
in identifyavx512fmaunits
because the compiler is invoked with:

"-target-cpu" "znver2" "-target-feature" "+avx2" "-target-feature" "+avx512f"

I'll check with my managers what files I can share.

#4 Updated by Erik Lindahl about 2 months ago

No, we don't have any Cray compiler in our default CI/testing environment, but if Cray wants to donate a cluster I'm sure we could talk :-)

There should not be any need for multi-versioning, since no symbol in that object file is ever called on a non-AVX512 host.

First, if that still happens, we need to see how CMake was invoked, and make sure it did not detect the machine as an AVX512 host by mistake.
Second, we need to see whether it failed at build time or when running a normal Gromacs job.
Third, we'd like to see the head of the log file to make sure the configuration was sane, and to double-check which source and libraries were used.
Fourth, we'd like to see which files use which compiler options, since the general/Rome-specific flags are separate from the ones that should be used for the single AVX512-unit file.
Fifth, a debug trace helps us see where it fails. "identifyavx512fmaunits" is at the same time a CMake test, a translation unit, and a function that in turn calls other functions. The stack trace would instantly tell whether it's a routine being called incorrectly at runtime (possibly due to a configuration error) or something subtle about a hidden initialisation routine generated by the compiler.
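One common way to capture the requested stack trace is to run the crashing command under gdb in batch mode. This is a minimal sketch; the wrapper name `run_with_backtrace` is hypothetical, and it assumes gdb is installed on the node and that the binary carries debug info (the configuration posted in this report uses `-DCMAKE_BUILD_TYPE=DEBUG`).

```shell
# Hypothetical wrapper: run a command under gdb non-interactively and print
# a backtrace when the program stops (e.g. on SIGSEGV or SIGILL).
run_with_backtrace() {
    # -batch:   exit after processing the -ex commands
    # -ex run:  start the program
    # -ex bt:   print the stack of the thread that stopped
    # --args:   everything after this is the program and its arguments
    gdb -batch -ex run -ex bt --args "$@"
}
```

Usage is simply `run_with_backtrace ./crashing_binary arg1 arg2` (placeholder names); the `bt` output is what distinguishes a routine called incorrectly at runtime from a compiler-generated initialisation routine.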

We have very specific reasons for the AVX512 test in particular. It's there because the optimal SIMD set is different depending on whether the CPU has 1 or 2 AVX512 units, and Intel did not provide any way to check that without timing. Since that means we will use AVX2 on some AVX512-capable hardware, we still want to be able to check at runtime to make sure the frontend node wasn't low-end while the actual compute nodes have 2 AVX512 units.

For now this is not an issue on EPYC, but

1) we prefer to avoid unnecessary separate code paths
2) if it is an issue with the Cray compiler, it can be an issue on Intel CPUs too
3) we don't know what SIMD units AMD might add in the future

So it's much more efficient for us to see all the information than to guess what the error is ;-)

Cheers,

Erik

#5 Updated by Anton Shterenlikht about 2 months ago

CMake was invoked with:

FLAGS="-O3 -Rpass=.* -fsave-loopmark"
CC=cc
CPP=CC
cmake \
-DCMAKE_BUILD_TYPE=DEBUG \
-DCMAKE_C_COMPILER=$CC -DCMAKE_C_FLAGS="${FLAGS}" \
-DCMAKE_CXX_COMPILER=$CPP -DCMAKE_CXX_FLAGS="${FLAGS}" \
-DGMX_GPU=OFF -DGMX_OPENMP=OFF -DGMX_MPI=ON -DGMX_SIMD=AVX_256 \
-DGMX_BUILD_OWN_FFTW=ON \
-DGMX_HWLOC=OFF -DGMX_BUILD_SHARED_EXE=OFF -DGMX_CYCLE_SUBCOUNTERS=ON \
-DGMX_EXTERNAL_BLAS=ON -DGMX_EXTERNAL_LAPACK=ON \
-DGMX_BLAS_USER=$CRAY_LIBSCI_PREFIX_DIR/lib/libsci_cray_mpi.a \
-DGMX_LAPACK_USER=$CRAY_LIBSCI_PREFIX_DIR/lib/libsci_cray_mpi.a \
-DMPIEXEC=`which srun` \
-DMPIEXEC_NUMPROC_FLAG="-n" \
-DCMAKE_INSTALL_PREFIX=$jube_benchmark_home/gromacs/Stage${stage}/$version/$compiler \
-DREGRESSIONTEST_PATH=../../regressiontests-${version} ../../gromacs-${version}/

Attached is stdout with cmake and make output.

#6 Updated by Erik Lindahl about 2 months ago

  • Status changed from Blocked, need info to In Progress

Thanks a lot Anton, I'll try to have a look at this as early as tomorrow!

#7 Updated by Anton Shterenlikht 3 days ago

Any progress?
