Feature #914

CMake support for architecture-specific optimizations

Added by Szilárd Páll over 7 years ago. Updated about 7 years ago.

build system

With recent hardware architectures, and especially with the Verlet/nbnxn code path in 4.6, architecture-specific optimizations have become important.

CMake support for this would involve:
  • the possibility to turn on automated detection and generate the appropriate compiler options for the CPU GROMACS is built on (using the detection implemented in #913)
  • manual selection of individual optimizations (e.g. SSE4 or AVX for the nbnxn PME kernels)
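To make the two points above concrete, here is a minimal C sketch of what "detect, or let the user choose" could look like. It is not the detection code from #913: it assumes a GCC or Clang compiler on x86 and uses the `__builtin_cpu_supports` builtin instead of raw CPUID. The names `accel_t`, `detect_accel`, `accel_flag` and `choose_accel`, the selection strings ("auto", "SSE2", ...) and the GCC-style flags are all hypothetical, and the AVX 128-bit with FMA case is left out for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical acceleration levels, ordered from least to most capable. */
typedef enum { ACCEL_NONE, ACCEL_SSE2, ACCEL_SSE4_1, ACCEL_AVX_256 } accel_t;

/* Automatic detection on the build host.
 * Assumption: GCC/Clang builtins on x86; anything else reports ACCEL_NONE. */
accel_t detect_accel(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))    return ACCEL_AVX_256;
    if (__builtin_cpu_supports("sse4.1")) return ACCEL_SSE4_1;
    if (__builtin_cpu_supports("sse2"))   return ACCEL_SSE2;
#endif
    return ACCEL_NONE;
}

/* Map a level to a GCC-style compiler option (other compilers differ). */
const char *accel_flag(accel_t a)
{
    switch (a) {
        case ACCEL_AVX_256: return "-mavx";
        case ACCEL_SSE4_1:  return "-msse4.1";
        case ACCEL_SSE2:    return "-msse2";
        default:            return "";
    }
}

/* Manual selection: an explicit name wins; NULL or "auto" falls back to
 * detection. Unknown names map to ACCEL_NONE in this sketch. */
accel_t choose_accel(const char *requested)
{
    if (requested == NULL || strcmp(requested, "auto") == 0)
        return detect_accel();
    if (strcmp(requested, "AVX_256") == 0) return ACCEL_AVX_256;
    if (strcmp(requested, "SSE4.1") == 0)  return ACCEL_SSE4_1;
    if (strcmp(requested, "SSE2") == 0)    return ACCEL_SSE2;
    return ACCEL_NONE;
}
```

A build script could compile and run a small program like this at configure time, then pass `accel_flag(choose_accel(user_choice))` to the compiler. A real implementation would also have to check that the compiler accepts the chosen flag.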

Associated revisions

Revision 5ba7125c (diff)
Added by Erik Lindahl about 7 years ago

New CPU detection & AVX/SSE code, removed raw assembly files.

Removed all raw assembly files and deprecated Altivec support.
Removed support for NASM and other assemblers, and replaced
previous SSE detection code with a new module using CPUID instead.
Added detection for SSE2, SSE4.1, AVX 128-bit with FMA, and AVX 256-bit.
Added CMake detection of build platform based on CPUID, and output this
to the log file. The executables now compare the compile-time platform
and selected acceleration with the run-time platform and most suitable
acceleration and warn the user if they do not match. The compiler
detection code has also been reordered slightly to produce more readable
warnings when OpenMP is not available, and correctly disable pragma

Added intrinsics code and math functions for SSE2, SSE4.1, AVX128/256
both in single and double precision. All math functions and permutation
code have been tested & verified. Single precision math functions are
correct apart from the least significant bit, and double precision has
roughly twice the accuracy.

This has forced me to temporarily disable the SSE & Fortran acceleration.
SSE will be added back soon based on new intrinsics-only kernels currently
in testing, and we will test if Fortran still makes sense then.

Finally, the patch includes a modification to gmx_rmsdist where
a regression issue was introduced recently by using sqrtf() for
the norm function. This caused the Intel compiler to produce slightly
different results at high optimization levels, which became evident here.

Closes #926 - Raw assembly code has been removed.
Refs #923 - Old kernels removed, new will be added shortly.
Fixes #914 - CMake now does architecture-specific optimization.
Fixes #912, #913
Fixes #857 - We detect rdtscp support with CPUID and use it if possible.
Fixes #750
Closes #537, #574 - Altivec is now deprecated.

Change-Id: Icfca5a940762f8d82ae67b59c65b2d2ac683256d


#1 Updated by Roland Schulz about 7 years ago

  • Status changed from New to Closed

Fixed by 5ba7125c.
