include list of compiler flags in the binary version information
It would be highly beneficial to have the full list of compiler options recorded in the binaries and printed in the version information and/or the log file. This will enable identifying performance issues caused by incorrect or sub-optimal compiler optimizations (e.g. on Bulldozer -march=bdver1).
New CPU detection & AVX/SSE code, removed raw assembly files.
Removed all raw assembly files and deprecated altivec support.
Removed support for NASM and other assemblers, and replaced
previous SSE detection code with a new module using CPUID instead.
Added detection for SSE2, SSE4.1, AVX 128-bit with FMA, and AVX 256-bit.
Added Cmake detection of build platform based on CPUID, and output this
to the log file. The executables now compare the compile-time platform
and selected acceleration with the run-time platform and most suitable
acceleration and warns the user if they do not match. The compiler
detection code has also been reordered slightly to produce more readable
warnings when OpenMP is not available, and correctly disable pragma
Added intrinsics code and math functions for SSE2, SSE4.1, AVX128/256
both in single and double precision. All math functions and permutation
code have been tested & verified. Single precision math functions are
correct apart from the least significant bit, and double precision has
roughly twice the accuracy.
This has forced me to temporarily disable the SSE & Fortran acceleration.
SSE will be added back soon based on new intrinsics-only kernels currently
in testing, and we will test if Fortran still makes sense then.
Finally, the patch includes a modification to gmx_rmsdist where
a regression issue was introduced recently by using sqrtf() for
the norm function. This caused the intel compiler to produce slightly
different results at high optimization leves, which got evident here.
Closes #926 - Raw assembly code has been removed.
Refs #923 - Old kernels removed, new will be added shortly.
Fixes #914 - Cmake now does architecture-speficic optimization.
Fixes #912, #913
Fixes #857 - We detect rdtscp support with CPUID and use it if possible.
Closes #537, #574 - Altivec is now deprecated.