gcc 4.4.something up to at least 4.6.0 can't compile AVX256 kernels
Added work-around for gcc bug in AVX intrinsics formal parameters
Many relatively recent gcc versions (4.5.2, possibly 4.6.0) use incorrect
formal parameters for maskload and maskstore intrinsics. This patch
detects the bug during configuration and works around it with a define.
Refs #1058, and fixes it at least from gcc-4.5.2.
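For illustration, the define-based work-around described above could look roughly like the following sketch. The macro names (EXAMPLE_MASKLOAD_BUG, example_maskload_ps, example_maskstore_ps) and the helper function are hypothetical and are not taken from the patch. The sketch assumes the affected compilers declare the mask parameter as a float vector, so a cast is applied when the define is set. A scalar fallback keeps it compilable without AVX enabled.

```c
#include <assert.h>
#ifdef __AVX__
#include <immintrin.h>

/* EXAMPLE_MASKLOAD_BUG is a hypothetical name for the define that a
 * configure-time compile test would set when the compiler declares the
 * mask parameter as __m256 instead of __m256i. */
#ifdef EXAMPLE_MASKLOAD_BUG
#define example_maskload_ps(mem, mask) \
    _mm256_maskload_ps((mem), _mm256_castsi256_ps(mask))
#define example_maskstore_ps(mem, mask, x) \
    _mm256_maskstore_ps((mem), _mm256_castsi256_ps(mask), (x))
#else
#define example_maskload_ps(mem, mask) \
    _mm256_maskload_ps((mem), (mask))
#define example_maskstore_ps(mem, mask, x) \
    _mm256_maskstore_ps((mem), (mask), (x))
#endif
#endif /* __AVX__ */

/* Copy the first n (0..8) floats from src to dst.
 * Elements of dst at index n and above are left untouched. */
void example_copy_first_n(float *dst, const float *src, int n)
{
    int i;

    assert(n >= 0 && n <= 8);
#ifdef __AVX__
    {
        int     bits[8];
        __m256i mask;
        __m256  v;

        /* A lane is selected when the top bit of its mask element is set. */
        for (i = 0; i < 8; i++)
        {
            bits[i] = (i < n) ? -1 : 0;
        }
        mask = _mm256_loadu_si256((const __m256i *)bits);
        v    = example_maskload_ps(src, mask);
        example_maskstore_ps(dst, mask, v);
    }
#else
    /* Scalar fallback so the sketch still builds without AVX. */
    for (i = 0; i < n; i++)
    {
        dst[i] = src[i];
    }
#endif
}
```

Kernel code would then call only the wrapper macros, so the choice between the two parameter types is made once, at configuration time.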
#2 Updated by Mark Abraham over 6 years ago
Or we could add more general checks for the intrinsics we use for the configured acceleration level. This is an area in which compilers will tend to suck a bit. Feature tests are of course preferred over version checks. git grep _mm_, sed and uniq should be able to identify the ones we ever use.
Our kernel tests should serve as correctness tests for compiler intrinsics. (Once I get time to organize their distribution.)
#3 Updated by Christoph Junghans over 6 years ago
Similar problems on my core i7 with avx256 acceleration for double precision:
/var/tmp/portage/sci-chemistry/gromacs-4.6_beta2/work/gromacs-4.6-beta2/src/gmxlib/nonbonded/nb_kernel_avx_256_double/nb_kernel_ElecRFCut_VdwNone_GeomW3W3_avx_256_double.c: In function 'nb_kernel_ElecRFCut_VdwNone_GeomW3W3_F_avx_256_double':
/var/tmp/portage/sci-chemistry/gromacs-4.6_beta2/work/gromacs-4.6-beta2/src/gmxlib/nonbonded/nb_kernel_avx_256_double/nb_kernel_ElecRFCut_VdwNone_GeomW3W3_avx_256_double.c:1675:22: error: incompatible types when assigning to type '__m128' from type '__vector(2) double
#4 Updated by Berk Hess over 6 years ago
- Assignee changed from Mark Abraham to Erik Lindahl
- Priority changed from Normal to 6
I think all instances of gmx_mm256_set_m128 in the AVX double kernels should be replaced by gmx_mm256_set_m128d.
I guess that these kernels have never been compiled, nor tested then?
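As a sketch of why the single-precision helper cannot be used in the double kernels: the two helpers take different vector types for the 128-bit halves. The macro definitions below are hypothetical stand-ins written for this example (prefixed example_), not copies of the GROMACS helpers, and the scalar fallback is there only so the sketch builds without AVX enabled.

```c
#include <assert.h>
#ifdef __AVX__
#include <immintrin.h>

/* Hypothetical stand-ins for the two helpers named in the comment above.
 * The single-precision variant expects __m128 halves, the
 * double-precision variant expects __m128d halves. */
#define example_mm256_set_m128(hi, lo) \
    _mm256_insertf128_ps(_mm256_castps128_ps256(lo), (hi), 0x1)
#define example_mm256_set_m128d(hi, lo) \
    _mm256_insertf128_pd(_mm256_castpd128_pd256(lo), (hi), 0x1)
#endif /* __AVX__ */

/* Join two pairs of doubles into four:
 * out[0..1] = lo[0..1], out[2..3] = hi[0..1]. */
void example_join_halves_d(double *out, const double *hi, const double *lo)
{
    assert(out != 0 && hi != 0 && lo != 0);
#ifdef __AVX__
    {
        __m128d vlo = _mm_loadu_pd(lo);
        __m128d vhi = _mm_loadu_pd(hi);
        /* Passing vhi/vlo to example_mm256_set_m128 instead would mix
         * float and double vector types and be rejected by the compiler. */
        __m256d v = example_mm256_set_m128d(vhi, vlo);

        _mm256_storeu_pd(out, v);
    }
#else
    /* Scalar fallback so the sketch still builds without AVX. */
    out[0] = lo[0];
    out[1] = lo[1];
    out[2] = hi[0];
    out[3] = hi[1];
#endif
}
```

A type mismatch of this kind can only show up when the double-precision kernels are actually compiled, which fits the compile error reported above.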
#6 Updated by Roland Schulz over 6 years ago
These are two different problems, correct? 1) Some gcc versions fail to compile the single precision kernels. 2) The double precision kernels have a bug. I think it would be better to have one redmine issue per problem and to open a separate issue for the second (double precision) problem.
#8 Updated by Erik Lindahl over 6 years ago
Mostly fixed by gerrit 1944. There is one remaining related caveat, for the record: gcc-4.5.2 appears to produce buggy code for AVX128 FMA instructions. Not entirely surprising since it was released before the AMD hardware was available, and for that reason I'm not going to spend any time on tracking down the exact versions where it was fixed. It works fine with gcc-4.6.3, which comes with e.g. the Ubuntu that we installed the first second the Bulldozer hardware was available.