Bug #1097

nbnxn pme kernel fails when building 32-bit AVX_256 version on MSVC 2012

Added by Erik Lindahl about 8 years ago. Updated about 8 years ago.


The debugger reports an access violation error on line 649 of nbnxn_kernel_simd_4xn_inner.h.

Associated revisions

Revision 42f6913f (diff)
Added by Berk Hess about 8 years ago

fixed nbnxn AVX-256 Ewald table pointer alignment

This could lead to access violations, but up till now
this was only observed with 32-bit MSVC builds.
Fixes #1097

Also added more documentation to nbnxn_kernel_simd_utils.h,
renamed macro variables and added minor table optimization
by replacing shuffle with movelh or unpack.

Change-Id: I9df2131cbabb0e6cb86f7224a5fd238acb2e7681

Revision 73dd31e8 (diff)
Added by Erik Lindahl about 8 years ago

Fixes SSE/AVX compilation under Windows

- 32-bit MSVC cannot handle more than 3 xmm/ymm register
arguments due to stack alignment, so some group kernel
routines have been copied into optional macros. These are
only used for 32-bit MSVC compiles; other alternatives including
icc on windows use the normal functions that are easier to debug.
- Since the windows compilers control 32 vs 64 bit with flags, a
new log file entry has been added to show whether the present
build is a 32 or 64 bit one.
- Minor fixes to enable double precision AVX_128_FMA builds on
- Replace use of explicit binary constants with _MM_SHUFFLE()
macro in nbnxn kernels to make it compile under windows.
With these fixes, both SSE2, SSE4, and AVX256 group kernels pass
regressiontests in single and double with MSVC2010, MSVC2012, and
icc 2013.1 used with visual studio 2012. The nbnxn kernels pass
all tests with the exception of 32-bit double precision AVX_256
where all three compilers still fail (Refs #1097).
Fixes #1092, #1093, #1068.

Change-Id: I6807b102af1db01cafba26a45284f5c38c7498fd


#1 Updated by Erik Lindahl about 8 years ago

nbnxn_pme and nbnxn_vsite fail with the same error message when using icc to create a 32-bit build in windows.

#2 Updated by Erik Lindahl about 8 years ago

Oops, almost forgot: This only occurs in double precision 32-bit builds.

In summary, the problem occurs with MSVC 2012, MSVC 2010, and ICC 2013 used in MSVC 2012. Single precision builds are fine, as are all 64-bit builds.

#3 Updated by Erik Lindahl about 8 years ago

  • Assignee set to Berk Hess

#4 Updated by Berk Hess about 8 years ago

  • Status changed from New to Feedback wanted

I found and fixed an issue with the alignment of a pointer used at that line. I assume this should fix it. Erik, could you check whether my fix solves the crash?

#5 Updated by Roland Schulz about 8 years ago

  • Status changed from Feedback wanted to In Progress

No, the problem is still there. The access violation is now at line 452:

GMX_MM_INVSQRT2_PD(rsq_SSE2,rsq_SSE3,rinv_SSE2,rinv_SSE3);

Debugger values: rsq_SSE2 = {m256d_f64=0x049be720 }, rsq_SSE3 = {m256d_f64=0x049be6c0 }, rinv_SSE2 = {m256d_f64=0x049be700 }, rinv_SSE3 = {m256d_f64=0x049be6a0 }.

Disassembly: vmovaps ymmword ptr [ir_SSE],ymm0, with ir_SSE = {m256_f32=0x049bdae0 }.

So it still seems to be an alignment issue. Let me know if you need more info and I'll try to get MSVC to tell me which line in the macro is the problem.

#6 Updated by Berk Hess about 8 years ago

I don't see what could be wrong now.
Could you check whether the single precision AVX kernel also fails by running with GMX_NBNXN_EWALD_TABLE=1?

I will fix the double precision kernel by using gmx_mm_extract_epi32, as for 128-bit. This is faster anyhow. But then the same bug might still be in the single precision AVX kernels.
Alternatively, I can remove the aligned ti index completely by also using gmx_mm_extract_epi32 for the AVX-256 single precision Ewald table kernel. It is slower there, but that kernel is never used anyhow.

#7 Updated by Berk Hess about 8 years ago

I see now that my last comments were not relevant, as the issue is in a different macro.

GMX_MM256_INVSQRT2_PD operates on AVX128 and AVX256 registers only and never loads or stores, so I don't see how there can be a problem there, unless the registers themselves are not properly aligned on the stack.

#8 Updated by Erik Lindahl about 8 years ago

After spending a lot of time tonight going through the entire kernel I had a second look at Berk's patch... and noticed there's one sizeof(real) remaining that hasn't been changed to sizeof(int) on line 382.

Once that is fixed it works fine in my virtual machine.

#9 Updated by Berk Hess about 8 years ago

Thanks for the effort!
I fixed this mistake, which now looks silly.
It would be so nice to be able to set aligned variables on the stack, but that's only supported on a few compilers now.

#10 Updated by Erik Lindahl about 8 years ago

  • Status changed from In Progress to Closed
