Bug #926

Kernel010 in Single precision on Windows crashes with Segfault

Added by Roland Schulz over 7 years ago. Updated over 7 years ago.

I just tried to get the regressiontests running on Windows as well, and several of the complex (7/14), kernel (6/63) and pdb2gmx (32/42) tests fail. They all pass on Mac/Linux in all configurations. One cause is that Kernel010 segfaults in single precision; it seems to be OK in double precision. It segfaults with the latest 4.5 release in the "Compiler=msvc CompilerVersion=2010 host=bs_Win2008_64" and "Compiler=icc CompilerVersion=12.1.2 GMX_DLOPEN=OFF GMX_THREADS=OFF GMX_FFT_LIBRARY=fftpack host=bs_Win2008_64" configs. It is OK with "Compiler=msvc CompilerVersion=2008 GMX_DOUBLE=ON GMX_OPENMP=OFF GMX_THREADS=OFF GMX_FFT_LIBRARY=fftpack host=bs_Win2008_64".

Associated revisions

Revision 5ba7125c (diff)
Added by Erik Lindahl over 7 years ago

New CPU detection & AVX/SSE code, removed raw assembly files.

Removed all raw assembly files and deprecated altivec support.
Removed support for NASM and other assemblers, and replaced
previous SSE detection code with a new module using CPUID instead.
Added detection for SSE2, SSE4.1, AVX 128-bit with FMA, and AVX 256-bit.
Added Cmake detection of build platform based on CPUID, and output this
to the log file. The executables now compare the compile-time platform
and selected acceleration with the run-time platform and most suitable
acceleration and warn the user if they do not match. The compiler
detection code has also been reordered slightly to produce more readable
warnings when OpenMP is not available, and correctly disable pragma

Added intrinsics code and math functions for SSE2, SSE4.1, AVX128/256
both in single and double precision. All math functions and permutation
code have been tested & verified. Single precision math functions are
correct apart from the least significant bit, and double precision has
roughly twice the accuracy.

This has forced me to temporarily disable the SSE & Fortran acceleration.
SSE will be added back soon based on new intrinsics-only kernels currently
in testing, and we will test if Fortran still makes sense then.

Finally, the patch includes a modification to gmx_rmsdist where
a regression issue was introduced recently by using sqrtf() for
the norm function. This caused the Intel compiler to produce slightly
different results at high optimization levels, which became evident here.

Closes #926 - Raw assembly code has been removed.
Refs #923 - Old kernels removed, new will be added shortly.
Fixes #914 - Cmake now does architecture-specific optimization.
Fixes #912, #913
Fixes #857 - We detect rdtscp support with CPUID and use it if possible.
Fixes #750
Closes #537, #574 - Altivec is now deprecated.

Change-Id: Icfca5a940762f8d82ae67b59c65b2d2ac683256d
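
The commit message describes executables that compare the compile-time acceleration with the most suitable run-time acceleration and warn on a mismatch. A minimal sketch of that comparison could look like the following. The enum accel_t and the functions accel_name and accel_check are hypothetical names made up for this illustration; they are not the GROMACS API, and the CPUID query that would supply the run-time value is left out.

```c
#include <stdio.h>

/* Hypothetical acceleration identifiers, named after the instruction
 * sets listed in the commit message. Not the GROMACS definitions. */
typedef enum {
    ACCEL_NONE,
    ACCEL_SSE2,
    ACCEL_SSE4_1,
    ACCEL_AVX_128_FMA,
    ACCEL_AVX_256,
    ACCEL_COUNT
} accel_t;

/* Map an identifier to a printable name. */
static const char *accel_name(accel_t a)
{
    static const char *names[ACCEL_COUNT] = {
        "None", "SSE2", "SSE4.1", "AVX_128_FMA", "AVX_256"
    };
    return (a >= 0 && a < ACCEL_COUNT) ? names[a] : "Unknown";
}

/* Compare the acceleration selected at build time with the one
 * suggested for the host at run time (assumed to come from a
 * CPUID-based detection step that is not shown here).
 * Returns 0 and an empty message when they match; otherwise
 * returns 1 and writes a warning into buf. */
static int accel_check(accel_t compiled, accel_t suggested,
                       char *buf, size_t len)
{
    if (compiled == suggested) {
        if (len > 0) {
            buf[0] = '\0';
        }
        return 0;
    }
    snprintf(buf, len,
             "Binary was compiled with %s acceleration, but %s is the "
             "most suitable acceleration for this host.",
             accel_name(compiled), accel_name(suggested));
    return 1;
}
```

A caller would pass the build-time value (for example from a preprocessor define) and the detected value, then print buf to the log file when the return value is nonzero.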


#1 Updated by Roland Schulz over 7 years ago

The non-segfault errors in the regressiontests were caused by a bug in the Perl script, which has been fixed. This leaves only segfaults: 6/14 in complex (aminoacids, argon, butane, dec+water, fe_test, tip4pflex), 1/63 in kernel (kernel010), and 32/42 in pdb2gmx.

#2 Updated by Roland Schulz over 7 years ago

#3 Updated by Roland Schulz over 7 years ago

It is also OK with the C kernel or when compiled in 32-bit. For 4.6, #923 will probably fix this issue automatically. But we probably still need to fix it for the 4.5 branch somehow. It seems that the stack gets trashed before it segfaults (stack depth is 1), and I'm not aware of any memory debugger for 64-bit Windows that could show where the kernel starts to go wrong.

#4 Updated by Roland Schulz over 7 years ago

I can reproduce this error with NASM 2.09.10 on Linux. Thus we probably just need to determine which NASM versions are affected, and don't need to change or fix the kernel.

#5 Updated by Roland Schulz over 7 years ago

On Linux, reverting 0b8f869fe63dfe2bbb7 fixes the issue, but it doesn't fix the issue on Windows. Thus these are different problems.
Also, using a different NASM version doesn't fix the problem on Windows. The segfault happens with 2.08.02, 2.09.10, and 2.10.

Is FAH only using 32-bit binaries? If not, why is this error not affecting FAH?

#6 Updated by Roland Schulz over 7 years ago

Also, NASM 2.10 gives a lot of "instruction is not lockable" warnings.

#7 Updated by Erik Lindahl over 7 years ago

  • Category set to mdrun
  • Priority changed from High to Normal

We'll look into this and all other 4.5.6-target bugs after 4.6 is out.

#8 Updated by Roland Schulz over 7 years ago

But it is affecting 4.6 too.

BTW: I'm really confused about how no one has noticed this earlier. Almost any (somewhat complex) tpr segfaults on Win64 with the default build settings (single) on all native compilers. Is no one using Gromacs in 64-bit on Windows?

#9 Updated by Erik Lindahl over 7 years ago

Roland Schulz wrote:

But it is affecting 4.6 too.

As you noticed in the third comment, this will be fixed by #923.

However, sorry for getting a little grumpy about this, but we are MONTHS past the complete feature freeze and the reporting of new bugs for 4.6. Even if this had affected 4.6, it would not have been fixed for 4.6.0 unless somebody volunteered. Please help with addressing currently open Redmine issues, but new issues opened now cannot have 4.6.0 as a target anymore.

#10 Updated by Roland Schulz over 7 years ago

True. I overlooked the #923 fix when writing that. Sorry.
Since 64-bit Windows single precision seems never to have worked with SSE, it isn't a regression either.

My main interest in this was making sure that the Jenkins builds don't all fail. I have now simply changed the single-precision configurations on Windows to be built with GMX_ACCELERATION=none.

#11 Updated by Peter Kasson over 7 years ago

Erik's comments (and the #923 fix) stand, but you are correct that no one is using Gromacs on 64-bit Windows yet, to my knowledge. We did some test builds for FAH, but the cores there remain 32-bit at the moment. That's something we should change (memory limitations with 32-bit get in the way), but it hasn't happened yet.

#12 Updated by Roland Schulz over 7 years ago

  • Status changed from New to Closed

Closed by 5ba7125c.
