GB simulation explodes or crashes in nonbonded kernel
A GB simulation on a ~5k atom system (attached) keeps exploding or crashing. The bug is 100% reproducible under the following conditions:
- git version 5922b72
- gcc 4.1.3, dynamically linked mdrun binary (NOT statically linked)
- optimization levels -O3, -O2 and -O1, but NOT -O0
- also the debug build
Tested on AMD X6 1090T + Ubuntu 9.04 x86_64 and the same binary on Core i5 750 + Ubuntu 9.10 x86_64.
Crash details (on AMD X6):
======== 1-3 thread(s) ========
step 700, remaining runtime: -15 s
Warning: 1-4 interaction between 3907 and 3912 at distance 7.505 which is larger than the 1-4 table size 2.200 nm
These are ignored for the rest of the simulation
This usually means your system is exploding,
if not, you should increase table-extension in your mdp file
or with user tables increase the table size
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fa0d9d676f0 (LWP 2027)]
nb_kernel410_x86_64_sse (p_nri=<value optimized out>, iinr=0x1f1b2b0, jindex=0x1f56b00, jjnr=0x7fa0d8d8b010,
shift=0x1f42d90, shiftvec=0x1b47490, fshift=0x1b476c0, gid=0x1f2f020, pos=0x1aa4f30, faction=0x1dca520,
charge=0x1f0c3f0, p_facel=0x1b38148, p_krf=0x1b38150, p_crf=0x1b38154, vc=0x1dc9880, type=0x1f11350,
p_ntype=0x1b38328, vdwparam=0x1b478f0, vvdw=0x1dc9670, p_tabscale=0x1d867b8, VFtab=0x0,
invsqrta=0x1ce08c0, dvda=0x1ce5820, p_gbtabscale=0x1b38398, GBtab=0x7fa0d9b91020,
p_nthreads=0x7fff6c3fbdfc, count=0x1d86840, mtx=0x0, outeriter=0x7fff6c3fbdf8, inneriter=0x7fff6c3fbdf4,
work=0x7fff6c3fbde0) at /usr/lib/gcc/x86_64-linux-gnu/4.1.3/include/xmmintrin.h:876
876 return (_m128) *(_v4sf *)__P;
======== >=4 threads ========
The charge group starting at atom 3925 moved than the distance allowed by the domain decomposition (3.826748) in direction X
distance out of cell -15.151535
Old coordinates: 16.482 2.773 3.745
New coordinates: -2.707 -9.060 -36.987
Old cell boundaries in direction X: 12.381 16.151
New cell boundaries in direction X: 12.445 16.271
Program mdrun_gcc413_dynamic_debug, VERSION 4.0.99-dev-20100608-5922b72
Source code file: ../../../src/mdlib/domdec.c, line: 4081
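The domain-decomposition error shows how violent the explosion is. The sketch below only does arithmetic on the values printed in that log. All helper names are hypothetical (they are not GROMACS functions), and it assumes that the reported "distance out of cell" is the new X coordinate minus the new lower X cell boundary, which reproduces the logged -15.151535 to the printed precision.

```c
/* Values (nm) copied from the domain-decomposition error message above. */
static const double old_pos[3] = { 16.482, 2.773, 3.745 };
static const double new_pos[3] = { -2.707, -9.060, -36.987 };
static const double new_cell_lo_x = 12.445;   /* new lower cell boundary in X */
static const double allowed_dist  = 3.826748; /* distance allowed by the DD */

/* Displacement along dimension d (0 = X, 1 = Y, 2 = Z) between old and new. */
static double displacement(int d)
{
    return new_pos[d] - old_pos[d];
}

/* Squared length of the total jump in nm^2 (avoids needing sqrt and -lm). */
static double jump_length_sq(void)
{
    double s = 0.0;
    for (int d = 0; d < 3; d++) {
        double dd = displacement(d);
        s += dd * dd;
    }
    return s;
}

/* Assumption (hypothetical helper): "distance out of cell" is the new X
   coordinate minus the new lower X boundary; this matches the logged
   -15.151535 to the printed precision. */
static double distance_out_of_cell_x(void)
{
    return new_pos[0] - new_cell_lo_x;
}

/* Nonzero if the out-of-cell distance is larger in magnitude than allowed. */
static int exceeds_allowed(void)
{
    double d = distance_out_of_cell_x();
    return (d < 0.0 ? -d : d) > allowed_dist;
}
```

With these numbers the charge group is displaced by about -19.2 nm in X and about 46.6 nm in total between the old and new coordinates, while the allowed distance is only about 3.8 nm, so the check fails by a wide margin.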
#2 Updated by Berk Hess over 9 years ago
This is quite probably a gcc 4.1 optimization bug (in this case 4.1.3).
gcc 4.1.3 gives incorrect results with -O2 for the test program at:
So we are not 100% sure this is not a GB bug, but let's not waste more
time on this.
We will put a warning or error in the Gromacs configure script
when it detects gcc 4.1.x.
#5 Updated by Per Larsson over 9 years ago
This was indeed a GB bug and not a gcc 4.1.3 bug.
It was due to a faulty routine for updating two potential values at once using SSE.
I have replaced it with a routine that updates a single value, calling that routine twice.
The double-update routine should of course also work (I will look into that), but for now the code works again.
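The fix replaces a routine that updates two potential values at once with SSE by two calls to a single-value update. The sketch below illustrates that pattern in single precision with standard SSE intrinsics from xmmintrin.h. All function names are hypothetical and are not the GROMACS routines, and the report does not say what the faulty routine actually got wrong. The naive two-value version only shows one generic hazard of fused updates: if both target pointers refer to the same element, loading both values before storing either loses one contribution, whereas two sequential single-value updates accumulate both.

```c
#include <xmmintrin.h>

/* Hypothetical helper: add lane 0 of v to the single float at p. */
static void update_1pot(float *p, __m128 v)
{
    __m128 cur = _mm_load_ss(p);   /* load one float into lane 0 */
    cur = _mm_add_ss(cur, v);      /* add lane 0 of v to lane 0 */
    _mm_store_ss(p, cur);          /* write back only lane 0 */
}

/* Hypothetical: update two values by calling the single-value helper twice.
   Lane 0 of v goes to *p1, lane 1 of v goes to *p2. */
static void update_2pot_safe(float *p1, float *p2, __m128 v)
{
    update_1pot(p1, v);
    /* broadcast lane 1 into lane 0 before the second scalar update */
    update_1pot(p2, _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1)));
}

/* Hypothetical naive fused version: read both targets, add, then store both.
   If p1 == p2, the second store overwrites the first and one term is lost. */
static void update_2pot_naive(float *p1, float *p2, __m128 v)
{
    float lanes[4];
    _mm_storeu_ps(lanes, v);       /* spill the four lanes to memory */
    float a = *p1 + lanes[0];
    float b = *p2 + lanes[1];
    *p1 = a;
    *p2 = b;
}
```

For example, with lanes 0 and 1 of v holding 1 and 2 (v = _mm_set_ps(0.0f, 0.0f, 2.0f, 1.0f)) and both pointers aimed at the same float initialised to 0, update_2pot_safe leaves 3 in it, while update_2pot_naive leaves 2. Calling the single-value routine twice costs a little speed but is correct regardless of aliasing.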