Error in nsgrid - Range checking error - Problem in box precision - Fixed
I am filing this in Bugzilla because many people on the users list seem to face the same problem, and it can be solved with a small fix.
Bug #109 seems relevant and I think it was not completely resolved.
The problem we had was that simulations crashed with the message:
"Range checking error:...
...Variable ci has value 583. It should have been within [ 0 .. 512 ]"
This happened for simulations that:
- were run in parallel with double-precision GROMACS
- wrote output to XTC
- showed a fluctuating grid before the crash (in the case above, it switched from Grid: 9 x 9 x 9 cells to Grid: 8 x 8 x 8 cells)
After some searching, I realized that the box dimensions were not the same on all processes. The master process
had a smaller box (corresponding to 8x8x8) than the other processes (corresponding to 9x9x9).
The actual problem originated in the XTC writing: the box variable was passed by address and converted to single
precision in place, so only the master process, which writes the XTC file, ended up with a single-precision box.
Solution that worked (apart from the obvious one, which is to avoid the above conditions altogether):
Edit /src/gmxlib/stat.c:
-Find "void write_xtc_traj"
-Add the definition of a temporary matrix variable: static matrix boxcopy;
-Before the "if (write_xtc(xd,natoms,step,t,box,x_sel,prec) == 0)", add a line that copies box into boxcopy
-Change the if statement to
if (write_xtc(xd,natoms,step,t,boxcopy,x_sel,prec) == 0)
Recompile and test it. I believe this is the same reason the coordinates are already copied to a temporary array.
Worked for us.
PS: I am not sure whether this should also go to the users list. Many seem to face the same problem.
#1 Updated by Berk Hess about 12 years ago
I have fixed the problem in a more thorough way,
since it also affected the time variable and XTC writing
from any program in general.
I fixed the routine that actually writes the single-precision
floats so that it does the conversion on a copy
instead of on the original variable.
Unfortunately, we have just released version 3.3.2.
I will send a mail to the gmx-users list.
Often an even simpler solution is to use single precision :)
Double precision is almost never required.