Bug #59

Wrong LJ energies on Apple Mac OS X with Myrinet

Added by Goran Krilov over 14 years ago. Updated over 14 years ago.



I have an issue with GROMACS on Apple PPC that I am trying to
resolve. Careful analysis of our data has shown that one particular
term of the energy function (LJ short range) gives a different result,
depending on which version of GROMACS we use.

Namely, if I use the binary precompiled by the GROMACS developers on Darwin 8.2.0,
I get the result 9.99750e+03. However, if I use the binaries that we built
ourselves with Myrinet MX and the Apple 4.0 compiler on 8.3.0,
we get 1.00229e+04 for the same term. That is a significant
difference, especially because all the other energy terms are exactly the
same for the two versions.
Moreover, when we use GROMACS 3.3 and GROMACS 3.2.1 on Linux, we get
9.99750e+03. Also, an independent calculation with a different package (TINKER)
gives us 9.99750e+03. In all cases we used the same input and data files, and
the results are identical except for the difference in the energy term noted.
So it would seem our Myrinet-built GROMACS gives the wrong answer. Any ideas
what could be going on?
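
For scale, the size of the discrepancy can be put as a relative difference. The short Python sketch below uses only the two LJ (short range) values quoted above; the variable names are illustrative.

```python
# Relative difference between the two reported LJ (short range) energies.
reference = 9.99750e+03   # precompiled binary, Linux builds, TINKER
myrinet = 1.00229e+04     # our Myrinet MX build

abs_diff = myrinet - reference
rel_diff = abs_diff / reference

print(f"absolute difference: {abs_diff:.1f}")
print(f"relative difference: {rel_diff:.3%}")
```

The two values differ by roughly a quarter of a percent of the reference value.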

spc.tpr (287 KB): tpr file for spc water (compiled on linux x86 with GROMACS 3.2.1). Goran Krilov, 03/29/2006 04:21 PM


#1 Updated by David van der Spoel over 14 years ago

As I said on the mailing list, please upload the tpr file to bugzilla, but you
can also do some tests on your own system, like compiling gromacs without
myrinet. Also, is this a parallel simulation? In that case Myrinet shouldn't
matter at all. Please note that none of the developers has the setup you are
having problems on, so it may be problematic to debug. We'll try, however.

#2 Updated by Goran Krilov over 14 years ago

Created an attachment (id=31)
tpr file for spc water (compiled on linux x86 with GROMACS 3.2.1)

#3 Updated by David van der Spoel over 14 years ago

I get the correct value on my Apple G5: 9.99750e+03 (single prec.) and on my
Opteron (single & double prec.). My mac runs tiger (10.4.5) by the way. Have you
checked the website for new drivers and/or libraries? Which
MPI library are you using by the way?

Another test you can do with the tpr file that you have is to
setenv DUMPNL 1
and then rerun mdrun. The neighborlist will be dumped in your md.log file. Then
do the same on one of your other platforms. In src/mdlib you can build a program
called compnl (typing: make compnl). This can be used to compare neighborlists
from the two log files just to test whether they are identical. You could also
compare the number of interactions in the md.log files (at the very end).
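
To illustrate the kind of comparison meant here, the following is a minimal Python sketch. It is not the compnl program and it does not parse md.log; it assumes the neighborlists have already been extracted into lists of (i, j) atom-index pairs, and the function name diff_pairs is hypothetical.

```python
def diff_pairs(pairs_a, pairs_b):
    """Return the pairs found only in list A and only in list B.

    Each pair is treated as unordered, so (1, 2) and (2, 1) count as
    the same interaction.
    """
    a = {tuple(sorted(p)) for p in pairs_a}
    b = {tuple(sorted(p)) for p in pairs_b}
    return sorted(a - b), sorted(b - a)


# Hypothetical pair lists from two platforms.
platform_a = [(0, 1), (2, 1), (3, 4)]
platform_b = [(1, 0), (1, 2)]

only_a, only_b = diff_pairs(platform_a, platform_b)
print("only in A:", only_a)
print("only in B:", only_b)
```

Any pair that shows up in only one of the two lists is a candidate for an interaction sitting right at the cutoff.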

#4 Updated by Erik Lindahl over 14 years ago

I cannot repeat this on any of my systems, including the one where I built the OS X package (now
updated to 10.4.5, using Apple gcc-4.0) - they all produce 9.99750e+03.

As David hinted above, this is probably due to the finite precision of neighborsearching. If a pair of
atoms is just at the cutoff, the decision to put them in the list (or not) can depend on the optimization
level of the compiler. Check if you have been using different compiler flags when compiling with
Myrinet support.
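
The effect can be illustrated with a small Python sketch. It is not the Gromacs neighborsearching code: it only emulates a rounding difference by comparing a double-precision test with one where the inputs are first rounded to single precision. The cutoff of 0.9 and the pair distance are hypothetical values chosen to sit right at the boundary.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def within_cutoff(r, rc):
    """Strict test of the squared distance against the squared cutoff."""
    return r * r < rc * rc


RC = 0.9        # hypothetical cutoff
R = 0.9 - 1e-9  # hypothetical pair distance, just inside the cutoff

in_double = within_cutoff(R, RC)
in_single = within_cutoff(to_f32(R), to_f32(RC))

print("in list (double precision inputs):", in_double)
print("in list (single precision inputs):", in_single)
```

In double precision the pair is inside the cutoff, while after rounding to single precision the distance and the cutoff become the same number and the strict comparison excludes the pair. A different rounding path chosen by the compiler can flip a borderline decision in the same way.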

The reason it only seems to affect LJ is that you are using both PME and a rlist buffer, which means the
coulomb interaction at the cutoff is completely negligible.

In that case it's not really a bug - it is for instance quite common to see this type of difference
between platforms, so unless you have other problems there is little point in recompiling with the
default Gromacs flags.
