Bug #416

All-vs-all kernel exclusion/1-4 interaction problem for small molecules

Added by Teemu Murtola over 9 years ago. Updated over 9 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Erik Lindahl
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized

Description

Created an attachment (id=453)
tpr file that uses all-vs-all kernels

I think this still hasn't been fixed, so I'm adding a bugzilla so that it doesn't get forgotten.

It seems that in some cases the all-vs-all kernels in the git master are missing certain interactions. Below is the log output for the first configuration from two EM runs of a 21-atom system (an all-atom choline) without cutoffs; the only difference is that one run uses energy monitoring groups (and hence no all-vs-all kernels):

Run 1: without energy groups
Energies (kJ/mol)
           Bond          Angle    Proper Dih.  Improper Dih.          LJ-14
    5.23069e+00    1.49518e+01    7.52361e-01    2.73133e-07    1.74453e+00
     Coulomb-14        LJ (SR)   Coulomb (SR)      Potential Pressure (bar)
   -1.10034e+03   -3.29903e+00    9.20951e+02   -1.60013e+02    0.00000e+00

Run 2: with multiple energy groups (no all-vs-all kernels)
Energies (kJ/mol)
           Bond          Angle    Proper Dih.  Improper Dih.          LJ-14
    5.23069e+00    1.49518e+01    7.52361e-01    2.73133e-07    1.74453e+00
     Coulomb-14        LJ (SR)   Coulomb (SR)      Potential Pressure (bar)
   -1.10034e+03   -3.04045e+00    9.78493e+02   -1.02212e+02    0.00000e+00

The LJ (SR) and Coulomb (SR) terms are both higher for the run without all-vs-all kernels, while other terms are identical. The reason why I think that the all-vs-all kernels are broken is that choline has a 3-fold symmetry around a bond, and the energies calculated without the all-vs-all kernels get this right, but the all-vs-all kernels don't.
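To make the discrepancy explicit, here is a minimal sketch that subtracts the two sets of energies term by term. The numbers are copied from the log excerpts above; the helper name `differing_terms` and the tolerance are only illustrative choices.

```python
# Energy terms (kJ/mol) copied from the two log excerpts in this report.
TERMS = ["Bond", "Angle", "Proper Dih.", "Improper Dih.", "LJ-14",
         "Coulomb-14", "LJ (SR)", "Coulomb (SR)", "Potential"]
RUN1 = [5.23069e+00, 1.49518e+01, 7.52361e-01, 2.73133e-07, 1.74453e+00,
        -1.10034e+03, -3.29903e+00, 9.20951e+02, -1.60013e+02]  # all-vs-all kernels
RUN2 = [5.23069e+00, 1.49518e+01, 7.52361e-01, 2.73133e-07, 1.74453e+00,
        -1.10034e+03, -3.04045e+00, 9.78493e+02, -1.02212e+02]  # energy groups


def differing_terms(run_a, run_b, names, tol=1e-6):
    """Return {term: run_b - run_a} for every term that differs by more than tol."""
    return {name: b - a
            for name, a, b in zip(names, run_a, run_b)
            if abs(b - a) > tol}


diffs = differing_terms(RUN1, RUN2, TERMS)
for name, delta in diffs.items():
    print(f"{name:14s} {delta:+.5f} kJ/mol")
```

Only LJ (SR), Coulomb (SR) and Potential differ, and the Potential difference equals the sum of the two short-range differences to within the printed precision, so the whole discrepancy sits in the short-range nonbonded terms.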

Setting GMX_NOOPTIMIZEDKERNELS has no effect on the result, and neither has charge group assignment (I tried both with united-atom-like charge groups and all atoms in separate charge groups, identical energies).

Both kernels give identical energies if I set the number of excluded bonded neighbors to 2 instead of 3 (and remove the [pairs] section, but based on our earlier discussion, the exclusion handling is likely the culprit).
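For readers less familiar with the exclusion setting: the number of excluded bonded neighbors (nrexcl) removes nonbonded interactions between atoms that are at most that many bonds apart, so with 3 the 1-4 pairs are excluded from the normal kernels and handled through [pairs] instead. The sketch below illustrates this on a hypothetical 5-atom linear chain, not on the choline topology from the attached tpr files, and it is not the GROMACS implementation; the function name `excluded_pairs` is made up for the illustration.

```python
from collections import deque
from itertools import combinations


def excluded_pairs(n_atoms, bonds, nrexcl):
    """Return the set of atom pairs (i < j) separated by at most nrexcl bonds."""
    adjacency = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    excluded = set()
    for start in range(n_atoms):
        # Breadth-first search limited to nrexcl bonds from the start atom.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            if distance[atom] == nrexcl:
                continue  # do not walk further than nrexcl bonds
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        for other in distance:
            if other != start:
                excluded.add((min(start, other), max(start, other)))
    return excluded


# Hypothetical linear chain 0-1-2-3-4 (illustration only).
N_ATOMS = 5
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4)]
ALL_PAIRS = set(combinations(range(N_ATOMS), 2))

nonbonded_by_nrexcl = {}
for nrexcl in (2, 3):
    nonbonded = sorted(ALL_PAIRS - excluded_pairs(N_ATOMS, BONDS, nrexcl))
    nonbonded_by_nrexcl[nrexcl] = nonbonded
    print(f"nrexcl={nrexcl}: pairs left for the nonbonded kernel: {nonbonded}")
```

In this toy chain the 1-4 pairs (0, 3) and (1, 4) reach the nonbonded kernel only when nrexcl is 2. That makes the exclusion list shorter and simpler, which fits the observation that both kernels agree in that case.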

ref.tpr (7.98 KB): tpr file that uses all-vs-all kernels. Teemu Murtola, 05/24/2010 12:44 PM
eg.tpr (8.22 KB): tpr file that doesn't use all-vs-all kernels. Teemu Murtola, 05/24/2010 12:45 PM

History

#1 Updated by Teemu Murtola over 9 years ago

Created an attachment (id=454)
tpr file that doesn't use all-vs-all kernels

#2 Updated by Teemu Murtola over 9 years ago

Just to have all information related to the bug in one place, I'll also put some valgrind output here. valgrind reports an error from the all-vs-all kernels with the given tpr:

9652 Invalid write of size 4
9652    at 0x5639484: setup_exclusions_and_indices_float (nb_kernel_allvsall_sse2_single.c:323)
9652    by 0x5639F61: setup_aadata (nb_kernel_allvsall_sse2_single.c:492)
9652    by 0x563A1C2: nb_kernel_allvsall_sse2_single (nb_kernel_allvsall_sse2_single.c:606)
9652    by 0x5553C1D: do_nonbonded (nonbonded.c:376)
9652    by 0x514A0AE: do_force_lowlevel (force.c:229)
9652    by 0x51E86ED: do_force (sim_util.c:702)
9652    by 0x515DE49: evaluate_energy (minimize.c:655)
9652    by 0x51638EA: do_steep (minimize.c:2068)
9652    by 0x41656E: mdrunner (runner.c:674)
9652    by 0x414FD4: mdrunner_threads (runner.c:188)
9652    by 0x4221B0: main (mdrun.c:625)
9652 Address 0x7df4284 is 4 bytes before a block of size 80 alloc'd
9652    at 0x4C25684: calloc (vg_replace_malloc.c:397)
9652    by 0x5504999: save_calloc (smalloc.c:159)
9652    by 0x56392E1: setup_exclusions_and_indices_float (nb_kernel_allvsall_sse2_single.c:279)
9652    by 0x5639F61: setup_aadata (nb_kernel_allvsall_sse2_single.c:492)
9652    by 0x563A1C2: nb_kernel_allvsall_sse2_single (nb_kernel_allvsall_sse2_single.c:606)
9652    by 0x5553C1D: do_nonbonded (nonbonded.c:376)
9652    by 0x514A0AE: do_force_lowlevel (force.c:229)
9652    by 0x51E86ED: do_force (sim_util.c:702)
9652    by 0x515DE49: evaluate_energy (minimize.c:655)
9652    by 0x51638EA: do_steep (minimize.c:2068)
9652    by 0x41656E: mdrunner (runner.c:674)
9652    by 0x414FD4: mdrunner_threads (runner.c:188)

This might provide a starting point for finding out what is going wrong.
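As a reading aid for the valgrind message, the sketch below does the address arithmetic. It assumes (this is not stated in the trace) that the allocated block is an array of 4-byte elements, matching the 4-byte write size; under that assumption the invalid write lands at index -1 of a 20-element array. It says nothing about why the index is wrong.

```python
# Values taken from the valgrind report in this comment.
WRITE_ADDR = 0x7DF4284      # address of the invalid 4-byte write
BYTES_BEFORE_BLOCK = 4      # "is 4 bytes before a block"
BLOCK_SIZE = 80             # "a block of size 80"

# Assumption: the block holds 4-byte elements (same as the write size).
ELEM_SIZE = 4

block_start = WRITE_ADDR + BYTES_BEFORE_BLOCK
n_elems = BLOCK_SIZE // ELEM_SIZE
index = (WRITE_ADDR - block_start) // ELEM_SIZE

print(f"block starts at {block_start:#x} and holds {n_elems} elements")
print(f"the invalid write corresponds to element index {index}")
```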

#3 Updated by David van der Spoel over 9 years ago

FYI: there are two more bugs involving the all-vs-all code, 353 and 432, although the latter is very exotic. Since all-vs-all is now turned on automatically, it should be correct.

#4 Updated by Rossen Apostolov over 9 years ago

I disabled the all-vs-all kernels for small systems (< 64 atoms). This is only a temporary fix, and this bug should stay open.

#5 Updated by Erik Lindahl over 9 years ago

I believe this has been fixed now. It was a rather complex issue that had to do with how we create the exclusion list, but at least it works for the system above now, and produces results identical to those with long cutoffs.

I have also updated the corresponding routines in the GB all-vs-all code, and committed to the git master branch.
