Project

General

Profile

Bug #1791

Spurious interactions that should not be there / Table routines work with 1/r leading to NaN.

Added by David van der Spoel about 4 years ago. Updated over 3 years ago.

Status:
Closed
Priority:
Normal
Category:
-
Target version:
-
Affected version - extra info:
And all higher.
Affected version:
Difficulty:
uncategorized
Close

Description

Hi,

I'm trying to debug a topology that is somewhat out of the ordinary. All atoms interaction through Tabulated coulomb only and all Van der Waals parameters are zero. However, the MFlops accounting tells me:

LJ                                       0.000012           0.000     4.2
Coul(T) 0.000036 0.002 16.0

In other words, LJ interaction are being computed anyway (with zero constants), but since the distance between particles is 0 it gives me NaNs. The above output is from 4.5.7 but the same happens in 4.6 and master.

The issues is related to shell particles: the spurious interaction is computed between two shells in the same energy group.

spurious.tgz (425 KB) spurious.tgz David van der Spoel, 08/03/2015 04:21 PM

History

#1 Updated by David van der Spoel about 4 years ago

tar xzvf spurious.tgz
cd spurious
gmx grompp -n
gmx mdrun -v

#2 Updated by David van der Spoel about 4 years ago

As a further note, the interaction between shell particles 5 and 8 in this model does not show if one dumps the neighborlist. But where else do we compute VDW interactions?

#3 Updated by David van der Spoel about 4 years ago

More clues: following processing by grompp, all vsites and shells get the same atomtype. That means in my topology where there are different atom types defined for shells and vsites (because they should use different tables), all these particles get mapped onto type = 2 (0 being OW and 1 being HW).

#4 Updated by David van der Spoel about 4 years ago

Correction, the type renumbering is correct, it only refers to Van der Waals types. This is confusing nevertheless since atoms now have different atom types for different properties, but this is not clear from the name of the variables.

#5 Updated by David van der Spoel about 4 years ago

  • Subject changed from Spurious interactions that should not be there to Spurious interactions that should not be there / Table routines work with 1/r leading to NaN.

The NaN occurs due to two reasons, first predict_shells initiates the shell positions to the nucleus. This can be turned off though by setting GMX_NOPREDICT = 1 in the environment. Second, the inner loops compute a 1/r interaction even with tables, which leads to NaN with a distance of 0.0. This seems to be the more serious thing to fix.

#6 Updated by David van der Spoel about 4 years ago

  • Target version set to 5.x

#7 Updated by Mark Abraham about 4 years ago

David van der Spoel wrote:

The NaN occurs due to two reasons, first predict_shells initiates the shell positions to the nucleus. This can be turned off though by setting GMX_NOPREDICT = 1 in the environment. Second, the inner loops compute a 1/r interaction even with tables, which leads to NaN with a distance of 0.0. This seems to be the more serious thing to fix.

Which inner loops are these?

#8 Updated by David van der Spoel about 4 years ago

E.g. nb_kernel_ElecCSTab_VdwNone_GeomP1P1_avx_256_single.c
I guess these are the old ones?

#9 Updated by Mark Abraham about 4 years ago

Yes, that's a group scheme kernel. The table lookup is done on r, not r^2, so there is always 1/r computed. See nb_kernel_ElecCSTab_VdwNone_GeomP1P1_c.c for a more comprehensible version.

Obviously having VDW would mean this cannot arise, so this aspect of the problem looks like a design limitation of the group scheme. If so, then for such a model as yours, the solution for the shell code is not to place the shell at the position nucleus if they have an interaction. (But why should they have an interaction?)

(IIRC, in the Verlet scheme, we mask off r==0 in all current kernels so such a NaN cannot arise.)

#10 Updated by David van der Spoel about 4 years ago

Masking off r==0 is not a good option, since the energy can be finite both for Coulomb and Van der Waals.
This also means there can be incorrect interactions in soft-core models.
Maybe this bug explains older strange crashes in free energy calculations?

Anyway for User table kernels there can be no assumptions that r==0 will not occur.
Is anyone working on that?

#11 Updated by Mark Abraham about 4 years ago

David van der Spoel wrote:

Masking off r==0 is not a good option, since the energy can be finite both for Coulomb and Van der Waals.

Indeed. I checked the Verlet kernels, and my memory was incorrect.

This also means there can be incorrect interactions in soft-core models.

No, those lists use nb_free_energy.c, which sets both r and 1/r to 0 if r^2 is 0.

Maybe this bug explains older strange crashes in free energy calculations?

No, per above.

Anyway for User table kernels there can be no assumptions that r==0 will not occur.

Indeed, and that's an unattractive performance hit for niche requirements... Even if not, I can't see anyone wanting to fix that on the eve of the removal of the group scheme. For your purposes, hacking on nb_kernel_template_avx_256_single.pre to blend in zeroes (and calling the generator .py script in that directory) might be good enough.

Is anyone working on that?

Yes, user tables for the Verlet scheme (& GPUs) are a high priority for Alfredo in the fall.

#12 Updated by David van der Spoel about 4 years ago

  • Status changed from New to Rejected

Thanks for the feedback. We will leave this be for now then.

#13 Updated by David van der Spoel about 4 years ago

  • Status changed from Rejected to Closed

#14 Updated by Mark Abraham over 3 years ago

  • Target version changed from 5.x to future

#15 Updated by Mark Abraham over 3 years ago

  • Target version deleted (future)

Also available in: Atom PDF