Bug #95

Problems with parallel PME and forces

Added by Henry David over 13 years ago. Updated over 13 years ago.

Status: Closed
Priority: High
Category: mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

I created a system composed of two negative charges and added a print statement
to the update algorithm. I then compiled mdrun in parallel and printed the
forces. The results with 3.3.1 (or the corrected 3.3.0) differ from those
with 3.2.1. (The same seems to happen with any other system.)
The difference apparently originates in the do_pme subroutine. There is
possibly a problem with the parallel implementation, since the results are
correct when the same system is run on one node (np 1). I think that the
subroutine pmeredist is causing problems and might be responsible for all this.
This bug might be related to some of the earlier bug reports, especially the
one that describes a lipid bilayer separation.
I hope I am wrong!! :)
All the best,
Henry.

mdincl.mdp (297 Bytes): md input file. Henry David, 08/07/2006 09:56 PM
topolcl.top (705 Bytes): topology. Henry David, 08/07/2006 11:08 PM
cl.pdb (329 Bytes): pdb file of a Cl and Na ion. Henry David, 08/07/2006 11:09 PM
gr512.tar.gz (276 KB): pdb, topology, mdin, and forces of Water box of 512 water molecules. Henry David, 08/09/2006 03:18 PM

History

#1 Updated by Henry David over 13 years ago

Created an attachment (id=46)
md input file

#2 Updated by David van der Spoel over 13 years ago

Please send the other input files too

#3 Updated by David van der Spoel over 13 years ago

Please give some details about your system as well. In particular: are you using
FFTW3?

#4 Updated by Henry David over 13 years ago

Created an attachment (id=47)
topology

#5 Updated by Henry David over 13 years ago

Created an attachment (id=48)
pdb file of a Cl and Na ion

#6 Updated by Henry David over 13 years ago

I am using fftw2. I also compiled gromacs with a couple of MPI packages and
obtained the same result. Besides printing the forces with a print statement
within the subroutines, I also printed the forces using gmxdump.

#7 Updated by David van der Spoel over 13 years ago

It would be interesting to see whether FFTW3 fixes the problem. Please check
another recent bug report about PME:
http://bugzilla.gromacs.org/show_bug.cgi?id=74
In that case we should either prevent linking to other packages (or fix it...)

#8 Updated by Henry David over 13 years ago

I have just compiled with fftw-3.1.2 and the problem is still there. The results
are the same.

#9 Updated by Berk Hess over 13 years ago

The bug is caused by an error at line 1269 of pme.c.
pidx should have size ir->nkx instead of size nsb->natoms.

A problem only occurs when natoms < fourier_nx,
which is usually not the case in md simulations,
as it does not make sense to run on more than 1 CPU with
fewer atoms than pme grid cells in the x-direction.

I had already encountered and fixed this bug in our development
version.
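
For readers who want to see why the array size matters, here is a minimal sketch in Python. It is not the pme.c code: the function name build_grid_table, the slab assignment formula and the example sizes are all hypothetical. It only assumes, as stated above, that the table is indexed by grid position in x and so needs nkx entries.

```python
def build_grid_table(nkx, table_size, nnodes):
    """Fill a table indexed by grid x-position with an owning node number.

    Hypothetical stand-in for an array like pidx; the even slab split
    used here is an assumption made only for illustration.
    """
    table = [0] * table_size
    for ix in range(nkx):
        # Python raises IndexError when table_size < nkx; C would
        # silently write past the end of the allocation instead.
        table[ix] = ix * nnodes // nkx
    return table


natoms, nkx, nnodes = 2, 16, 2  # two ions, 16 grid cells in x (made-up sizes)

correct = build_grid_table(nkx, table_size=nkx, nnodes=nnodes)

try:
    build_grid_table(nkx, table_size=natoms, nnodes=nnodes)
    overflowed = False
except IndexError:
    overflowed = True
```

With natoms >= nkx the undersized allocation happens to be large enough, which fits the observation that the problem only shows up for very small systems.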

#10 Updated by Henry David over 13 years ago

Thanks David and Berk for the quick response.
I made the changes and that solved the problem of the forces between two
ions. However, the problem still exists for other systems. I tested the modified
version of the code (suggested by Berk) on a box of water with 512 water
molecules and the forces are different if I run it on one processor or two. I
initially gave the example of the ions but it is not the only system where I saw
this problem. Probably the bug pointed out by Berk is not the only one.

#11 Updated by David van der Spoel over 13 years ago

Could you in that case upload the input files for that one too (preferably in
one tar file)?

#12 Updated by Berk Hess over 13 years ago

How big are the differences (at the first step)?
Running on different numbers of CPUs will always
give slightly different results; however, the differences
should be in the last few decimals.
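
A small sketch of why this happens and how one might compare forces with a tolerance. Floating-point addition is not associative, so a different summation order changes the last digits. The helper forces_agree and its tolerance values are assumptions chosen for illustration, not GROMACS settings.

```python
import math

# The same three contributions, summed in two different orders.
serial_order = (0.1 + 0.2) + 0.3
parallel_order = 0.1 + (0.2 + 0.3)


def forces_agree(f_a, f_b, rel_tol=1e-5, abs_tol=1e-3):
    """Return True if every force component matches within tolerance.

    f_a and f_b are sequences of (fx, fy, fz) tuples in the same atom
    order. The default tolerances are illustrative assumptions.
    """
    if len(f_a) != len(f_b):
        return False
    return all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for vec_a, vec_b in zip(f_a, f_b)
        for a, b in zip(vec_a, vec_b)
    )
```

Differences of this kind are harmless. Differences in the leading digits point to a real problem, or to the two force lists not being in the same atom order.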

#13 Updated by Henry David over 13 years ago

Created an attachment (id=49)
pdb, topology, mdin, and forces of Water box of 512 water molecules

In this example you can see that the differences between running on one or two
processors are by no means small. You can see this by comparing the forces on
just the first few water molecules:

> f_512_one_pro.dat <
test.trr frame 0:
f (1536x3):
f[ 0]={-2.08995e+02, 8.37162e+01, -1.02306e+03}
f[ 1]={ 3.53764e+02, -2.41035e+02, 5.89411e+02}
f[ 2]={-1.16743e+01, 3.20724e+02, 4.60618e+02}
f[ 3]={-7.15247e+02, 3.20756e+02, -8.93518e+01}
f[ 4]={ 8.82507e+02, -1.70904e+01, -4.86566e+02}
f[ 5]={ 1.37514e+02, 1.38272e+02, 4.14816e+02}
f[ 6]={ 4.50008e+02, 1.69172e+03, 1.40773e+02}
f[ 7]={-2.01155e+02, -8.67763e+02, 1.07948e+02}

> f_512_two_pro.dat <
test.trr frame 0:
f (1536x3):
f[ 0]={-1.62574e+03, 2.49679e+02, 1.17202e+03}
f[ 1]={ 5.85534e+02, -4.73474e+02, -3.65826e+02}
f[ 2]={ 8.66339e+02, -5.01365e+01, -5.25921e+02}
f[ 3]={-6.12851e+02, 3.38006e+02, 7.60616e+02}
f[ 4]={ 3.72926e+02, -2.79428e+02, 1.68851e+01}
f[ 5]={ 9.27959e+01, -8.09832e+01, -5.26526e+02}
f[ 6]={-5.15624e+02, 5.64076e+02, 1.42298e+03}
f[ 7]={ 3.45854e+02, 2.93675e+01, -5.81977e+02}
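
Dumps in the format shown above can also be compared with a short script. The following is a sketch that assumes only the line layout visible in this comment (f[ i]={x, y, z}); the function names parse_forces and max_abs_difference are hypothetical, and the two example lines are the f[ 0] entries quoted above.

```python
import re

# Matches lines such as: f[ 0]={-2.08995e+02, 8.37162e+01, -1.02306e+03}
_FORCE_LINE = re.compile(r"^\s*f\[\s*(\d+)\]=\{([^}]*)\}")


def parse_forces(text):
    """Return {atom_index: (fx, fy, fz)} for every force line in text."""
    forces = {}
    for line in text.splitlines():
        match = _FORCE_LINE.match(line)
        if match:
            components = tuple(float(v) for v in match.group(2).split(","))
            forces[int(match.group(1))] = components
    return forces


def max_abs_difference(forces_a, forces_b):
    """Largest absolute component difference over atoms present in both."""
    common = forces_a.keys() & forces_b.keys()
    return max(
        abs(a - b)
        for i in common
        for a, b in zip(forces_a[i], forces_b[i])
    )


one_proc = parse_forces("f[ 0]={-2.08995e+02, 8.37162e+01, -1.02306e+03}")
two_proc = parse_forces("f[ 0]={-1.62574e+03, 2.49679e+02, 1.17202e+03}")
print(max_abs_difference(one_proc, two_proc))
```

Such a comparison is only meaningful when both files list the atoms in the same order. Header lines such as "f (1536x3):" are skipped because they do not match the pattern.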

#14 Updated by Berk Hess over 13 years ago

You seem to have used the options -shuffle -sort for grompp.
This reorders the water molecules.
Without these options the forces are identical, except for the last decimal.

For 512 waters the -shuffle and -sort options are useless.
If you really want to use them and compare the forces,
you have to deshuffle your trr file using trjconv with
the deshuf.ndx file produced by grompp.
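
Conceptually, deshuffling means applying a permutation so that both force lists are in the same atom order before they are compared. The sketch below illustrates the idea only. It does not read deshuf.ndx or a trr file, and the convention that order[i] is the original index of the atom stored at shuffled position i is an assumption made for this example.

```python
def deshuffle(shuffled, order):
    """Restore the original atom order.

    order[i] is assumed to be the original index of the atom stored at
    shuffled position i. This is a hypothetical convention for the
    example, not a statement about the deshuf.ndx file format.
    """
    if sorted(order) != list(range(len(shuffled))):
        raise ValueError("order must be a permutation of the atom indices")
    original = [None] * len(shuffled)
    for position, original_index in enumerate(order):
        original[original_index] = shuffled[position]
    return original


# Three atoms whose per-atom values were written out in shuffled order.
shuffled_forces = ["f_c", "f_a", "f_b"]
order = [2, 0, 1]
restored = deshuffle(shuffled_forces, order)
```

Comparing a shuffled force list with an unshuffled one atom by atom pairs up different molecules, which produces large differences even when the forces themselves are correct.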

#15 Updated by Henry David over 13 years ago

(In reply to comment #14)
Ok! Thanks!!
