Bug #383

particle decomposition requires preliminary trjconv -pbc mol input

Added by Chris Neale almost 10 years ago. Updated almost 10 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Erik Lindahl
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized

Description

I have attached my original files for the system in which I discovered this problem. I have verified that the system can be simplified and still reproduce the issue, but I am attaching the original setup in case something about it is important and I missed it.

I can still reproduce the problem if I get rid of the angle and distance restraints, virtual protein atoms, or the entire protein.

This has been discussed on the gmx-users list.
http://lists.gromacs.org/pipermail/gmx-users/2010-January/047907.html

I am using Nehalems with the Intel compiler and OpenMPI, and have tested on both gromacs 4.0.5 and 4.0.7.

Thanks,
Chris.

bugzillaPD.tar.gz (601 KB) - tpr file and files necessary to regenerate the tpr, plus a script file to show my usage. Chris Neale, 01/07/2010 04:39 PM
secondUpload.tar.gz (2.03 MB) - tpr from the structure prior to -pbc mol; the .gro file after trjconv -pbc mol; a tpr based on the -pbc mol .gro. Chris Neale, 01/07/2010 06:14 PM
md1.tpr (1.12 MB) - a .tpr that crashed mdrun -pd with the new patch. Chris Neale, 01/09/2010 03:21 AM

History

#1 Updated by Chris Neale almost 10 years ago

Created an attachment (id=409)
tpr file and files necessary to regenerate the tpr. Also a script file to show my usage

File upload failed on first attempt. Trying again...

#2 Updated by Berk Hess almost 10 years ago

mdrun gives a segv when freeing the graph.
I can't run trjconv -pbc mol (at least in git master),
because that gives the same segv.
Unfortunately, valgrind only complains at the same point and not before.
At the moment I have no clue what could be causing this.

Berk

#3 Updated by Chris Neale almost 10 years ago

Created an attachment (id=410)
tpr from the structure prior to -pbc mol; the .gro file after trjconv -pbc mol; a tpr based on the -pbc mol .gro

I have no problem running trjconv -pbc mol with gromacs 4.0.5 or 4.0.7. The yespbcmol.start.gro files created by trjconv with these two versions of gromacs are identical.

Files created like this:

grompp -f md.mdp -p withions.top -c nopbcmol.start.gro -o nopbcmol.tpr
trjconv -f nopbcmol.start.gro -s nopbcmol.tpr -pbc mol -o yespbcmol.start.gro
grompp -f md.mdp -p withions.top -c yespbcmol.start.gro -o yespbcmol.tpr

###

If you don't have time to solve this right now, then I suggest at least tacking the following message onto any crash when using -pd: "Particle decomposition may require that the structure passed to grompp contains whole molecules not broken over periodic boundaries. Consider running trjconv -pbc mol; grompp; mdrun as a possible solution."
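
As a rough sketch of where such a message could be emitted in the mdrun setup code (the flag bPartDec and the helper is_molecule_broken_over_pbc() below are purely illustrative and do not exist in GROMACS; gmx_fatal(FARGS, ...) is the usual fatal-error call):

/* Hypothetical check; only the error text is the actual suggestion. */
if (bPartDec && is_molecule_broken_over_pbc(state, top)) {
    gmx_fatal(FARGS,
              "Particle decomposition may require that the structure passed to "
              "grompp contains whole molecules not broken over periodic "
              "boundaries. Consider running trjconv -pbc mol; grompp; mdrun "
              "as a possible solution.");
}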

#4 Updated by Berk Hess almost 10 years ago

I fixed it.
This is a bug that would only occur when atoms were connected by only
one multi-body interaction and not by so-called "chemical bonds".
This is quite unlikely, but in your case you have virtual sites
that do not have any interactions other than with more virtual sites.

Note that you were "lucky" that DD still worked.
This routine is called by most programs; on my machine mdrun
crashed in serial, with pd and with dd, and trjconv crashed as well.

Berk

--- a/src/gmxlib/mshift.c
+++ b/src/gmxlib/mshift.c
@@ -190,8 +190,15 @@ static void calc_1se(t_graph *g,int ftype,t_ilist *il,
       if (iaa >= at_start && iaa < at_end) {
         g->start=min(g->start,iaa);
         g->end  =max(g->end,  iaa);
-        /*if (interaction_function[ftype].flags & IF_CHEMBOND)*/
-          nbond[iaa]++;
+        /* When making the graph we (might) link all atoms in an interaction
+         * sequentially. Therefore the end atoms add 1 to the count,
+         * the middle atoms 2.
+         */
+        if (k == 1 || k == nratoms) {
+          nbond[iaa] += 1;
+        } else {
+          nbond[iaa] += 2;
+        }
       }
     }
   }
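
To illustrate the counting scheme in the added lines, here is a small standalone sketch (not GROMACS code; count_edges and the array names are only illustrative): for an interaction spanning nratoms atoms that are linked sequentially in the graph, the two end atoms each gain one edge and every middle atom gains two, so atoms held together only by a single multi-body interaction still end up with a nonzero count.

#include <stdio.h>

/* Standalone sketch of the per-atom edge counting used in the patch above:
 * end atoms of a sequentially linked interaction get +1, middle atoms +2. */
static void count_edges(const int *atoms, int nratoms, int *nbond)
{
    int k;
    for (k = 0; k < nratoms; k++) {
        nbond[atoms[k]] += (k == 0 || k == nratoms-1) ? 1 : 2;
    }
}

int main(void)
{
    int angle[3] = { 0, 1, 2 };  /* e.g. one angle interaction over atoms 0-1-2 */
    int nbond[3] = { 0, 0, 0 };

    count_edges(angle, 3, nbond);
    printf("%d %d %d\n", nbond[0], nbond[1], nbond[2]);  /* prints "1 2 1" */
    return 0;
}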

#5 Updated by Chris Neale almost 10 years ago

Thank you very much for tracking that one down, Berk, even though my title turned out not to match the underlying problem.

Previously, I did take that same system and removed the protein and CL-, and also removed the angle restraints and pull code. This system also crashes on step 0 with -pd but does not crash when I omit the -pd argument to mdrun. This is what led me to believe that the problem was unrelated to the non-standard virtual site construction that I was using.

However, today I took the .gro from the successful 250-step DD mdrun, used it for grompp and then mdrun -pd, and I received no such errors.

I suppose that my POPC/SOL-only system was somewhat unstable and that, perhaps by chance, it crashed during the PD but not the DD mdrun.

In any event, a proper EM would likely solve this.

Thanks again,
Chris.

#6 Updated by Chris Neale almost 10 years ago

Created an attachment (id=417)
a .tpr that crashed mdrun -pd with the new patch

#7 Updated by Chris Neale almost 10 years ago

Hi Berk,

I have applied the patch and find that the pre-trjconv -pbc mol structure still
leads to a .tpr that fails with mdrun -pd. I have attached a .tpr for this
(md1.tpr).

Perhaps I did not apply the patch correctly. I could not get the patch program
to work with it so I just did it by hand. Here is what I now have in
src/gmxlib/mshift.c (changes only in the bottom else):

for(j=0; (j<end); j+=nratoms+1,ia+=nratoms+1) {
  nratoms = interaction_function[ftype].nratoms;
  if (ftype == F_SETTLE) {
    iaa = ia[1];
    if (iaa >= at_start && iaa < at_end) {
      nbond[iaa]    = 2;
      nbond[iaa+1] += 1;
      nbond[iaa+2] += 1;
      g->start = min(g->start,iaa);
      g->end   = max(g->end,iaa+2);
    }
  } else {
    for(k=1; (k<=nratoms); k++) {
      iaa=ia[k];
      if (iaa >= at_start && iaa < at_end) {
        g->start=min(g->start,iaa);
        g->end  =max(g->end, iaa);
        /* When making the graph we (might) link all atoms in an interaction
         * sequentially. Therefore the end atoms add 1 to the count,
         * the middle atoms 2.
         */
        if (k == 1 || k == nratoms) {
          nbond[iaa] += 1;
        } else {
          nbond[iaa] += 2;
        }
      }
    }
  }
}

#8 Updated by Berk Hess almost 10 years ago

Hi,

I don't get any problems with the new tpr: not in serial, with pd, with dd, nor with trjconv.
Your patched code looks ok.
Are you sure you recompiled and called the patched binary?

Berk

#9 Updated by Berk Hess almost 10 years ago

Have you checked whether there is really still a problem,
or whether you missed something in your check?
I am pretty sure the fixed code is correct.

Berk

#10 Updated by Chris Neale almost 10 years ago

Sorry for the delay Berk,

I have downloaded the md1.tpr file that I uploaded on the 9th and run it through mdrun again. I still find that the only time I get a crash is with the -pd option, and that this crash occurs with the new code.

The only thing that I can think of is that I likely created this md1.tpr file with the standard release of 4.0.5. Does grompp call mshift.c? If so, then this could be the problem (although, for whatever reason, the run still works for me with -dd).

I have ensured that my run script is calling the modified version of the code and I have just now checked that my compilation went as I expected. Here is the script that I used for my compilation:

gpc-f101n084-$ pwd
/scratch/cneale/GPC/exe/intel/gromacs-4.0.5_berkpdfix

gpc-f101n084-$ cat cn_compile_mpi.sh
#!/bin/bash

cd /scratch/cneale/GPC/exe/intel/gromacs-4.0.5_berkpdfix
mkdir exec

module purge
module load openmpi intel

export FFTW_LOCATION=/scratch/cneale/GPC/exe/intel/fftw-3.1.2/exec
export GROMACS_LOCATION=/scratch/cneale/GPC/exe/intel/gromacs-4.0.5_berkpdfix/exec
export CPPFLAGS="-I$FFTW_LOCATION/include -I/scinet/gpc/mpi/openmpi/1.3.2-intel-v11.0-ofed/include -I/scinet/gpc/mpi/openmpi/1.3.2-intel-v11.0-ofed/lib"
export LDFLAGS=-L$FFTW_LOCATION/lib

./configure --prefix=$GROMACS_LOCATION --without-motif-includes --without-motif-libraries --without-x --without-xml --enable-mpi --program-suffix="_openmpi" >output.configure.mpi 2>&1
make >output.make.mpi 2>&1
make install-mdrun >output.make_install.mpi 2>&1
touch DONE_PARALLEL
make distclean

##############################

And here I show that I was compiling the modified code:

gpc-f101n084-$ pwd
/scratch/cneale/GPC/exe/intel/gromacs-4.0.5_berkpdfix

gpc-f101n084-$ head -n 206 src/gmxlib/mshift.c | tail -n 41
static void calc_1se(t_graph *g,int ftype,t_ilist *il,
                     int nbond[],int at_start,int at_end) {
  int k,nratoms,end,j;
  t_iatom *ia,iaa;

  end=il->nr;
  ia=il->iatoms;
  for(j=0; (j<end); j+=nratoms+1,ia+=nratoms+1) {
    nratoms = interaction_function[ftype].nratoms;
    if (ftype == F_SETTLE) {
      iaa = ia[1];
      if (iaa >= at_start && iaa < at_end) {
        nbond[iaa]    = 2;
        nbond[iaa+1] += 1;
        nbond[iaa+2] += 1;
        g->start = min(g->start,iaa);
        g->end   = max(g->end,iaa+2);
      }
    } else {
      for(k=1; (k<=nratoms); k++) {
        iaa=ia[k];
        if (iaa >= at_start && iaa < at_end) {
          g->start=min(g->start,iaa);
          g->end  =max(g->end, iaa);
          /* When making the graph we (might) link all atoms in an interaction
           * sequentially. Therefore the end atoms add 1 to the count,
           * the middle atoms 2.
           */
          if (k == 1 || k == nratoms) {
            nbond[iaa] += 1;
          } else {
            nbond[iaa] += 2;
          }
        }
      }
    }
  }
}

################################

And here I show that I was running the modified compilation:

gpc-f101n084-$ cat ib.sh
#!/bin/bash
#PBS -l nodes=1:ib:ppn=8,walltime=48:00:00,os=centos53computeA
#PBS -N BT

# To submit type: qsub this.sh
# If not an interactive job (i.e. -I), then cd into the directory where
# I typed qsub.
if [ "$PBS_ENVIRONMENT" != "PBS_INTERACTIVE" ]; then
  if [ -n "$PBS_O_WORKDIR" ]; then
    cd $PBS_O_WORKDIR
  fi
fi

/scinet/gpc/mpi/openmpi/1.3.2-intel-v11.0-ofed/bin/mpirun -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') -machinefile $PBS_NODEFILE /scratch/cneale/GPC/exe/intel/gromacs-4.0.5_berkpdfix/exec/bin/mdrun_openmpi -deffnm md1 -nosum -dlb yes -npme -1 -cpt 120 -maxh 48 -px coord.xvg -pf force.xvg -pd

gpc-f101n084-$ ls -ltr
total 8832
-rw-r--r-- 1 cneale pomes 1179124 Jan 14 09:45 md1.tpr
-rwxr-xr-x 1 cneale pomes     629 Jan 14 09:46 ib.sh
-rw------- 1 cneale pomes  106702 Jan 14 09:47 BT.e423853
-rw-r--r-- 1 cneale pomes       0 Jan 14 09:47 coord.xvg
-rw-r--r-- 1 cneale pomes       0 Jan 14 09:47 force.xvg
-rw-r--r-- 1 cneale pomes    9828 Jan 14 09:47 md1.log
-rw-r--r-- 1 cneale pomes       0 Jan 14 09:47 md1.edr
-rw-r--r-- 1 cneale pomes  300000 Jan 14 09:47 #step0b_n6.pdb.1#
-rw-r--r-- 1 cneale pomes  299780 Jan 14 09:47 #step0b_n3.pdb.1#
-rw-r--r-- 1 cneale pomes 1569180 Jan 14 09:47 md1.trr
-rw-r--r-- 1 cneale pomes  157012 Jan 14 09:47 md1.xtc
-rw-r--r-- 1 cneale pomes  299780 Jan 14 09:47 step0b_n2.pdb
-rw-r--r-- 1 cneale pomes  300000 Jan 14 09:47 step0b_n4.pdb
-rw-r--r-- 1 cneale pomes  299780 Jan 14 09:47 step0b_n5.pdb
-rw-r--r-- 1 cneale pomes  301501 Jan 14 09:47 step0c_n6.pdb
-rw-r--r-- 1 cneale pomes  301392 Jan 14 09:47 step0c_n3.pdb
-rw-r--r-- 1 cneale pomes  299780 Jan 14 09:47 step0b_n7.pdb
-rw-r--r-- 1 cneale pomes  299395 Jan 14 09:47 step0b_n0.pdb
-rw-r--r-- 1 cneale pomes  301327 Jan 14 09:47 step0c_n2.pdb
-rw-r--r-- 1 cneale pomes  300742 Jan 14 09:47 step0c_n5.pdb
-rw-r--r-- 1 cneale pomes  301326 Jan 14 09:47 step0c_n4.pdb
-rw-r--r-- 1 cneale pomes  300714 Jan 14 09:47 step0c_n7.pdb
-rw-r--r-- 1 cneale pomes  307451 Jan 14 09:47 step0c_n0.pdb
-rw------- 1 cneale pomes     765 Jan 14 09:48 BT.o423853

##################################

I will try to find the original system and provide .top, .gro, etc for this run. I checked my notebook and unfortunately I didn't write down where this .tpr came from, but diff can probably get me there.

Chris.

#11 Updated by Chris Neale almost 10 years ago

It appears that there are no differences in the grompp .tpr output between the regular v4.0.5 code and your patch. To test this, I made a .tpr file from each compilation, ran both .tpr files through gmxdump, and then ran a diff:

gpc-f101n084-$ diff z.reg z.berkfix
1c1
< reg.tpr:
---
> berkfix.tpr:

#12 Updated by Berk Hess almost 10 years ago

No, grompp does not care about pbc.
It is mdrun that corrects for all of this:
mdrun corrects for any pbc issues, broken molecules as well as broken
charge groups.
The bug was a memory allocation issue in the graph pbc routine,
not in the way the code functions.
But as far as I can see the code is 100% correct now,
and valgrind also does not give me any complaints with your tpr file.

Berk

#13 Updated by Chris Neale almost 10 years ago

I now agree that you are correct and the code is fixed. Although this .tpr does crash (for me) with PD and not with DD, if I run it through a 10-step DD run, use that .gro to generate a new .tpr, and run that through PD, it no longer crashes.

I figure my initial structure has some poor contacts that, for whatever reason, were causing crashes in PD but not DD. This is not a real problem: I was working with thousands of automatically generated and EM'd structures, so I am not surprised that at least one of them had some issue. Once this system was relaxed with DD, it runs fine with PD.

I am happy to upload the .top, .gro, etc. as I mentioned before, but I'm not sure that there is any real problem here, and even if there is, it would not have anything to do with this ticket.

I suggest closing this one again.

Thank you very much for the resolution and the further assistance to ensure that the resolution was successful.

Chris.

#14 Updated by Berk Hess almost 10 years ago

This bug was probably reopened because of an unstable starting structure,
not because of a problem in Gromacs.

Berk
