Project

General

Profile

Bug #516

several pdb2gmx issues

Added by Mark Abraham about 9 years ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

Using git master (commit 3a6c114ace350) pdb2gmx:

1) The list of forcefields listed by "pdb2gmx -h" is out of date. Either it should be updated from the current output, not listed, or generated in the same way as the current output.

2) The "pdb2gmx -h" text talks about looking for "a forcefield.itp file in such subdirectories in the current working directory", whereas I think it is intended that all files normally found in the force field library directory can come from either source. Perhaps some expanded text is needed here?

3) It should be clear in the "pdb2gmx -h" text whether such CWD files supersede or augment the files in the library directory, so the user knows whether to copy the file and augment, or make a new file to be augmented.

4) I expected to be able to augment the residue database easily by writing a one-residue .rtp file in the CWD. However, pdb2gmx.c calls choose_ff() very early, which sets ffdir. So when fflib_search_file_end() seeks the .rtp file, ffdir is set, and the "-cwd" command line flag, via bAddCWD has no effect. Thus, even if copying the library file to the CWD and adding the new residue there is the right approach, pdb2gmx will never know about it. This contradicts the "pdb2gmx -h" text for "-cwd", and the in-code comments.

5) Making an xxx.ff directory into which to place such a new .rtp file breaks pdb2gmx elsewhere, because it can't find all the other force field files. This seems to require copying the whole xxx.ff directory, which is cumbersome. I would have expected files not found in the local xxx.ff to be sourced from the library xxx.ff.

6) "pdb2gmx -f blah.pdb -debug 1" seems to hang while reading the -f file

History

#1 Updated by Mark Abraham about 9 years ago

7) Also, when both a local and library xxx.ff exist, which gets used depends on the command line.

When I did "pdb2gmx_master -f ../amber/L8S8_water_box.pdb -chainsep ter -water tip3p" with GMXLIB=/home/224/mxa224/progs/share/gromacs/top then I was prompted to choose a forcefield and saw

"...
-[no]rtpres bool no Use rtp entry names as residue names

Select the Force Field:
1: AMBER03 force field (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
2: AMBER03 force field (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
3: AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
4: AMBER96 force field (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)
5: AMBER99 force field (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)
6: AMBER99SB force field (Hornak et al., Proteins 65, 712-725, 2006)
7: AMBER99SB-ILDN force field (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)
8: AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)
9: CHARMM27 all-atom force field (with CMAP)
10: GROMOS96 43a1 force field
11: GROMOS96 43a2 force field (improved alkane dihedrals)
12: GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)
13: GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)
14: GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)
15: OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)
16: [DEPRECATED] Encad all-atom force field, using full solvent charges
17: [DEPRECATED] Encad all-atom force field, using scaled-down vacuum charges
18: [DEPRECATED] Gromacs force field (see manual)
19: [DEPRECATED] Gromacs force field with hydrogens for NMR
1

Using the Amber03 force field in directory /short/x04/mxa224/electrostatics/babu/gromacs/amber03.ff

Opening force field file /short/x04/mxa224/electrostatics/babu/gromacs/amber03.ff/aminoacids.r2b
Opening force field file /short/x04/mxa224/electrostatics/babu/gromacs/amber03.ff/dna.r2b
Opening force field file /short/x04/mxa224/electrostatics/babu/gromacs/amber03.ff/rna.r2b
Reading ../amber/L8S8_water_box.pdb...
Read 195537 atoms
..."

but when I did "pdb2gmx_master -f ../amber/L8S8_water_box.pdb -chainsep ter -water tip3p -ff amber03"

I saw
"...
-[no]rtpres bool no Use rtp entry names as residue names

Using the Amber03 force field in directory /home/224/mxa224/progs/share/gromacs/top/amber03.ff

Opening force field file /home/224/mxa224/progs/share/gromacs/top/amber03.ff/aminoacids.r2b
Opening force field file /home/224/mxa224/progs/share/gromacs/top/amber03.ff/dna.r2b
Opening force field file /home/224/mxa224/progs/share/gromacs/top/amber03.ff/rna.r2b
Reading ../amber/L8S8_water_box.pdb...
Read 195537 atoms
..."

With GMXLIB set to an empty string, both command lines tried to access the local xxx.ff (i.e. /short/x04/mxa224/electrostatics/babu/gromacs/amber03.ff) but each time pdb2gmx failed because it could not find elements.dat

#2 Updated by Berk Hess about 9 years ago

I fixed one part of the problem.
pdb2gmx now prints the directory where it found the .ff dir before printing
the selection. It now also gives a verbose fatal error when you use -ff with a name that is present more than one time.

For the -cwd issues Erik proposed to remove the whole option and force the user to copy the .ff dir to cwd when adding files.
For residuetypes.dat you have to copy the whole top dir (with or without .ff dirs) and set GMXLIB.
We should update the description for that.

Berk

#3 Updated by Berk Hess about 9 years ago

I realized now the the -cwd didn't actually read local rtp files.
So I just decided to removed the whole option, as Erik already suggested.
This might make things a bit more tedious when a user just wants to modify
or add a small part of a force field, since you will now have to copy
a whole <forcefield>.ff directory and even the whole top dir if you want
to add new protein residues.
But it does simplify the whole procedure a lot, so things are much simpler
to understand and less error as well as bug prone.
I have updated the pdb2gmx documentation for all this.

Thanks for reporting all these issues,

Berk

#4 Updated by Mark Abraham about 9 years ago

Hmm OK. The main value of the force field reorganization was elsewhere, I suppose.

There was some enthusiasm originally for the ability to make small changes to a force field in a local file, without having to copy whole directory trees, e.g. http://lists.gromacs.org/pipermail/gmx-developers/2010-January/003942.html

It seems very clumsy to require the user to make a local copy of whole database of parameters, just to (say) add a modified amino acid to the .rtp database and residuetypes.dat. That gives the user no ready way to see whether some future version of GROMACS updated or corrected relevant things. (Though to be fair, any mechanism of looking up parameters from a database might allow those parameters to change silently with new versions and the user be unaware...) I do agree it makes for easier code :-)

My quick look at the implementation you had in place suggested to me that the code searched for an xxx.ff directory and used the first it found. That directory had to have a complete file structure in it. Then (in some cases) the -cwd search could override/augment/whatever.

I would have thought a "fall through" hierarchy would be a good way to go. The tools search first the local xxx.ff, then the GMXLIB xxx.ff, then the installation xxx.ff, taking the first suitable file found. Then -cwd acts as appropriate. Doubtless there's some hairy cases needing special treatment, though. This approach allows the user to do either a "quicky" (say, addition to an .rtp) by adding a new CWD file, or a series of changes (say, to add a whole ligand+parameters) to a local xxx.ff, and never be tempted to alter the installation library (so there's always a pristine reference copy to be found, other than in the tarball/RPM).

That's easy for me to say when I'm not dealing with implementing and maintaining it, however :-)

#5 Updated by Mark Abraham about 9 years ago

I've reopened this bug, because I do not think the current implementation works correctly (git release-4-5-patches d6298f2ee).

I've copied the whole amber03.ff directory to my working directory (see ls -l output below), and modified to add my two new residue types. Now when I do


[mxa224@vayu2 gromacs]$ ~/progs/bin/pdb2gmx_master -f L8S8_water_box.pdb -chainsep ter -water tip3p -ff amber03 -ignh

:-)  G  R  O  M  A  C  S  (-:
Grunge ROck MAChoS
:-)  VERSION 4.5-beta3-20100824-d6298f2  (-:
Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2008, The GROMACS development team,
check out http://www.gromacs.org for more information.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.
:-)  /home/224/mxa224/progs/bin/pdb2gmx_master  (-:

Option Filename Type Description
------------------------------------------------------------
-f L8S8_water_box.pdb Input Structure file: gro g96 pdb tpr etc.
-o conf.gro Output Structure file: gro g96 pdb etc.
-p topol.top Output Topology file
-i posre.itp Output Include file for topology
-n clean.ndx Output, Opt. Index file
-q clean.pdb Output, Opt. Structure file: gro g96 pdb etc.

Option Type Value Description
------------------------------------------------------
h bool no Print help info and quit
[no]version bool no Print version info and quit
nice int 0 Set the nicelevel
-chainsep enum ter Condition in PDB files when a new chain and
molecule_type should be started: id_or_ter,
id_and_ter, ter, id or interactive
-ff string amber03 Force field, interactive by default. Use -h for
information.
-water enum tip3p Water model to use: select, none, spc, spce,
tip3p, tip4p or tip5p
[no]inter bool no Set the next 8 options to interactive
ss bool no Interactive SS bridge selection
[no]ter bool no Interactive termini selection, iso charged
lys bool no Interactive Lysine selection, iso charged
[no]arg bool no Interactive Arganine selection, iso charged
asp bool no Interactive Aspartic Acid selection, iso charged
[no]glu bool no Interactive Glutamic Acid selection, iso charged
gln bool no Interactive Glutamine selection, iso neutral
[no]his bool no Interactive Histidine selection, iso checking
H-bonds
angle real 135 Minimum hydrogen-donor-acceptor angle for a
H-bond (degrees)
-dist real 0.3 Maximum donor-acceptor distance for a H-bond (nm)
[no]una bool no Select aromatic rings with united CH atoms on
Phenylalanine, Tryptophane and Tyrosine
ignh bool yes Ignore hydrogen atoms that are in the pdb file
[no]missing bool no Continue when atoms are missing, dangerous
v bool no Be slightly more verbose in messages
-posrefc real 1000 Force constant for position restraints
-vsite enum none Convert atoms to virtual sites: none, hydrogens
or aromatics
[no]heavyh bool no Make hydrogen atoms heavy
deuterate bool no Change the mass of hydrogens to 2 amu
[no]chargegrp bool yes Use charge groups in the rtp file
cmap bool yes Use cmap torsions (if enabled in the rtp file)
[no]renum bool no Renumber the residues consecutively in the output
-[no]rtpres bool no Use rtp entry names as residue names

-------------------------------------------------------
Program pdb2gmx_master, VERSION 4.5-beta3-20100824-d6298f2
Source code file: ../../../src/kernel/pdb2top.c, line: 186

Fatal error:
There are multiple force field directories in your path with the name 'amber03'. Run without the ff switch and select the force field interactively.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
------------------------------------------------------


It seems that if you try to replace the whole xxx.ff directory with a local copy, you can't use the "-ff xxx" mechanism. That seems unduly restrictive.

**
[mxa224@vayu2 gromacs]$ ls l
total 44984
-rw-r----
1 mxa224 x04 11038328 Aug 23 20:25 L8S8_water_box.pdb
rw-r---- 1 mxa224 x04 10754842 Aug 24 02:37 L8S8_water_box_fewer_ter.pdb
drwxr-x--- 2 mxa224 x04 4096 Aug 25 12:45 amber03.ff
rw-r---- 1 mxa224 x04 15447584 Aug 24 03:50 clean.pdb
rw-r---- 1 mxa224 x04 8799222 Aug 24 03:42 conf.gro
rw-r--r- 1 mxa224 x04 2580 Aug 24 18:55 residuetypes.dat
[mxa224@vayu2 gromacs]$ ls l amber03.ff/
total 400
-rw-r--r-
1 mxa224 x04 2591 Aug 23 16:52 aminoacids.arn
rw-r--r- 1 mxa224 x04 10 Aug 24 13:43 aminoacids.c.tdb
rw-r--r- 1 mxa224 x04 9338 Aug 25 12:45 aminoacids.hdb
rw-r--r- 1 mxa224 x04 8884 Aug 23 16:52 aminoacids.hdb~
rw-r--r- 1 mxa224 x04 10 Aug 24 13:43 aminoacids.n.tdb
rw-r--r- 1 mxa224 x04 849 Aug 24 13:43 aminoacids.r2b
rw-r--r- 1 mxa224 x04 83070 Aug 25 12:44 aminoacids.rtp
rw-r--r- 1 mxa224 x04 81597 Aug 23 19:03 aminoacids.rtp~
rw-r--r- 1 mxa224 x04 5207 Aug 24 13:43 aminoacids.vsd
rw-r--r- 1 mxa224 x04 3835 Aug 24 13:43 atomtypes.atp
rw-r--r- 1 mxa224 x04 116 Aug 24 13:43 dna.arn
rw-r--r- 1 mxa224 x04 3104 Aug 24 13:43 dna.hdb
rw-r--r- 1 mxa224 x04 145 Aug 24 13:43 dna.r2b
rw-r--r- 1 mxa224 x04 29892 Aug 24 13:43 dna.rtp
rw-r--r- 1 mxa224 x04 33295 Aug 24 13:43 ffbonded.itp
rw-r--r- 1 mxa224 x04 4683 Aug 24 13:43 ffnonbonded.itp
rw-r--r- 1 mxa224 x04 763 Aug 24 13:43 forcefield.doc
rw-r--r- 1 mxa224 x04 945 Aug 24 13:43 forcefield.itp
rw-r--r- 1 mxa224 x04 1948 Aug 24 13:43 gbsa.itp
rw-r--r- 1 mxa224 x04 2163 Aug 24 13:43 ions.itp
rw-r--r- 1 mxa224 x04 116 Aug 24 13:43 rna.arn
rw-r--r- 1 mxa224 x04 4104 Aug 24 13:43 rna.hdb
rw-r--r- 1 mxa224 x04 145 Aug 24 13:43 rna.r2b
rw-r--r- 1 mxa224 x04 30277 Aug 24 13:43 rna.rtp
rw-r--r- 1 mxa224 x04 746 Aug 24 13:43 spc.itp
rw-r--r- 1 mxa224 x04 746 Aug 24 13:43 spce.itp
rw-r--r- 1 mxa224 x04 802 Aug 24 13:43 tip3p.itp
rw-r--r- 1 mxa224 x04 1261 Aug 24 13:43 tip4p.itp
rw-r--r- 1 mxa224 x04 1317 Aug 24 13:43 tip4pew.itp
rw-r--r- 1 mxa224 x04 1870 Aug 24 13:43 tip5p.itp
rw-r--r- 1 mxa224 x04 1250 Aug 24 13:43 urea.itp
rw-r--r- 1 mxa224 x04 214 Aug 24 13:43 watermodels.dat

#6 Updated by Mark Abraham about 9 years ago

On the plus side, pdb2gmx now correctly flags an error relating to termini when the input .pdb file doesn't have the terminal-specific residue names, which are required with AMBER forcefields. Earlier this week, it didn't.

However, the error message under "-ignh" for the N-termini (which is now missing hydrogens) would be a bit cryptic for some users, so I've committed an enhancement that passes ffname into get_hackblocks_rtp so that the error message can be conditional and be more useful here. Perhaps a more general check for AMBER terminal residues having well-formed names(?) is in order. Thoughts?

#7 Updated by Mark Abraham about 9 years ago

OK I found some more issues with atom renaming.

1) When there were a large number of TER records separating chains, I was getting persistent crashes at the 993rd, which led me to suspect some kind of filehandle leak. Memory debugging confirmed that, and it turned out that low_fflib_search_file_end() was using gmx_directory_open() and not closing. So I committed two fixes for that oversight. Now the large-TER .pdb works fine.

2) rename_atoms() uses fflib_open(), which calls libgmxfn(), which calls low_libgmxfn() passing bAddCWD=TRUE. Then things started breaking, but I can't reproduce that now. Perhaps these other issues were confounding things. Ignore this unless I post again!

#8 Updated by Erik Lindahl about 9 years ago

Hi Mark,

Does that mean you've resolved it already?

#9 Updated by Mark Abraham about 9 years ago

The issue of comment #5 is still unresolved. pdb2gmx shouldn't force the user to choose the force field interactively when they've already actively copied a force field directory to the working directory, presumably to modify it.

Everything else is OK. Berk's probably ignoring comment #4, too :-P

#10 Updated by Erik Lindahl about 9 years ago

We had a chat about this the other day.

I see your problem, but the main issue historically with pdb2gmx is that it has tried to do way too many things under the hood, and in many cases it has been impossible to prevent it from doing stupid things even when you know what it is doing wrong.

Having each force field in a self-contained directory is a really nice way of isolating this behavior, and we don't want to break that by suddently having pdb2gmx open files in entirely different places. Then the result of the processing could depend on minor changes in the path variable.

Even our large force field directories are just ~200kb and takes <0.01 seconds to copy, so I don't really see the major inconvenience of copying the directory instead of the individual files.

#11 Updated by Mark Abraham about 9 years ago

Sure, your reasoning against my wish in comment #4 makes sense.

I'm still keen on getting comment #5 changed. If the user's done "pdb2gmx ff amber03" and there's a local amber03.ff, then surely that's the one they want, and that should be the end of proceedings. :) If they don't want to use the local version, they can choose the library one interactively, or rename their directory. Everything always comes from a single hopefully-well-formed xxx.ff directory, and the behaviour is quite stable.

#12 Updated by Erik Lindahl about 9 years ago

Ah, I see. That fix we'll have to incorporate. I'll do it later tonight, but can't promise it will be in the next beta. It will be in the final release, though.

#13 Updated by Erik Lindahl about 9 years ago

Fixed. Commit 52d7ce2f9aca385fe37b5204844e47acd5fc01b9 now chooses the FF in the current directory by default, if it is specified explicitly.

Also available in: Atom PDF