Project

General

Profile

Bug #462

pdb2gmx processing with CHARMM broken

Added by Justin Lemkul over 9 years ago. Updated over 9 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Erik Lindahl
Category:
mdrun
Target version:
Affected version - extra info:
Affected version:
Difficulty:
uncategorized
Close

Description

Per the recent discussion on gmx-users (http://lists.gromacs.org/pipermail/gmx-users/2010-July/052382.html), it appears that pdb2gmx has several problems when processing an input structure under the CHARMM force field. All information contained here pertains to the git version as of 7/7/2010.

1. Double hydrogens are added to backbone amides. As I posted to the list, both H and HN are added to all non-terminal residues, giving an excess +0.31 charge on every amino acid.

2. I can't get a topology for lysozyme (PDB code 1AKI) under CHARMM, or any other structure, for that matter. The input file for 1AKI is from NMR, so it contains H atoms. I stripped out crystal waters and ran:

pdb2gmx -f 1AKI.pdb

...which gave me a fatal error:

Program pdb2gmx, VERSION 4.0.99-dev-2010-07-07 08:35:28 -0700-7cbc3b9
Source code file: pdb2gmx.c, line: 591

Fatal error:
Atom HN in residue CYS 6 not found in rtp entry CYS2 with 10 atoms
while sorting atoms. Maybe different protonation state.
Remove this hydrogen or choose a different protonation state.
Option -ignh will ignore all hydrogens in the input.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

Turns out there are several typos in the aminoacids.rtp file, making reference to "H" instead of "HN" causing the above complaint. The affected residues are CYS2 and LSN.

If I add -ignh to the above pdb2gmx command line (common practice, I should think, since most of the H are named incorrectly), I get a different fatal error (after fixing the typos in the .rtp file):

-------------------------------------------------------
Program pdb2gmx, VERSION 4.0.99-dev-2010-07-07 08:35:28 -0700-7cbc3b9
Source code file: pdb2top.c, line: 1315

Fatal error:
There were 128 missing atoms in molecule Protein, if you want to use this incomplete topology anyhow, use the option missing
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
------------------------------------------------------

I received a warning that atom HN was missing from each of the amino acids. It's as if the .hdb file is not being read correctly, or somehow it's being misinterpreted. I receive the same message even if I strip out all H atoms from the structure, so it's very troubling, almost as if H atoms can't be properly added to the structure.

3. Minor issue: the warnings about missing atoms refer to "buidling" blocks, so a minor typo fix would clean up the output :)

History

#1 Updated by Justin Lemkul over 9 years ago

I should also add some of the notes from the discussion on the list. It appears that C-terminal caps are name-specific, i.e. the one proposed by the user on the list (CBX) failed, but simply renaming the residue and the .rtp entry to NAC allowed pdb2gmx to finish by selecting "None" for the N- and C-termini. Using CBX caused pdb2gmx to fail with an error referring to a "dangling bond" at one terminus.

It sounds like something weird is going on here, as well, since shouldn't we be able to define any residue name we want for any building block that we might create?

I would also ask that the "dangling bond" error message be clarified a bit. It doesn't make a bit of sense to me, and likely someone whose English skills are weak would also be very confused by the cryptic message. It might also be helpful if the error message indicated which terminus was causing the problem, if possible.

#2 Updated by Pär Bjelkmar over 9 years ago

I confirm this problem. The reason is (at least partly) due to some changes in pdb2gmx since this worked in May. But first of all, there were two typos in the CYS2 and LSN residues as pointed out. This have now been fixed.

When this has been fixed, pdb2gmx complains about a missing HISE (when running pdb2gmx -f 1AKI.pdb). This is strange since there's no HISE present in CHARMM nor is it mentioned in the GROMACS manual (GROMACS histidines are called HISA, HISB and HISH, right?). When looking at oplsaa.ff there's a HISE, corresponding to GROMACS HISB. If I work around this problem by adding a HISE residue in rtp and hdb files it works.

Another thing, there're lines in the oplsaa.ff/aminoacids.r2b file converting opls names to GROMACS building blocks but when looking at opls rtp file, the GROMACS names are used anyway. In the same r2b file there's also a proper header saying that the first column corresponds to GMX and the second to "Force-field" but HISA and HISB residues are in "Force-field" column (it clearly says in the rtp file that HISD is OPLS-naming)!? Something fishy is going on here or do I misunderstand?

(In reply to comment #0)

Per the recent discussion on gmx-users
(http://lists.gromacs.org/pipermail/gmx-users/2010-July/052382.html), it
appears that pdb2gmx has several problems when processing an input structure
under the CHARMM force field. All information contained here pertains to the
git version as of 7/7/2010.

1. Double hydrogens are added to backbone amides. As I posted to the list,
both H and HN are added to all non-terminal residues, giving an excess +0.31
charge on every amino acid.

2. I can't get a topology for lysozyme (PDB code 1AKI) under CHARMM, or any
other structure, for that matter. The input file for 1AKI is from NMR, so it
contains H atoms. I stripped out crystal waters and ran:

pdb2gmx -f 1AKI.pdb

...which gave me a fatal error:

Program pdb2gmx, VERSION 4.0.99-dev-2010-07-07 08:35:28 -0700-7cbc3b9
Source code file: pdb2gmx.c, line: 591

Fatal error:
Atom HN in residue CYS 6 not found in rtp entry CYS2 with 10 atoms
while sorting atoms. Maybe different protonation state.
Remove this hydrogen or choose a different protonation state.
Option -ignh will ignore all hydrogens in the input.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

Turns out there are several typos in the aminoacids.rtp file, making reference
to "H" instead of "HN" causing the above complaint. The affected residues are
CYS2 and LSN.

If I add -ignh to the above pdb2gmx command line (common practice, I should
think, since most of the H are named incorrectly), I get a different fatal
error (after fixing the typos in the .rtp file):

-------------------------------------------------------
Program pdb2gmx, VERSION 4.0.99-dev-2010-07-07 08:35:28 -0700-7cbc3b9
Source code file: pdb2top.c, line: 1315

Fatal error:
There were 128 missing atoms in molecule Protein, if you want to use this
incomplete topology anyhow, use the option missing
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
------------------------------------------------------

I received a warning that atom HN was missing from each of the amino acids.
It's as if the .hdb file is not being read correctly, or somehow it's being
misinterpreted. I receive the same message even if I strip out all H atoms
from the structure, so it's very troubling, almost as if H atoms can't be
properly added to the structure.

3. Minor issue: the warnings about missing atoms refer to "buidling" blocks, so
a minor typo fix would clean up the output :)

#3 Updated by Justin Lemkul over 9 years ago

(In reply to comment #2)

I confirm this problem. The reason is (at least partly) due to some changes in
pdb2gmx since this worked in May. But first of all, there were two typos in the
CYS2 and LSN residues as pointed out. This have now been fixed.

Great, thanks.

When this has been fixed, pdb2gmx complains about a missing HISE (when running
pdb2gmx -f 1AKI.pdb). This is strange since there's no HISE present in CHARMM
nor is it mentioned in the GROMACS manual (GROMACS histidines are called HISA,
HISB and HISH, right?). When looking at oplsaa.ff there's a HISE, corresponding
to GROMACS HISB. If I work around this problem by adding a HISE residue in rtp
and hdb files it works.

Is the problem with "HISE" or "HSE"? The version I used had HSE already present, and the complaint I get now (after fixing only CYS2 and LSN) is about the HG atom name on SER (which is HG1 in the aminoacids.rtp file). Using -ignh gives me the fatal error about missing atoms.

Another thing, there're lines in the oplsaa.ff/aminoacids.r2b file converting
opls names to GROMACS building blocks but when looking at opls rtp file, the
GROMACS names are used anyway. In the same r2b file there's also a proper
header saying that the first column corresponds to GMX and the second to
"Force-field" but HISA and HISB residues are in "Force-field" column (it
clearly says in the rtp file that HISD is OPLS-naming)!? Something fishy is
going on here or do I misunderstand?

Something does seem weird. I think whatever this is doing is causing the fatal error about missing atoms, and perhaps even the duplicate H/HN problem I reported on gmx-users.

(In reply to comment #0)

Per the recent discussion on gmx-users
(http://lists.gromacs.org/pipermail/gmx-users/2010-July/052382.html), it
appears that pdb2gmx has several problems when processing an input structure
under the CHARMM force field. All information contained here pertains to the
git version as of 7/7/2010.

1. Double hydrogens are added to backbone amides. As I posted to the list,
both H and HN are added to all non-terminal residues, giving an excess +0.31
charge on every amino acid.

2. I can't get a topology for lysozyme (PDB code 1AKI) under CHARMM, or any
other structure, for that matter. The input file for 1AKI is from NMR, so it
contains H atoms. I stripped out crystal waters and ran:

pdb2gmx -f 1AKI.pdb

...which gave me a fatal error:

Program pdb2gmx, VERSION 4.0.99-dev-2010-07-07 08:35:28 -0700-7cbc3b9
Source code file: pdb2gmx.c, line: 591

Fatal error:
Atom HN in residue CYS 6 not found in rtp entry CYS2 with 10 atoms
while sorting atoms. Maybe different protonation state.
Remove this hydrogen or choose a different protonation state.
Option -ignh will ignore all hydrogens in the input.
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors

Turns out there are several typos in the aminoacids.rtp file, making reference
to "H" instead of "HN" causing the above complaint. The affected residues are
CYS2 and LSN.

If I add -ignh to the above pdb2gmx command line (common practice, I should
think, since most of the H are named incorrectly), I get a different fatal
error (after fixing the typos in the .rtp file):

-------------------------------------------------------
Program pdb2gmx, VERSION 4.0.99-dev-2010-07-07 08:35:28 -0700-7cbc3b9
Source code file: pdb2top.c, line: 1315

Fatal error:
There were 128 missing atoms in molecule Protein, if you want to use this
incomplete topology anyhow, use the option missing
For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
------------------------------------------------------

I received a warning that atom HN was missing from each of the amino acids.
It's as if the .hdb file is not being read correctly, or somehow it's being
misinterpreted. I receive the same message even if I strip out all H atoms
from the structure, so it's very troubling, almost as if H atoms can't be
properly added to the structure.

3. Minor issue: the warnings about missing atoms refer to "buidling" blocks, so
a minor typo fix would clean up the output :)

#4 Updated by Pär Bjelkmar over 9 years ago

(In reply to comment #3)

(In reply to comment #2)
Is the problem with "HISE" or "HSE"? The version I used had HSE already
present, and the complaint I get now (after fixing only CYS2 and LSN) is about
the HG atom name on SER (which is HG1 in the aminoacids.rtp file). Using -ignh
gives me the fatal error about missing atoms.

I get complaints about HISE (HSE is present in the rtp because it's the CHARMM name and it should be converted to GMX building block name HISB in aminoacids.r2b). Could you do a git pull, recompile and try
pdb2gmx -f 1AKI.pdb
and confirm you get this very same error also. So we are on the same page. The LSN and CYS2 issue should be fixed in the current git head.

Another thing, there're lines in the oplsaa.ff/aminoacids.r2b file converting
opls names to GROMACS building blocks but when looking at opls rtp file, the
GROMACS names are used anyway. In the same r2b file there's also a proper
header saying that the first column corresponds to GMX and the second to
"Force-field" but HISA and HISB residues are in "Force-field" column (it
clearly says in the rtp file that HISD is OPLS-naming)!? Something fishy is
going on here or do I misunderstand?

Something does seem weird. I think whatever this is doing is causing the fatal
error about missing atoms, and perhaps even the duplicate H/HN problem I
reported on gmx-users.

Yes probably. But when I run with current git I don't get the duplicate H/HN problem (once bypassing the HISE error mentioned above by adding HISE in rtp and hdb files).

#5 Updated by Justin Lemkul over 9 years ago

(In reply to comment #4)

(In reply to comment #3)

(In reply to comment #2)
Is the problem with "HISE" or "HSE"? The version I used had HSE already
present, and the complaint I get now (after fixing only CYS2 and LSN) is about
the HG atom name on SER (which is HG1 in the aminoacids.rtp file). Using -ignh
gives me the fatal error about missing atoms.

I get complaints about HISE (HSE is present in the rtp because it's the CHARMM
name and it should be converted to GMX building block name HISB in
aminoacids.r2b). Could you do a git pull, recompile and try
pdb2gmx -f 1AKI.pdb
and confirm you get this very same error also. So we are on the same page. The
LSN and CYS2 issue should be fixed in the current git head.

OK, now I'm seeing the same result you are - pdb2gmx cannot find HISE.

...

Yes probably. But when I run with current git I don't get the duplicate H/HN
problem (once bypassing the HISE error mentioned above by adding HISE in rtp
and hdb files).

This issue is not occurring any more with the current git. I don't know what was going on before, but probably nothing to worry about now. I guess all that's left is to sync up the residue naming.

#6 Updated by David van der Spoel over 9 years ago

As far as I can see the rtp files in git have not been updated (there is no HISE). Could you please do that Pär?

#7 Updated by Pär Bjelkmar over 9 years ago

(In reply to comment #6)

As far as I can see the rtp files in git have not been updated (there is no
HISE). Could you please do that Pär?

Hello, of course I could do that but there're two issues. First, where does the HISE residue name come from and what protonation state of the histidine does it correspond too, HISB as in opls? As far as I know there's only HISA, HISB and HISH in GROMACS, or has this changed? Secondly, adding this residue would be inconsistent with the idea (Berk (and I) had) to keep the force field-specific names in the rtp file and do the renaming in the r2b file. But again, this "convention" might have changed since I left for vacation, since (looking at the oplsaa ff) the GROMACS names are used (even though there's still a r2b file).

Could somebody enlighten me here?

In the meantime, assuming the renaming treatment of pdb2gmx has not changed also, it should be enough to add a line in the aminoacids.r2b file to "add" the HISE residue. I can try this out and commit it to git the next time I'm in the office.

BTW: I was not able to add Berk to the mailing list of this bug.

#8 Updated by Justin Lemkul over 9 years ago

(In reply to comment #7)

(In reply to comment #6)

As far as I can see the rtp files in git have not been updated (there is no
HISE). Could you please do that Pär?

Hello, of course I could do that but there're two issues. First, where does the
HISE residue name come from and what protonation state of the histidine does it
correspond too, HISB as in opls? As far as I know there's only HISA, HISB and
HISH in GROMACS, or has this changed? Secondly, adding this residue would be
inconsistent with the idea (Berk (and I) had) to keep the force field-specific
names in the rtp file and do the renaming in the r2b file. But again, this
"convention" might have changed since I left for vacation, since (looking at
the oplsaa ff) the GROMACS names are used (even though there's still a r2b
file).

Could somebody enlighten me here?

In the meantime, assuming the renaming treatment of pdb2gmx has not changed
also, it should be enough to add a line in the aminoacids.r2b file to "add" the
HISE residue. I can try this out and commit it to git the next time I'm in the
office.

BTW: I was not able to add Berk to the mailing list of this bug.

I don't know if I'm fully up-to-speed on the new intrinsics of all the renaming and such, but adding an entry in the aminoacids.r2b file like:

HISE HSE

solves the problem. That is, pdb2gmx completes and produces a viable topology. It raises another question, though. The net charge on 1AKI I get is +2, when it should be +8. It appears that the default behavior is to now treat lysine as neutral instead of charged. The pdb2gmx help information still claims that the charged state (which should be correct in most instances) is default, but this is clearly not the case, and I presume the issue is related to LYS being interpreted as "LSN." From the array in pdb2gmx.c, it seems that the Gromacs building blocks should be LYSN and LYS, so the CHARMM aminoacids.r2b file should be:

LYSN LSN

in order to get charged lysines by default. The current version of the .r2b file seems to be incorrect, and wouldn't really need a line corresponding to LYS, since it would just translate to itself :) With these modifications, pdb2gmx produces a topology for 1AKI that appears correct.

#9 Updated by David van der Spoel over 9 years ago

There still is some rubbish under the carpet:

[leyster:src/kernel] % grep hh *[ch]
hizzie.c: hh[type],pdba->resinfo[hisind].nr);
hizzie.c: *pdba->resinfo[hisind].rtp = strdup(hh[type]);
pdb2gmx.c: return select_res(ehisNR,resnr,hh,expl,"HISTIDINE",nrr,rr);
pdb2top.c:const char *hh[ehisNR] = { "HISD", "HISE", "HISH", "HIS1" };
pdb2top.h:extern const char *hh[ehisNR];

So these names are hardcoded. What I think happens is that the HIS names are converted internally to one of these four depending on protonation state, and then the conversion listed in the .r2b file should turn it back into a force field dependent name.

The r2b file therefore lists two columns: GMX (internal name) and Force field (could be anything).

Now for Lys: I can also reproduce Justin's issue with 1AKI and get neutral Lys residues. Strangely for another pdb file I do get protonated LYS. OK, I take that back, I did get protonated LYS one week ago. Not anymore. When I however add Justin's proposed change I do get protonated Lys again, and also the same charge as I get with an OPLS topology.

I have now committed this r2b file:

; rtp residue to rtp building block table
;GMX Force-field
HISD HSD
HISE HSE
HISH HSP
CYSH CYS
LYSN LSN
ASPH ASPP
GLUH GLUP

*note that there is no HIS1 (negative HIS for binding to e.g. Zn) because there is no support in the rtp file.

I'm also closing this bug, but feel free to reopen if necessary.

#10 Updated by Berk Hess over 9 years ago

I only browsed through the mails quickly, but this looks ok to me.
There are some hardcoded names in grompp, but we can't really avoid those
(unless we make a general chemistry package).
All the hard coded names, which have been reduced to the bare minimum,
are listed in the topology section of the manual.

Berk

Also available in: Atom PDF