pdb2gmx does not protonate correctly for united-atom Gromos
pdb2gmx builds ASPH residues without the HD2 hydrogen for Gromos 54a7. I changed manually the residue name of Asp95 in the PDB file 3GUU from ASP to ASPH and ran pdb2gmx. The result was a ASPH residue name both in the topology and the structure file, but both without the HD2 hydrogen, i.e. ASP structure, not ASPH. Changing His366 to HISA worked correctly, giving a correct HISA residue in the output. Edited PDB file included (3GUU A chain + crystal water, excluding heteroatoms, modified Asp95 and His366).
Version: Gromacs 2018, single precision, Linux x86_64.
Discussed on the mailing list: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-users/2018-April/119852.html
#2 Updated by Mikko Huhtala over 1 year ago
Realized that the issue cuts both ways: renaming ASP to ASPH in the input will build a ASP topology, and having an actual ASPH with the hydrogen in the input is not recognized. I get the following error:
Atom HD2 in residue ASPH 61 was not found in rtp entry ASP with 9 atoms
while sorting atoms."
ASPH has 10 atoms, not 9. For some reason pdb2gmx tries to build an ASP residue with 9 atoms.
This is when selecting GROMOS96 54a7 in Gromacs 2018.1.
#4 Updated by Mikko Huhtala over 1 year ago
That will work, but avoiding that was my whole point starting the thread on the mailing list. The manual entering of protonation states is a pain, if you need to e.g. input protonation calculated with another tool, or prepare multiple versions of the protein with residue substitutions etc. You want text input that's readable and scriptable.
#5 Updated by Mikko Huhtala over 1 year ago
Work intervened, so I'm adding this comment late. I made another post weeks ago on the mailing list, and Justin Lemkul confirmed that pdb2gmx does behave inconsistently with different force fields. ASH gets built correctly for Amber and Charmm, but ASPH does not for Gromos and OPLS-AA. See
Justin Lemkul commented earlier that building the protonated states according to the force field residue name in the input PDB file is the expected behavior. I can't actually find an explicit statement of that anywhere in Gromacs documentation, but that may be just me. If the user is always expected to enter the protonation states manually, I think there should be at least a mention of it somewhere ("no guarantees on what happens to ASH/ASPH etc." or something to that effect). In any case, I think the current interactive interface is a very bad idea anyway - entering long lists of residues by hand. If you make one mistake, you have to start from the beginning. And the residues are not in the order of sequence, but by type, so using the output of a external protonation calculation tool manually becomes very inconvenient. E.g. my case happened to be a 430-residue protein with 6 non-default protonation states across three different residue types (total > 40 residues). Preparing topologies for a wild type and a couple of mutants means going through the error-prone manual keying three times.