Bug #2954

genion changes residue numbering

Added by Eiso AB 7 days ago. Updated 6 days ago.

Target version:
Affected version - extra info:
Affected version:


Using the following commands to add ions:

gmx grompp -f ions.mdp -c solv.gro -p -o ions.tpr
gmx genion -s ions.tpr -o solv_ions.gro -p -pname NA -nname CL -neutral

My system is the following with protein residue nrs 79-405 and lig 411

[ molecules ]
; Compound #mols
Protein_chain_A 1
lig 1
SOL 35973
NA 4 ; after genion

In solv.gro the residue numbering is still 79-405 (protein),411 (lig)

But in solv_ions.gro the residue numbering changed to 79-405 (protein),406 (lig) so gmx genion doesn't conserve the residue numbering in the .gro file

probably happens when there are gaps in the sequence numbering.


newbox.gro (145 KB) newbox.gro Eiso AB, 05/15/2019 11:56 PM
ions.mdp (896 Bytes) ions.mdp Eiso AB, 05/15/2019 11:56 PM
lig_GMX_1.itp (799 Bytes) lig_GMX_1.itp Eiso AB, 05/15/2019 11:56 PM
lig_GMX_2.itp (12.5 KB) lig_GMX_2.itp Eiso AB, 05/15/2019 11:57 PM (902 KB) Eiso AB, 05/15/2019 11:57 PM
source-this-rc (300 Bytes) source-this-rc commands to reproduce ; source this. Eiso AB, 05/15/2019 11:57 PM


#1 Updated by Paul Bauer 6 days ago

Is this behaviour different to previous versions of GROMACS? I would see this as expected behaviour, but maybe it should be mentioned in the tool help text.

#2 Updated by Eiso AB 6 days ago

Not sure if it is different for this version, but I dont' think it is (or should be) expected behaviour.
There are convenience reasons for having a certain user defined residue numbering ( e.g. easy comparison with sequence aligned homologues, with multimers it's easiest when all monomers start at n*100 + i )

Since the residue nr is just a label IMHO it would be best if by default the user supplied residue numbering is conserved as much as possible. It's annoying if you need to renumber the pdb files again after the md run because you want to compare it to something with a different numbering - much easier if it just keeps it as it is.

Long time ago gmx required that the system was always renumbered from 1 to n 9 (or maybe 0 ..n)
At some point this was changed so now it keeps for the first molecule at least the numbering as it is supplied (79-405), but apparently for the second molecule it doesn't (411->406). It see no obvious reason why it would keep the numbering of one molecule and change it for another - consistent behaviour would be to keep al resnrs as they are.

That said, it's probably easily circumvented by modifying the residue nrs in the final topology file working with that.
(not actually tested).

#3 Updated by Paul Bauer 6 days ago

I see your point and will investigate this.
I quick read through genion did not show anything that would do this, the only place I know of where residues are explicitly renumbered is in genconf.

#4 Updated by Paul Bauer 6 days ago

can you provide your inputs and command line so I can check this in the debugger? :)

#5 Updated by Eiso AB 6 days ago

I've attached files that should allow you to reproduce the problem.
Just put all in one dir and do:

 source source-this-rc

upon further inspection, it's a bit more complicated than I imagined.

first I have to take this back

That said, it's probably easily circumvented by modifying the residue
nrs in the final topology file working with that.
(not actually tested).

unlike what I expected the top/tpr files doesn't seem to explicitly contain residue nrs for all of the system,
so it doesn't have the information to reproduce the residue numbering in the .gro file that was used to run grompp and make the tpr file.

gmx mdrun will also have the same renumbering behaviour since it also only has a tpr to work from.

gmx dump -s ions.tpr |grep resid 
         residue (208):
            residue[0]={name="VAL", nr=17, ic=' '}
            residue[1]={name="GLU", nr=18, ic=' '}
           residue[207]={name="LYS", nr=224, ic=' '}
         residue (1):
            residue[0]={name="LIG", nr=300, ic=' '}
         residue (1):
            residue[0]={name="SOL", nr=1, ic=' '}

Here the residue nr for the protein 18-224 come from the,
the LIG residue nr 300 comes from lig_GMX_2.itp (#included in
and the SOL resnr=1 from the amber99sb-ildn.ff/spce.itp

Since the tpr doesn't contain info about the input .gro file, for the SOL residues it obviously has to make up residue numbers for the multiple SOL residues, and simplest thing is just renumber from the last avaiable residue.
Apparently mdrun/genion gets the residue number for the first molecule from the topology and numbers on from there.
Not sure what happens with topologies with multiple protein molecules with (non)overlapping residue ranges..

I guess in theory it could use the the residue nr from the itp file (nr=300) if it is higher than the last residue but I have to admit that is a bit ugly way to do it.

I still think that conserving input residuenrs would be desirable from a user perspective but I can see it's not an easy fix

I can see a few options to make it easier to get back coordinate files with a certain residue numbering:
  1. use the resnr in the itp if higher than previous (ugly)
  2. include explicit residue info into the tpr file
  3. add an option to genion/mdrun etc (or trjconv!) to accept a reference coordinate file from which it can take correct residue numbering assuming identical atom ordering. That would make it easy to extract pdb file with some arbitrary desirable numbering scheme after an mdrun

The 2nd option (explict resnrs in the tpr(top?) would allow general conservation of resnrs by genion/mdrun and I guess would be the least surprising behaviour for users (at least this one;-)

The 3rd option, I just realized ;-) , is already possible at least for trjconv if you give '-s' an appropriately numbered pdb or gro file !!
turns out I was doing this without realizing it ;-)

Hmm, I guess this makes implementation of the second option less likely ;-(


Also available in: Atom PDF