Bug #1600

editconf merges atoms of distinct residues - insertion codes

Added by Steffen Möller almost 3 years ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: preprocessing (pdb2gmx, grompp)
Target version:
Affected version - extra info:
Affected version:
Difficulty: uncategorized

Description

I ran into insertion codes with Rosetta Antibody [http://rosie.rosettacommons.org/antibody], which threads the sequences of the light- and heavy-chain variable regions onto structures. The residue numbering follows the Chothia numbering scheme for immunoglobulins. At times extra residues have to be squeezed in, so insertion codes are introduced to accommodate them without renumbering the rest. The .gro file format makes it difficult to transform back to PDB, but it would be possible. Here, editconf does worse than just omitting the insertion codes: it gets the residue names wrong.
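In a PDB file, a residue is identified by chain ID, residue number and insertion code together. It is not identified by the number alone. The following is a minimal Python sketch, not GROMACS code, that reads these fields from fixed-column ATOM records. It uses the wwPDB column layout: resName in columns 18-20, chainID in 22, resSeq in 23-26 and iCode in 27. The function names `residue_key` and `list_residues` are hypothetical helpers for illustration.

```python
from typing import List, Tuple

# A residue is identified by (chainID, resSeq, iCode), not by resSeq alone.
ResidueKey = Tuple[str, int, str]


def residue_key(line: str) -> ResidueKey:
    """Extract (chainID, resSeq, iCode) from a fixed-column ATOM/HETATM line.

    Python slices are 0-based, so PDB column 22 is line[21], columns 23-26
    are line[22:26] and column 27 is line[26].
    """
    chain = line[21]
    res_seq = int(line[22:26])
    i_code = line[26].strip()  # '' when there is no insertion code
    return chain, res_seq, i_code


def list_residues(lines: List[str]) -> List[Tuple[ResidueKey, str]]:
    """Return (key, resName) for each residue, in file order."""
    result: List[Tuple[ResidueKey, str]] = []
    for line in lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        key = residue_key(line)
        if not result or result[-1][0] != key:
            result.append((key, line[17:20]))  # resName, columns 18-20
    return result


if __name__ == "__main__":
    excerpt = [
        "ATOM    355 3HG2 VAL L  30      16.321  17.585  31.093  1.00  0.00           H",
        "ATOM    356  N   GLY L  30A     15.111  16.134  26.752  1.00  0.00           N",
        "ATOM    363  N   GLY L  30B     16.803  14.661  24.906  1.00  0.00           N",
        "ATOM    370  N   TYR L  30C     18.233  12.596  26.241  1.00  0.00           N",
        "ATOM    391  N   ASN L  31      17.622   9.751  28.303  1.00  0.00           N",
    ]
    for key, name in list_residues(excerpt):
        print(key, name)
```

Run on lines from the excerpt below, it reports five distinct residues. Four of them share the number 30 and differ only in the insertion code.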

The conversion from a PDB file with insertion codes to .gro works fine, but the resulting .gro file does not show the insertion codes:

(Excerpt from) PDB file (generated by Rosetta Antibody):

ATOM    339 2HB  ASP L  29      20.358  18.713  28.883  1.00  0.00           H
ATOM    340  N   VAL L  30      17.005  16.702  28.750  1.00  0.00           N
ATOM    341  CA  VAL L  30      15.882  15.815  29.057  1.00  0.00           C
ATOM    342  C   VAL L  30      15.202  15.283  27.774  1.00  0.00           C
ATOM    343  O   VAL L  30      14.781  14.129  27.710  1.00  0.00           O
ATOM    344  CB  VAL L  30      14.845  16.561  29.917  1.00  0.00           C
ATOM    345  CG1 VAL L  30      13.602  15.706  30.114  1.00  0.00           C
ATOM    346  CG2 VAL L  30      15.458  16.939  31.256  1.00  0.00           C
ATOM    347  H   VAL L  30      16.992  17.650  29.098  1.00  0.00           H
ATOM    348  HA  VAL L  30      16.259  14.967  29.629  1.00  0.00           H
ATOM    349  HB  VAL L  30      14.534  17.464  29.391  1.00  0.00           H
ATOM    350 1HG1 VAL L  30      12.879  16.250  30.723  1.00  0.00           H
ATOM    351 2HG1 VAL L  30      13.160  15.478  29.144  1.00  0.00           H
ATOM    352 3HG1 VAL L  30      13.875  14.779  30.617  1.00  0.00           H
ATOM    353 1HG2 VAL L  30      14.719  17.467  31.858  1.00  0.00           H
ATOM    354 2HG2 VAL L  30      15.774  16.036  31.779  1.00  0.00           H
ATOM    355 3HG2 VAL L  30      16.321  17.585  31.093  1.00  0.00           H
ATOM    356  N   GLY L  30A     15.111  16.134  26.752  1.00  0.00           N
ATOM    357  CA  GLY L  30A     14.593  15.750  25.445  1.00  0.00           C
ATOM    358  C   GLY L  30A     15.444  14.795  24.600  1.00  0.00           C
ATOM    359  O   GLY L  30A     14.918  14.122  23.694  1.00  0.00           O
ATOM    360  H   GLY L  30A     15.418  17.084  26.900  1.00  0.00           H
ATOM    361 1HA  GLY L  30A     13.621  15.273  25.563  1.00  0.00           H
ATOM    362 2HA  GLY L  30A     14.440  16.643  24.839  1.00  0.00           H
ATOM    363  N   GLY L  30B     16.803  14.661  24.906  1.00  0.00           N
ATOM    364  CA  GLY L  30B     17.657  13.737  24.175  1.00  0.00           C
ATOM    365  C   GLY L  30B     18.254  12.550  24.909  1.00  0.00           C
ATOM    366  O   GLY L  30B     18.752  11.616  24.276  1.00  0.00           O
ATOM    367  H   GLY L  30B     17.201  15.214  25.651  1.00  0.00           H
ATOM    368 1HA  GLY L  30B     17.101  13.313  23.338  1.00  0.00           H
ATOM    369 2HA  GLY L  30B     18.504  14.278  23.756  1.00  0.00           H
ATOM    370  N   TYR L  30C     18.233  12.596  26.241  1.00  0.00           N
ATOM    371  CA  TYR L  30C     18.729  11.501  27.083  1.00  0.00           C
ATOM    372  C   TYR L  30C     17.674  11.059  28.071  1.00  0.00           C
ATOM    373  O   TYR L  30C     16.943  11.892  28.616  1.00  0.00           O
ATOM    374  CB  TYR L  30C     19.990  11.922  27.862  1.00  0.00           C
ATOM    375  CG  TYR L  30C     21.047  12.355  26.904  1.00  0.00           C
ATOM    376  CD1 TYR L  30C     21.140  13.690  26.510  1.00  0.00           C
ATOM    377  CD2 TYR L  30C     21.889  11.416  26.300  1.00  0.00           C
ATOM    378  CE1 TYR L  30C     22.079  14.095  25.577  1.00  0.00           C
ATOM    379  CE2 TYR L  30C     22.838  11.815  25.354  1.00  0.00           C
ATOM    380  CZ  TYR L  30C     22.924  13.148  25.006  1.00  0.00           C
ATOM    381  OH  TYR L  30C     23.871  13.556  24.091  1.00  0.00           O
ATOM    382  H   TYR L  30C     17.857  13.423  26.682  1.00  0.00           H
ATOM    383  HA  TYR L  30C     18.991  10.661  26.438  1.00  0.00           H
ATOM    384 1HB  TYR L  30C     19.741  12.735  28.546  1.00  0.00           H
ATOM    385 2HB  TYR L  30C     20.343  11.085  28.463  1.00  0.00           H
ATOM    386  HD1 TYR L  30C     20.468  14.433  26.940  1.00  0.00           H
ATOM    387  HD2 TYR L  30C     21.808  10.362  26.565  1.00  0.00           H
ATOM    388  HE1 TYR L  30C     22.138  15.143  25.282  1.00  0.00           H
ATOM    389  HE2 TYR L  30C     23.500  11.076  24.900  1.00  0.00           H
ATOM    390  HH  TYR L  30C     24.256  12.787  23.667  1.00  0.00           H
ATOM    391  N   ASN L  31      17.622   9.751  28.303  1.00  0.00           N

(Excerpt from) GRO file (generated by pdb2gmx):

   29ASP      O  340   1.807   1.512   2.748
   30VAL      N  341   1.699   1.665   2.873
   30VAL      H  342   1.699   1.758   2.910
   30VAL     CA  343   1.588   1.575   2.904
   30VAL     HA  344   1.630   1.499   2.954
   30VAL     CB  345   1.485   1.649   2.992
   30VAL     HB  346   1.455   1.732   2.946
   30VAL    CG1  347   1.362   1.561   3.013
   30VAL   HG11  348   1.296   1.610   3.070
   30VAL   HG12  349   1.320   1.540   2.925
   30VAL   HG13  350   1.388   1.477   3.059
   30VAL    CG2  351   1.547   1.687   3.125
   30VAL   HG21  352   1.480   1.734   3.182
   30VAL   HG22  353   1.578   1.604   3.172
   30VAL   HG23  354   1.626   1.747   3.109
   30VAL      C  355   1.519   1.523   2.776
   30VAL      O  356   1.475   1.408   2.770
   30GLY      N  357   1.510   1.608   2.674
   30GLY      H  358   1.541   1.702   2.688
   30GLY     CA  359   1.459   1.569   2.543
   30GLY    HA1  360   1.369   1.528   2.559
   30GLY    HA2  361   1.447   1.655   2.492
   30GLY      C  362   1.543   1.473   2.459
   30GLY      O  363   1.489   1.399   2.375
   30GLY      N  364   1.681   1.466   2.483
   30GLY      H  365   1.722   1.528   2.550
   30GLY     CA  366   1.765   1.371   2.413
   30GLY    HA1  367   1.709   1.333   2.339
   30GLY    HA2  368   1.840   1.425   2.373
   30GLY      C  369   1.827   1.255   2.490
   30GLY      O  370   1.883   1.164   2.429
   30TYR      N  371   1.819   1.261   2.623
   30TYR      H  372   1.777   1.341   2.665
   30TYR     CA  373   1.868   1.154   2.710
   30TYR     HA  374   1.890   1.079   2.648
   30TYR     CB  375   1.993   1.199   2.789
   30TYR    HB1  376   1.970   1.278   2.847
   30TYR    HB2  377   2.027   1.124   2.846
   30TYR     CG  378   2.100   1.240   2.693
   30TYR    CD1  379   2.111   1.372   2.651
   30TYR    HD1  380   2.048   1.441   2.687
   30TYR    CD2  381   2.185   1.144   2.637
   30TYR    HD2  382   2.176   1.048   2.664
   30TYR    CE1  383   2.206   1.410   2.558
   30TYR    HE1  384   2.214   1.505   2.529
   30TYR    CE2  385   2.282   1.181   2.543
   30TYR    HE2  386   2.343   1.112   2.504
   30TYR     CZ  387   2.291   1.313   2.504
   30TYR     OH  388   2.388   1.351   2.413
   30TYR     HH  389   2.440   1.271   2.385
   30TYR      C  390   1.762   1.111   2.808
   30TYR      O  391   1.686   1.194   2.859
   31ASN      N  392   1.759   0.980   2.835
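The loss is built into the format. The GROMACS file-format documentation describes each GRO atom line with the C format `%5d%-5s%5s%5d%8.3f%8.3f%8.3f`, optionally followed by velocities as `%8.4f`. The residue number is therefore a plain integer, and there is no column for an insertion code. The following is a small Python sketch of that line layout. The helper `pdb_residue_to_gro` is hypothetical and only makes the dropped insertion code explicit.

```python
# Sketch of the fixed-width GRO atom line, following the documented
# C-style format "%5d%-5s%5s%5d%8.3f%8.3f%8.3f" (velocities omitted).


def gro_atom_line(resnr: int, resname: str, atomname: str, atomnr: int,
                  x: float, y: float, z: float) -> str:
    """Format one GRO atom line (coordinates in nm)."""
    # The residue-number field is an integer: there is no column in which
    # a PDB insertion code such as the "A" of "30A" could be stored.
    return "%5d%-5s%5s%5d%8.3f%8.3f%8.3f" % (
        resnr, resname, atomname, atomnr, x, y, z)


def pdb_residue_to_gro(res_seq: int, i_code: str) -> int:
    """Hypothetical helper: only the residue number survives.

    i_code is accepted but intentionally ignored, because GRO cannot hold it.
    """
    return res_seq


if __name__ == "__main__":
    # GLY 30A and GLY 30B from the PDB excerpt get identical residue fields.
    a = gro_atom_line(pdb_residue_to_gro(30, "A"), "GLY", "N", 357,
                      1.510, 1.608, 2.674)
    b = gro_atom_line(pdb_residue_to_gro(30, "B"), "GLY", "N", 364,
                      1.681, 1.466, 2.483)
    print(a)
    print(b)
    print(a[:10] == b[:10])  # both start with "   30GLY  "
```

The two printed lines reproduce the N atoms of the two glycines in the excerpt above. Their first ten columns (residue number and name) are identical, so nothing in the residue fields tells the two residues apart.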

The insertion codes are already lost at this point.
The PDB file converted back by editconf labels every residue numbered 30 as VAL (bug, bug, bug!):

ATOM    340  O   ASP    29      56.440  60.740  12.330  1.00  0.00
ATOM    341  N   VAL    30      55.300  62.140  13.670  1.00  0.00
ATOM    342  H   VAL    30      55.310  63.110  14.040  1.00  0.00
ATOM    343  CA  VAL    30      54.130  61.320  13.970  1.00  0.00
ATOM    344  HA  VAL    30      54.500  60.420  14.480  1.00  0.00
ATOM    345  CB  VAL    30      53.180  62.090  14.930  1.00  0.00
ATOM    346  HB  VAL    30      52.880  63.010  14.420  1.00  0.00
ATOM    347  CG1 VAL    30      51.880  61.340  15.300  1.00  0.00
ATOM    348 1HG1 VAL    30      51.250  61.960  15.940  1.00  0.00
ATOM    349 2HG1 VAL    30      51.290  61.070  14.440  1.00  0.00
ATOM    350 3HG1 VAL    30      52.100  60.430  15.860  1.00  0.00
ATOM    351  CG2 VAL    30      53.880  62.490  16.250  1.00  0.00
ATOM    352 1HG2 VAL    30      53.230  63.110  16.860  1.00  0.00
ATOM    353 2HG2 VAL    30      54.140  61.610  16.830  1.00  0.00
ATOM    354 3HG2 VAL    30      54.790  63.060  16.080  1.00  0.00
ATOM    355  C   VAL    30      53.420  60.840  12.680  1.00  0.00
ATOM    356  O   VAL    30      52.810  59.770  12.680  1.00  0.00
ATOM    357  N   VAL    30      53.520  61.560  11.560  1.00  0.00
ATOM    358  H   VAL    30      54.010  62.450  11.610  1.00  0.00
ATOM    359  CA  VAL    30      52.810  61.240  10.320  1.00  0.00
ATOM    360  HA1 VAL    30      51.870  60.730  10.510  1.00  0.00
ATOM    361  HA2 VAL    30      52.560  62.180   9.820  1.00  0.00
ATOM    362  C   VAL    30      53.630  60.400   9.350  1.00  0.00
ATOM    363  O   VAL    30      53.090  60.000   8.320  1.00  0.00
ATOM    364  N   VAL    30      54.870  60.080   9.710  1.00  0.00
ATOM    365  H   VAL    30      55.240  60.470  10.560  1.00  0.00
ATOM    366  CA  VAL    30      55.730  59.160   8.970  1.00  0.00
ATOM    367  HA1 VAL    30      55.290  58.830   8.030  1.00  0.00
ATOM    368  HA2 VAL    30      56.660  59.670   8.720  1.00  0.00
ATOM    369  C   VAL    30      56.060  57.950   9.840  1.00  0.00
ATOM    370  O   VAL    30      55.950  56.800   9.390  1.00  0.00
ATOM    371  N   VAL    30      56.430  58.180  11.100  1.00  0.00
ATOM    372  H   VAL    30      56.450  59.140  11.450  1.00  0.00
ATOM    373  CA  VAL    30      56.950  57.160  12.010  1.00  0.00
ATOM    374  HA  VAL    30      57.240  56.270  11.460  1.00  0.00
ATOM    375  CB  VAL    30      58.200  57.740  12.710  1.00  0.00
ATOM    376  HB1 VAL    30      57.950  58.650  13.260  1.00  0.00
ATOM    377  HB2 VAL    30      58.570  57.030  13.450  1.00  0.00
ATOM    378  CG  VAL    30      59.300  58.030  11.710  1.00  0.00
ATOM    379  CD1 VAL    30      59.470  59.330  11.210  1.00  0.00
ATOM    380  HD1 VAL    30      58.890  60.160  11.590  1.00  0.00
ATOM    381  CD2 VAL    30      60.050  56.970  11.160  1.00  0.00
ATOM    382  HD2 VAL    30      59.920  55.960  11.520  1.00  0.00
ATOM    383  CE1 VAL    30      60.370  59.580  10.150  1.00  0.00
ATOM    384  HE1 VAL    30      60.490  60.580   9.770  1.00  0.00
ATOM    385  CE2 VAL    30      60.960  57.210  10.110  1.00  0.00
ATOM    386  HE2 VAL    30      61.520  56.390   9.690  1.00  0.00
ATOM    387  CZ  VAL    30      61.110  58.520   9.600  1.00  0.00
ATOM    388  OH  VAL    30      61.990  58.760   8.590  1.00  0.00
ATOM    389  HH  VAL    30      62.450  58.000   8.280  1.00  0.00
ATOM    390  C   VAL    30      55.860  56.810  13.020  1.00  0.00

This in turn causes serious trouble when that PDB file is converted to GRO again with pdb2gmx:

Program pdb2gmx, VERSION 4.6.3
Source code file: /homeLvm/moeller/alioth/debichem/unstable/gromacs-4.6.3/src/kernel/pdb2gmx.c, line: 727
Fatal error:
Atom HA1 in residue VAL 30 was not found in rtp entry VAL with 16 atoms
while sorting atoms.

The round trip from PDB (1) to GRO (2), back to PDB (3) and then to
GRO (4) again was meant to eliminate the need to manually edit the
TRP file after adding ions to the solution with genion after step (2).

Adding "-resnr" as an argument to editconf does not help. The issue
occurs with both 4.6.3 and 5.0.1.

Insertion codes are apparently used throughout PDB entries to
distinguish residues; see, for example,
http://www3.imperial.ac.uk/bioinfsupport/help/pdb_format
http://deposit.rcsb.org/adit/docs/pdb_atom_format.html
For my application this is just fine, as I do not need the Chothia
numbers and can rename the residues upfront. For more complicated PDB
entries with cross-references between records, however, one does not
want to mess with all that. And, obviously, no Gromacs tool should
produce wrong residue names.

Associated revisions

Revision a9b36c5a
Added by Erik Lindahl about 2 years ago

Fix gro errors with PDB insertion codes

The insertion codes are discarded when converting to GRO files,
which results in adjacent residues with different names but
identical numbers. The reading code has been altered to identify
new residues also when the resname changes; this does not fix
the duplicate numbers (it cannot be fixed in GRO files), but it
will correctly propagate all data so the correct labels can
be recovered by using gmx trjconv with the original PDB file for
the -s argument.

Fixes #1600.

Change-Id: Iaf79f3f9e548e8555d78cb39e869410aa8186029
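The change of reading rule described in this commit can be illustrated with a simplified Python model. It is not the GROMACS reader itself. The model splits a flat list of GRO atoms into residues in two ways: only when the residue number changes, or also when the residue name changes. The names `parse_gro_atom` and `group_residues` are hypothetical. The sample lines are taken from the GRO excerpt above.

```python
from typing import List, Tuple

# One atom as (residue number, residue name, atom name).
Atom = Tuple[int, str, str]
# One residue as (residue number, residue name, atom names).
Residue = Tuple[int, str, List[str]]


def parse_gro_atom(line: str) -> Atom:
    """Read the first three fixed-width fields of a GRO atom line."""
    return int(line[0:5]), line[5:10].strip(), line[10:15].strip()


def group_residues(atoms: List[Atom], use_name: bool) -> List[Residue]:
    """Split a flat atom list into residues.

    use_name=False: a new residue starts only when the number changes, so
    later atoms inherit the name of the first residue with that number.
    use_name=True: a new residue also starts when the name changes.
    """
    residues: List[Residue] = []
    for resnr, resname, atomname in atoms:
        prev = residues[-1] if residues else None
        is_new = (prev is None or prev[0] != resnr
                  or (use_name and prev[1] != resname))
        if is_new:
            residues.append((resnr, resname, []))
        residues[-1][2].append(atomname)
    return residues


if __name__ == "__main__":
    gro_lines = [
        "   30VAL      C  355   1.519   1.523   2.776",
        "   30VAL      O  356   1.475   1.408   2.770",
        "   30GLY      N  357   1.510   1.608   2.674",
        "   30TYR      N  371   1.819   1.261   2.623",
        "   31ASN      N  392   1.759   0.980   2.835",
    ]
    atoms = [parse_gro_atom(line) for line in gro_lines]
    for use_name in (False, True):
        grouped = group_residues(atoms, use_name)
        print(use_name, [(nr, name, len(names)) for nr, name, names in grouped])
```

With `use_name=False`, the four atoms numbered 30 collapse into a single VAL residue. This mirrors the all-VAL labels in the editconf output. With `use_name=True`, VAL, GLY and TYR stay separate. Under this sketch's rule, two adjacent residues with the same name and the same number, such as GLY 30A and GLY 30B, would still be merged.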

History

#1 Updated by Erik Lindahl about 2 years ago

  • Status changed from New to Rejected

The GRO file format itself is a very simple one that is not sufficient to describe all the information present in a PDB file. However, Gromacs will store all this stuff in the topology (and then the binary TPR file) when you run pdb2gmx, and if you use trjconv and provide that TPR file, Gromacs will happily write all the detailed stuff that was originally present in the PDB file.

For editconf we try to keep all the information we can, but ultimately we can't solve the problem of writing from a "fat" to a "lean" file format and then hoping to get the information back by converting back to a fat format.

#2 Updated by Erik Lindahl about 2 years ago

  • Status changed from Rejected to Accepted

Or actually, my bad - I was reading a bit too fast. This is indeed a bug.

#3 Updated by Peter Kasson about 2 years ago

Just to clarify, acceptably correct behavior would be to lose the insertion codes but to recognize (residue number, residue ID) pairs as distinct residues?

#4 Updated by Erik Lindahl about 2 years ago

  • Status changed from Accepted to In Progress
  • Assignee set to Erik Lindahl

#5 Updated by Gerrit Code Review Bot about 2 years ago

Gerrit received a related patchset '1' for Issue #1600.
Uploader: Erik Lindahl
Change-Id: Iaf79f3f9e548e8555d78cb39e869410aa8186029
Gerrit URL: https://gerrit.gromacs.org/4751

#6 Updated by Erik Lindahl about 2 years ago

  • Status changed from In Progress to Fix uploaded

#7 Updated by Erik Lindahl about 2 years ago

The GRO format only allows integer residue numbers, so we cannot represent the insertion code names correctly. However, Gromacs internally does not care about the exact residue number, so the above fix merely makes sure we keep all lines when reading such GRO files, and keep all the residue names even if the number is the same.

Note that this will result in several residues with number "30" in your output in this case. However, instead of using editconf for the final step you can then use trjconv, which will be happy to use all the labels and other special PDB information from the original PDB if you provide that for the -s argument to gmx trjconv.

#8 Updated by Rossen Apostolov about 2 years ago

  • Status changed from Fix uploaded to Closed

#9 Updated by Mark Abraham about 1 year ago

  • Target version changed from 5.x to 5.0.7
