make default selections suitable for DNA and RNA
At the BioExcel Summer School, Ray Angana suggested that when make_ndx (and/or select) encounters DNA or RNA, it should generate default selections, as we currently do for protein. The suggested groups (for both DNA and RNA) are:
- backbone
- sugar (which contains 2 atoms of the backbone)
- base (disjoint from both)
- without hydrogens (like Protein-H)
These groups should cover all residues in both strands. We don't think there is value in default groups for the two separate strands (or in decomposing each strand into backbone, sugar and base).
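A minimal sketch of these groups for a single nucleotide follows. It assumes PDB v3 atom names (primes written as `'`) and a bond list for the residue. The heavy-atom name sets, the hydrogen-naming heuristic, the group names and `default_groups` are hypothetical illustrations, not GROMACS code. C3' and C4' go into both backbone and sugar (the two shared atoms). Every other heavy atom goes into the base. Each hydrogen joins the groups of the heavy atom it is bonded to, and each group gets a hydrogen-free `-H` variant.

```python
# Hypothetical heavy-atom name sets (PDB v3 naming), for illustration only.
BACKBONE = frozenset({"P", "OP1", "OP2", "OP3", "O5'", "C5'", "C4'", "C3'", "O3'"})
SUGAR = frozenset({"C1'", "C2'", "C3'", "C4'", "O4'", "O2'"})  # O2' only in RNA


def is_hydrogen(name):
    # Heuristic: hydrogen names start with 'H', possibly after digits ("1H5'").
    return name.lstrip("0123456789").startswith("H")


def heavy_groups(name):
    """Groups a heavy atom belongs to; C3' and C4' are in both backbone and sugar."""
    groups = set()
    if name in BACKBONE:
        groups.add("Backbone")
    if name in SUGAR:
        groups.add("Sugar")
    return groups or {"Base"}  # anything else is base, disjoint from both


def default_groups(atom_names, bonds):
    """Build Backbone/Sugar/Base groups plus -H variants for one nucleotide.

    atom_names: list of atom names; bonds: (i, j) index pairs.
    Each hydrogen inherits the groups of the heavy atom it is bonded to.
    """
    membership = [set() if is_hydrogen(n) else heavy_groups(n) for n in atom_names]
    for i, j in bonds:
        if is_hydrogen(atom_names[i]) and not is_hydrogen(atom_names[j]):
            membership[i] |= heavy_groups(atom_names[j])
        elif is_hydrogen(atom_names[j]) and not is_hydrogen(atom_names[i]):
            membership[j] |= heavy_groups(atom_names[i])
    result = {}
    for group in ("Backbone", "Sugar", "Base"):
        members = [k for k, m in enumerate(membership) if group in m]
        result[group] = members
        result[group + "-H"] = [k for k in members if not is_hydrogen(atom_names[k])]
    return result
```

Any heavy atom missing from the two name sets falls into the base. Atoms on terminal or modified residues that are not in those sets would therefore be misclassified, which is the weakness this issue raises for such residues.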
Also, some PDB inputs name DNA adenine ADE or DA, and similarly for the other bases. In RNA, however, adenine is sometimes named RA (from AMBER), A, or ADE (in CHARMM). These names should be regularized, or at least the selection code should recognise multiple naming schemes.
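One way to recognise several naming schemes is to map residue-name variants to PDB-style names (DA/DC/DG/DT for DNA, A/C/G/U for RNA). The sketch below is hypothetical, and its alias table is illustrative and incomplete. Stripping a trailing 5 or 3 assumes AMBER-style terminal names such as RC5/RC3. Names like ADE appear for both DNA and RNA. Resolving them by checking for a ribose O2' atom is an assumed heuristic.

```python
# Hypothetical alias table: residue-name stem -> (base letter, kind).
# kind is "DNA", "RNA", or None when the name alone is ambiguous.
_ALIASES = {
    "DA": ("A", "DNA"), "DC": ("C", "DNA"), "DG": ("G", "DNA"), "DT": ("T", "DNA"),
    "RA": ("A", "RNA"), "RC": ("C", "RNA"), "RG": ("G", "RNA"), "RU": ("U", "RNA"),
    "ADE": ("A", None), "CYT": ("C", None), "GUA": ("G", None),
    "THY": ("T", "DNA"), "URA": ("U", "RNA"),
    "A": ("A", None), "C": ("C", None), "G": ("G", None),
    "T": ("T", "DNA"), "U": ("U", "RNA"),
}


def canonical_resname(resname, atom_names):
    """Return a PDB-style name (DA/DC/DG/DT or A/C/G/U), or None if unknown."""
    name = resname.strip().upper()
    # AMBER-style terminal variants (e.g. RC5, RC3, DA5) carry a 5/3 suffix.
    if name not in _ALIASES and name[-1:] in ("5", "3"):
        name = name[:-1]
    if name not in _ALIASES:
        return None
    base, kind = _ALIASES[name]
    if kind is None:
        # Ambiguous name: treat the residue as RNA if its ribose has an O2'
        # atom (older files write primes as '*').
        kind = "RNA" if {"O2'", "O2*"} & set(atom_names) else "DNA"
    return "D" + base if kind == "DNA" else base
```

For example, `canonical_resname("RC5", [])` gives `"C"`, while `canonical_resname("ADE", atoms)` gives `"A"` or `"DA"` depending on whether `atoms` contains O2'.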
Canonical definitions can be found at http://jenalib.leibniz-fli.de/ImgLibDoc/nana/IMAGE_NANA.html, which would let us follow established standards. We should also check the PDB definitions and conventions to make sure we get this right. The Nucleic Acid Database (http://ndbserver.rutgers.edu/) exists as an analogue to the PDB, so we should see which standards it promulgates.
Modified bases also exist. For example, C5M is 5-methylcytosine, which has an extra atom on the base (the methyl group at C5), so the base selection cannot assume a fixed atom list.
Terminal residues such as RC5 and RC3 change which atoms should form part of the backbone selection, so these must be handled correctly.
Backbones can also be modified. Perhaps the backbone selection should therefore be generated by subtraction: first remove the base and the sugar, then take what remains.
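The subtraction idea can be sketched using connectivity instead of fixed atom lists. This approach also copes with the terminal residues and modified bases mentioned above. This Python sketch is hypothetical. It assumes a bond list per residue and PDB v3 names, and looks up only C1', C3' and C4' by name. It cuts the glycosidic bond at C1' to find the base. The sugar-only atoms are whatever remains attached to C1' once the base, C3' and C4' are removed. The backbone is everything left. Under this scheme the sugar group listed earlier would be the sugar-only atoms plus C3' and C4'.

```python
from collections import defaultdict


def _component(start, adjacency, blocked):
    """Atoms reachable from start without entering any index in blocked."""
    seen, stack = {start}, [start]
    while stack:
        for nbr in adjacency[stack.pop()]:
            if nbr not in seen and nbr not in blocked:
                seen.add(nbr)
                stack.append(nbr)
    return seen


def split_nucleotide(atom_names, bonds):
    """Return (backbone, sugar_only, base) index sets for one nucleotide.

    Only C1', C3' and C4' are looked up by name; everything else follows
    from connectivity (a hypothetical scheme, not existing GROMACS code).
    """
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    index = {name: k for k, name in enumerate(atom_names)}
    c1, c3, c4 = index["C1'"], index["C3'"], index["C4'"]

    # Base: cut at C1' and take the side holding its unprimed heavy
    # neighbour (assumed to be the glycosidic N, e.g. N9 or N1).
    glyco = next(k for k in adjacency[c1]
                 if "'" not in atom_names[k] and not atom_names[k].startswith("H"))
    base = _component(glyco, adjacency, blocked={c1})

    # Sugar-only atoms: what stays attached to C1' once the base and the two
    # shared ring atoms C3'/C4' are blocked (C2', O4', O2', their hydrogens).
    sugar_only = _component(c1, adjacency, blocked=base | {c3, c4})

    # Backbone by subtraction: everything else, including C3' and C4'.
    backbone = set(range(len(atom_names))) - base - sugar_only
    return backbone, sugar_only, base
```

Only three anchor atoms are named. A missing 5' phosphate, an extra methyl on the base, or an unusual backbone atom therefore ends up on whichever side of the cut it is bonded to. A residue lacking any of the anchors raises `KeyError`, which real code would need to handle.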
Currently it is best to generate such default groups from .gro files, likely because things work better once pdb2gmx has regularized the naming. Perhaps the default selection generation code should reuse the name-translation functionality in pdb2gmx? Passing in raw structures from the PDB does not work reliably at the moment.