Feature #1511

add PDBx (ie mmcif) support

Added by Erik Marklund over 5 years ago. Updated about 1 year ago.

Status: Accepted
Priority: Normal
Assignee: -
Category: preprocessing (pdb2gmx,grompp)
Target version: -
Difficulty: uncategorized

Description

A suggested feature (http://redmine.gromacs.org/issues/902#change-8554) for circumventing shortcomings in the PDB format was recently closed, and rightfully so. I suggest integrating or implementing a decent mmCIF parser in GROMACS instead. The Protein Data Bank will eventually deprecate the PDB format in favour of mmCIF, so this feature may not be urgent, but it is important.


Related issues

Has duplicate GROMACS - Feature #1950: Support for mmCIF (Rejected)

History

#1 Updated by David van der Spoel over 5 years ago

Didn't you mention that there is NO mmcif parser that works? Or did you find one somewhere yet?

#2 Updated by Erik Marklund over 5 years ago

I think there are some that work, but the ones that do are horribly slow. I've written my own, but first of all it's in Python (it can be ported), and it can't make use of everything you can find in mmCIF files, such as how to expand the asymmetric unit into biological assemblies.
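For readers unfamiliar with the format, here is a minimal Python sketch of reading the `_atom_site` loop of an mmCIF file. It is not the parser mentioned above, only an illustration under strong assumptions: values contain no whitespace (real mmCIF also allows quoted strings and multi-line `;` text fields), and only the first `_atom_site` loop is read. The function name `read_atom_site` and the sample coordinates are made up for the example; the tag names are standard `_atom_site` items.

```python
# Hypothetical sample data, for illustration only.
SAMPLE_MMCIF = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N  MET 27.340 24.430 2.614
ATOM 2 CA MET 26.266 25.413 2.842
#
"""


def read_atom_site(text):
    """Return one dict per row of the first _atom_site loop in `text`.

    Assumption: every value is a single whitespace-free token, so
    str.split() is sufficient. Quoted and multi-line values are not handled.
    """
    tags, rows = [], []
    state = "search"  # search -> header -> rows
    for raw in text.splitlines():
        line = raw.strip()
        if state == "search":
            if line == "loop_":
                state, tags = "header", []
        elif state == "header":
            if line.startswith("_"):
                tags.append(line)
            elif tags and all(t.startswith("_atom_site.") for t in tags):
                state = "rows"  # this line is the first data row
            else:
                # Some other loop: skip it, or restart on a new loop_.
                state = "header" if line == "loop_" else "search"
                tags = []
        if state == "rows":
            # A blank line, comment, new tag, loop or data block ends the loop.
            if (not line or line[0] in "#_" or line == "loop_"
                    or line.startswith("data_")):
                break
            values = line.split()
            if len(values) != len(tags):
                raise ValueError("row does not match loop header: " + line)
            rows.append(dict(zip(tags, values)))
    return rows


atoms = read_atom_site(SAMPLE_MMCIF)
xyz = [tuple(float(a["_atom_site.Cartn_" + c]) for c in "xyz") for a in atoms]
```

Expanding the asymmetric unit into biological assemblies would additionally require reading the assembly and symmetry-operator categories and applying those operators to the coordinates, which this sketch leaves out.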

#3 Updated by Erik Marklund over 5 years ago

That said, there are libraries I haven't yet tried, so with some luck we may be able to integrate something that already exists.

I may need to write an mmCIF parser in C for another application, and if I do I'll be happy to share it if the licenses permit.

#4 Updated by Mark Abraham over 5 years ago

I thought PDBx was the replacement for PDB (per discussion probably on gmx-developers ~9 months ago) and that there is a C++ API for reading it already available?

#5 Updated by Erik Marklund over 5 years ago

I think mmCIF and PDBx are the same.

If there is a good library around then most of the work is done. I seem to have missed this gmx-developers thread that you mention, but it seems that PDBx/mmCIF was indeed discussed and included in the plan for 5.0. I guess we can close this issue too then.

#6 Updated by Erik Marklund over 5 years ago

  • Status changed from New to Closed

#7 Updated by Mark Abraham over 5 years ago

  • Subject changed from mmcif support to PDBx and/or mmcif support
  • Status changed from Closed to Accepted
  • Assignee deleted (David van der Spoel)

Nothing's actually happened yet

#8 Updated by Mark Abraham over 3 years ago

  • Subject changed from PDBx and/or mmcif support to add PDBx and/or mmcif support
  • Target version deleted (future)

We should indeed do this at some point, preferably by bundling some library

#9 Updated by Mark Abraham over 3 years ago

#10 Updated by Mark Abraham over 3 years ago

  • Subject changed from add PDBx and/or mmcif support to add PDBx (ie mmcif) support

#11 Updated by Marcin Wojdyr about 1 year ago

Hi, I'm developing an mmCIF reading library that you could consider: https://github.com/project-gemmi/gemmi.
It's a header-only C++11 library. The parsing part is based on https://github.com/taocpp/PEGTL and it's faster than other CIF parsers (at least in my hands).

Gemmi is meant to be used in two macromolecular refinement programs (Refmac and BUSTER). There are some similarities between refinement and MD - restraints in refinement are more or less like a force field in MD. From my side it would be interesting to see how this library would fit into an MD program.

The documentation is incomplete, but if you have any questions I'll be happy to help.

#12 Updated by Mark Abraham about 1 year ago

Marcin Wojdyr wrote:

Hi, I'm developing an mmCIF reading library that you could consider: https://github.com/project-gemmi/gemmi.
It's a header-only C++11 library. The parsing part is based on https://github.com/taocpp/PEGTL and it's faster than other CIF parsers (at least in my hands).

Gemmi is meant to be used in two macromolecular refinement programs (Refmac and BUSTER). There are some similarities between refinement and MD - restraints in refinement are more or less like a force field in MD. From my side it would be interesting to see how this library would fit into an MD program.

The documentation is incomplete, but if you have any questions I'll be happy to help.

Wow, looks superb. Using this, adding support to GROMACS for reading the new formats could be quite easy (although it is too late for the 2019 release, as our beta release feature freeze is barely a week away). I also imagine we could be creative and use the underlying parser functionality to extend support (for our use) to specify other PDB-like formats? I noticed that the Intel compiler generates some warnings. Might you accept patches that address those?

#13 Updated by Marcin Wojdyr about 1 year ago

Yes, I'd be happy to get patches. Which version of the Intel compiler do you use? I have a license for an old pre-C++11 ICC, but I think I could use an evaluation version of the latest ICC to check the warnings.

#14 Updated by Mark Abraham about 1 year ago

Marcin Wojdyr wrote:

Yes, I'd be happy to get patches. Which version of the Intel compiler do you use? I have a license for an old pre-C++11 ICC, but I think I could use an evaluation version of the latest ICC to check the warnings.

We stay pretty up to date, so over time you might get patches on all then-current ones. :-) You can get a license for open source development, but in theory that's not supposed to be used if you're compensated for that development.

#15 Updated by Marcin Wojdyr about 1 year ago

Now it should compile without warnings.

we could be creative and use the underlying parser functionality to extend support (for our use) to specify other PDB-like formats?

Mark: actually I didn't understand this part. What PDB-like formats do you have in mind?

#16 Updated by Mark Abraham about 1 year ago

Marcin Wojdyr wrote:

Now it should compile without warnings.

we could be creative and use the underlying parser functionality to extend support (for our use) to specify other PDB-like formats?

Mark: actually I didn't understand this part. What PDB-like formats do you have in mind?

See http://manual.gromacs.org/documentation/current/user-guide/file-formats.html#structure-files. Both .gro and .g96 are the formats adopted by the GROMOS project (from which GROMACS developed) because they better suited molecular simulation than .pdb or any other format of the day. Both can have velocities, and the latter has higher precision. (For serious trajectory content, we use yet other formats that would be out of scope for gemmi.) The necessary grammar is probably quite simple, but only of interest to people who use GROMACS (quite a few), GROMOS (fewer), and related tools (fewer), so not something I'd ask someone else to do!
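To give a sense of how simple the .gro grammar is, here is a minimal Python sketch of a reader. It assumes the default fixed-column layout (5-character residue number, residue name, atom name and atom number, then 8-character positions in nm and optional 8-character velocities in nm/ps), with a title line, an atom-count line and a final box line. The function names `parse_gro_atom` and `parse_gro` are hypothetical and not part of GROMACS or gemmi, and files written with non-default precision are not handled.

```python
# Sample in the default .gro layout; velocities are optional per atom line.
GRO_SAMPLE = """\
Two atoms of one water, t= 0.0
    2
    1WATER  OW1    1   0.126   1.624   1.679  0.1227 -0.0580  0.0434
    1WATER  HW2    2   0.190   1.661   1.747  0.8085  0.3191 -0.7791
   1.82060   1.82060   1.82060
"""


def parse_gro_atom(line):
    """Parse one fixed-column atom line (assumes default column widths)."""
    atom = {
        "resnr": int(line[0:5]),
        "resname": line[5:10].strip(),
        "name": line[10:15].strip(),
        "atomnr": int(line[15:20]),
        # Three 8-character position fields start at column 20.
        "pos": tuple(float(line[20 + 8 * i:28 + 8 * i]) for i in range(3)),
        "vel": None,
    }
    # Three 8-character velocity fields follow if the line is long enough.
    if len(line.rstrip()) >= 68:
        atom["vel"] = tuple(
            float(line[44 + 8 * i:52 + 8 * i]) for i in range(3))
    return atom


def parse_gro(text):
    """Return (title, atoms, box) for the text of a single-frame .gro file."""
    lines = text.splitlines()
    title = lines[0]
    natoms = int(lines[1])
    atoms = [parse_gro_atom(l) for l in lines[2:2 + natoms]]
    box = tuple(float(v) for v in lines[2 + natoms].split())
    return title, atoms, box


title, atoms, box = parse_gro(GRO_SAMPLE)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields such as the residue number and residue name are not separated by spaces.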
