Feature #1221

More generic position mapping for selections

Added by Teemu Murtola over 6 years ago. Updated about 5 years ago.

Status: Accepted
Priority: Normal
Assignee: -
Category: selections
Target version:
Difficulty: uncategorized

Description

Currently, in addition to selecting individual atoms, the selection engine allows one to specify, e.g., centers-of-mass of (parts of) residues using syntax like:

res_com of resname RA RB

The analysis tools then get a list of positions (one coordinate for each residue), each of which can consist of multiple atoms. It is also possible to use the same position keywords to influence how atoms are selected based on coordinates:

res_com within 1 of resname RA RB

This selects all atoms that belong to residues whose center-of-mass is within the given distance.
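
As a rough illustration of the two uses above, the following Python sketch computes one center-of-mass per residue and then selects all atoms of residues whose center-of-mass lies within a cutoff of some reference points. The function names, the parallel-list layout, and the use of plain reference points in place of a reference selection are assumptions made for this example; it is not the selection engine's code.

```python
from collections import defaultdict
from math import dist

def residue_coms(resids, masses, coords):
    """Return {resid: center-of-mass} for atoms given as parallel lists."""
    msum = defaultdict(float)
    wsum = defaultdict(lambda: [0.0, 0.0, 0.0])
    for r, m, x in zip(resids, masses, coords):
        msum[r] += m
        for d in range(3):
            wsum[r][d] += m * x[d]
    return {r: tuple(c / msum[r] for c in wsum[r]) for r in msum}

def atoms_with_res_com_within(cutoff, refs, resids, masses, coords):
    """Indices of atoms whose residue COM is within cutoff of any ref point."""
    coms = residue_coms(resids, masses, coords)
    hit = {r for r, c in coms.items() if any(dist(c, p) <= cutoff for p in refs)}
    return [i for i, r in enumerate(resids) if r in hit]

# Hypothetical toy system: two residues of two atoms each, all on the x axis.
resids = [1, 1, 2, 2]
masses = [1.0, 3.0, 2.0, 2.0]
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (10.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
coms = residue_coms(resids, masses, coords)
selected = atoms_with_res_com_within(1.0, [(3.5, 0.0, 0.0)], resids, masses, coords)
```

In this toy system, atom 0 is itself farther than the cutoff from the reference point, but it is still selected because the center-of-mass of its residue is within the cutoff.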

These allow a lot of things, but for generic use (in particular with some coarse-graining approaches) it may be too restrictive/cumbersome to only be able to do the splitting based on residue or molecule boundaries. It would be nice to be able to specify a fully generic mapping of this sort, and have the selection engine evaluate it.

One essential part of implementing this is first specifying what "fully generic mapping" really means, and what needs to be supported. Once there is some clarity on this, the implementation should amount to:
  1. Adding unit tests for the existing functionality in selection/poscalc.*.
  2. Adding support for out-of-order mapping in the functions from selection/indexutil.* that operate on gmx_ana_indexmap_t structures (what exactly needs to be supported depends on the specification) and adding similar support in selection/poscalc.*. The main limitation to lift is that currently, the mapping is specified as a t_blocka with a limitation that the list of atoms that participate in the mapping must be ascending (i.e., if the last atom that participates in position N is M, then the first atom that can participate in position N+1 is M+1).
  3. Adding support for parsing a mapping specification and converting that into data that can be used by selection/poscalc.*. One option would be to use a set of selections for this purpose.
  4. Adding support for custom keywords for specifying the position mapping (i.e., allow one to replace res_com in the above examples with the specified custom mapping) and initializing the position calculation for the selection appropriately in this case.
  5. Adding sorting of output atoms for selection keywords that select atoms based on positions (i.e., construct the output atom list based on a set of positions).
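
The ascending-order limitation mentioned in step 2 can be sketched as follows. This is a minimal Python illustration that assumes a block layout in the spirit of t_blocka (an `index` array of block boundaries into a flat atom array `a`); the function names and the example beads are hypothetical and do not come from indexutil.*.

```python
def flatten_blocks(index, a):
    """Split the flat atom array `a` into per-position lists using `index`."""
    return [a[index[i]:index[i + 1]] for i in range(len(index) - 1)]

def satisfies_ascending_limit(index, a):
    """True if every listed atom is larger than all atoms listed before it."""
    used = a[index[0]:index[-1]]
    return all(x < y for x, y in zip(used, used[1:]))

# Two residues of three atoms each: atoms ascend across block boundaries.
res_index, res_atoms = [0, 3, 6], [0, 1, 2, 3, 4, 5]

# Hypothetical coarse-grained beads whose atoms interleave:
# bead 0 = atoms 0, 1, 4 and bead 1 = atoms 2, 3.
cg_index, cg_atoms = [0, 3, 5], [0, 1, 4, 2, 3]

res_ok = satisfies_ascending_limit(res_index, res_atoms)
cg_ok = satisfies_ascending_limit(cg_index, cg_atoms)
```

The second mapping is the kind of out-of-order case that step 2 would need to support: the blocks themselves are well defined, but the flat atom list is no longer ascending.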

Related issues

Related to GROMACS - Task #651: Increase coverage of selection unit tests (Closed, 01/09/2011)

Associated revisions

Revision 2504ba9a (diff)
Added by Teemu Murtola about 6 years ago

Add unit tests for selection index mapping.

Add unit tests for parts of indexutil.* that are mainly used by the
selection position calculation engine.

Add required functionality to toputils.* and update typedefs.c to
support more flexible freeing of required data structures.

Part of #651, related to #1221.

Change-Id: Ibc68ad71a4834b991820014969be46152426a9f5

Revision b6a7c327 (diff)
Added by Teemu Murtola about 6 years ago

Add unit tests for selection position calculation.

Add unit tests for most functionality of poscalc.*. These tests also
cover centerofmass.*.

Fixed two issues:
- POS_MASKONLY calculations had an inconsistent group set in the output
positions (which was visible in the selection interface, returning
incorrect set of atoms for the selection).
- Force calculation was incorrect.
These do not affect g_select, so only fixed for 5.0.

Part of #651, related to #1221.

Change-Id: I4dc6475f53fb3b1559bae9296f8c9f3e6dd14bf7

Revision 52f3ce64 (diff)
Added by Teemu Murtola almost 6 years ago

Move gmx_ana_pos_t::g into gmx_ana_indexmap_t.

Instead of storing a gmx_ana_index_t structure by reference in the
positions, embed this information in the index mapping structure, as
that is really what it is required for. This localizes the
responsibility of maintaining that state better, and allows removing
code that was there just to provide such an artificial group structure.
The logic still remains mostly the same: the array of atoms is still
stored by reference instead of copying where possible.

This also makes it simpler to handle cases where the output atoms from
the mapping would actually not equal the input group (e.g., that they
would need to be in a different order).

Prerequisite for #1221.

Change-Id: I9e3e0455d1129fd6c3dd8056b2f088114764f331

History

#1 Updated by Christoph Junghans over 6 years ago

From the coarse-graining perspective, I would say a "fully generic mapping" is a weighted sum of atom positions, maybe even with negative weights. The center of mass, charge, or geometry of a molecule or residue are all special cases of this.
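
To make the "weighted sum" reading concrete, here is a small Python sketch in which center of mass and center of geometry are just particular weight choices, and a negative weight gives a point outside the atoms. The function names and the example numbers are assumptions for illustration only; neither GROMACS nor VOTCA is implied to work this way internally.

```python
def map_position(weights, coords):
    """Weighted sum of atom coordinates; weights are used exactly as given."""
    return tuple(sum(w * x[d] for w, x in zip(weights, coords)) for d in range(3))

def com_weights(masses):
    """Weights that reproduce the center of mass."""
    total = sum(masses)
    return [m / total for m in masses]

def cog_weights(n):
    """Weights that reproduce the center of geometry of n atoms."""
    return [1.0 / n] * n

# Hypothetical two-atom bead on the x axis.
masses = [1.0, 3.0]
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]

com = map_position(com_weights(masses), coords)
cog = map_position(cog_weights(len(coords)), coords)
# A negative weight extrapolates beyond the second atom.
extrapolated = map_position([-0.5, 1.5], coords)
```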

In VOTCA we define a mapping file in xml format. A dppc lipid as an example can be found here:
https://code.google.com/p/plumx/source/browse/dppc_mapping/dppc.xml
We have beads, which are made of a set of atoms. The position of each bead is determined by a certain mapping, and different beads can share the same mapping. The beads have types, which can be used for analysis later. Additionally, the file contains topology information (bead, angle) for the coarse-grained system. We had in mind that the mapping file defines how to create a CG topology out of an atomistic one.

#2 Updated by Teemu Murtola over 6 years ago

Is it necessary to support the same atom participating in multiple beads (e.g., with a 0.5 weight for both)? For full generality, I would say yes, but that may complicate things a lot. This may, e.g., lead to unintuitive behaviour, where selecting one atom actually produces two positions.

Another question is whether it is sufficient to specify the mapping as either molecule-specific or residue-specific, or whether more is needed. That is, is it sufficient to define "this set of atoms, when split over molecules/residues, produces one position per molecule/residue"?

#3 Updated by Teemu Murtola over 6 years ago

Algorithmically, the main challenge in implementing this is that given an arbitrary sorted list of atoms, and a data structure that contains the static mapping (currently, a t_blocka structure), one should be able to quickly partition that list of atoms into the "beads". The current implementation has linear complexity both in the number of input atoms and in the size of the static mapping, and that would be nice to keep.
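
A minimal Python sketch of such a single-pass partitioning is given below. It assumes the current restriction (the static mapping lists atoms in ascending order) and a block layout with an `index` array of boundaries into a flat atom array `a`; the function name and return format are hypothetical, and this is not the code from poscalc.*.

```python
def partition_into_beads(atoms, index, a):
    """Group a sorted atom list by bead in one pass over both inputs.

    Returns a list of (bead_number, [atoms]) for beads that got at least
    one atom. Assumes `atoms` is sorted, `a` is ascending, and
    index[-1] == len(a).
    """
    result = []
    bead = 0
    pos = 0  # cursor into `a`; it only ever moves forward
    for atom in atoms:
        # Skip mapping entries that are smaller than the current atom.
        while pos < len(a) and a[pos] < atom:
            pos += 1
        if pos == len(a) or a[pos] != atom:
            continue  # this atom is not part of any bead
        # Advance to the bead whose block contains `pos`.
        while index[bead + 1] <= pos:
            bead += 1
        if result and result[-1][0] == bead:
            result[-1][1].append(atom)
        else:
            result.append((bead, [atom]))
    return result

# Two beads of three atoms each; atom 7 is outside the mapping.
parts = partition_into_beads([1, 2, 4, 7], [0, 3, 6], [0, 1, 2, 3, 4, 5])
# A mapping with gaps: atom 3 belongs to no bead.
sparse = partition_into_beads([1, 3, 5], [0, 2, 4], [0, 1, 5, 6])
```

Because both cursors only move forward, the work is proportional to the number of input atoms plus the size of the static mapping. The sketch relies on the ascending order, which is exactly the property that an out-of-order or shared-atom mapping would break.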

#4 Updated by Christoph Junghans over 6 years ago

Teemu Murtola wrote:

Is it necessary to support the same atom participating in multiple beads (e.g., with a 0.5 weight for both)? For full generality, I would say yes, but that may complicate things a lot. This may, e.g., lead to unintuitive behaviour, where selecting one atom actually produces two positions.

That would be good, some mappings for polymers actually have shared atoms. (VOTCA supports that as well.)

Another question is whether it is sufficient to specify the mapping as either molecule-specific or residue-specific, or whether more is needed. That is, is it sufficient to define "this set of atoms, when split over molecules/residues, produces one position per molecule/residue"?

Multiple molecules could be useful as well. Martini maps 3 water molecules into one CG bead. (VOTCA doesn't allow multi-molecule mappings and needs one mapping file per molecule type.)

#5 Updated by Teemu Murtola over 6 years ago

  • Description updated (diff)

Christoph Junghans wrote:

Teemu Murtola wrote:

Is it necessary to support the same atom participating in multiple beads (e.g., with a 0.5 weight for both)? For full generality, I would say yes, but that may complicate things a lot. This may, e.g., lead to unintuitive behaviour, where selecting one atom actually produces two positions.

That would be good, some mappings for polymers actually have shared atoms. (VOTCA supports that as well.)

Ok, but that does complicate things. I'll think whether it is reasonable to implement a single algorithm that supports all of this, or whether this kind of mapping would be better treated as a special case (so that the other cases don't suffer from the extra complexity). Suggestions are also welcome, in particular for a memory-efficient data structure to hold this kind of a mapping that would support fast lookup for all atoms in a sorted list. If you are interested, the code is in selection/poscalc.* (the actual calculation is towards the end of the file) and in selection/indexutil.* (all routines that handle gmx_ana_indexmap_t).
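
One possible layout for such a lookup, offered only as a sketch and not as something agreed in this thread, is a compressed inverse map from each atom to the beads (and weights) it contributes to. The Python below uses hypothetical names and a toy mapping in which atom 1 is shared between two beads with weight 0.5 each.

```python
def build_atom_to_bead(beads, natoms):
    """Build a compressed inverse map from (atom, weight) bead definitions.

    beads: list of beads, each a list of (atom, weight) pairs.
    Returns (start, bead_of, weight_of); the entries for atom i occupy
    the slice start[i]:start[i + 1] of bead_of and weight_of.
    """
    counts = [0] * (natoms + 1)
    for members in beads:
        for atom, _ in members:
            counts[atom + 1] += 1
    start = counts[:]
    for i in range(natoms):
        start[i + 1] += start[i]  # prefix sum gives slice boundaries
    fill = start[:-1]  # next free slot for each atom (a copy)
    bead_of = [0] * start[-1]
    weight_of = [0.0] * start[-1]
    for b, members in enumerate(beads):
        for atom, w in members:
            bead_of[fill[atom]] = b
            weight_of[fill[atom]] = w
            fill[atom] += 1
    return start, bead_of, weight_of

def beads_touched(atoms, start, bead_of):
    """Sorted bead numbers that receive a contribution from the given atoms."""
    return sorted({bead_of[k] for i in atoms for k in range(start[i], start[i + 1])})

# Toy mapping: atom 1 is shared by bead 0 and bead 1.
beads = [[(0, 1.0), (1, 0.5)], [(1, 0.5), (2, 1.0)]]
start, bead_of, weight_of = build_atom_to_bead(beads, 3)
from_atom_1 = beads_touched([1], start, bead_of)
```

With this layout, memory is proportional to the number of (atom, bead) pairs, and looking up all beads for a sorted atom list is a single forward scan. It also shows the unintuitive behaviour discussed above: selecting only atom 1 touches two beads.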

Another question is whether it is sufficient to specify the mapping as either molecule-specific or residue-specific, or whether more is needed. That is, is it sufficient to define "this set of atoms, when split over molecules/residues, produces one position per molecule/residue"?

Multiple molecules could be useful as well. Martini maps 3 water molecules into one CG bead. (VOTCA doesn't allow multi-molecule mappings and needs one mapping file per molecule type.)

The kind of mapping that Martini uses is a very different problem, since the constituents of the beads are not static. It is possible to have a custom implementation (i.e., completely separate code) that does the mapping, but I don't know how easy that is to make generic. That should already be possible to implement in the current selection engine, as long as you are fine with first selecting the atoms and then doing the mapping, rather than selecting from the mapped positions. If the more generic static mappings discussed here get implemented, it should also be easy to allow this kind of custom dynamic mapping to be used for selecting atoms. There can be some cases that don't currently work with this kind of dynamic mapping, but those are probably easily solved. The main problem is that it can be very difficult to implement such a dynamic mapping so that it provides any meaningful dynamics (i.e., so that positions in different frames have some relation to each other). While not a problem for all uses, it does limit the usefulness of this kind of selection.

In general, would you (or someone else) be interested in contributing to the implementation?

On an unrelated note, updated the description and implementation steps with a bit more details and added an extra step that I initially forgot.

#6 Updated by Teemu Murtola over 6 years ago

  • Status changed from New to Accepted

#7 Updated by Teemu Murtola about 6 years ago

Changes up to https://gerrit.gromacs.org/#/c/2360/ should implement the basic unit tests (first point from the description) for the position calculation functionality.

#8 Updated by Teemu Murtola about 5 years ago

  • Project changed from Next-generation analysis tools to GROMACS
  • Category set to selections
  • Target version set to future
