Task #938

Regular expressions in selections

Added by Teemu Murtola over 8 years ago. Updated almost 7 years ago.

Currently, string keywords in selections (such as "resname") can be matched against regular expressions using POSIX regex if that is available. If it cannot be found during configuration, this support is silently dropped, and this changes the behavior of selections. The selection code also tries to guess whether the input string is a regex or a simple pattern using ? and * wildcards only, and does the matching based on this guess.
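The guessing step described above can be illustrated with a minimal sketch. The exact rule used by the selection code is not stated here, so the heuristic below (anything other than alphanumerics, ? and * implies a regexp) is an assumption, and `looks_like_regex` and `match_auto` are hypothetical names. Matching against the whole string is also assumed.

```python
import fnmatch
import re


def looks_like_regex(pattern: str) -> bool:
    """Assumed heuristic: any character other than alphanumerics, ? and *
    means the pattern is treated as a regular expression."""
    return any(not ch.isalnum() and ch not in "?*" for ch in pattern)


def match_auto(pattern: str, value: str) -> bool:
    """Match value against pattern, picking the mode from the guess above."""
    if looks_like_regex(pattern):
        # Whole-string regexp match (assumption: patterns are anchored).
        return re.fullmatch(pattern, value) is not None
    # Only ? and * can be special here, so fnmatch behaves as a plain
    # wildcard matcher; fnmatchcase keeps the comparison case-sensitive.
    return fnmatch.fnmatchcase(value, pattern)
```

With this guess, "R*" is matched as a wildcard pattern while "R[AB]" is matched as a regexp, which is why losing regexp support changes what the same selection text means.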

It would be better if the same selection syntax always worked the same way and gave an error when regexp support is required but not available. This would require a mechanism to specify whether a string should be matched as a regexp or not. There are (at least) two alternatives:
  1. Use different types of quotes like VMD does (single-quoted strings are matched literally, double-quoted ones as regexps, or something similar).
  2. Use special keyword(s) that switch/force the matching mode, e.g., resname regexp "R[AB]", resname nowildcards "R*".

Neither is particularly difficult to implement (the second can be done without touching the actual selection parser code, the first requires some simple code to pass the type of quotes as part of strings), so opinions would be welcome.
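As a rough illustration of alternative 2, the mode keyword could be peeled off before the patterns are matched. This is only a sketch: the keywords `regexp` and `nowildcards` come from the examples above, while `split_match_mode`, `MODE_KEYWORDS`, and the mode labels are hypothetical and not part of the selection code.

```python
import shlex

# Hypothetical mode keywords, taken from the examples in alternative 2.
MODE_KEYWORDS = {"regexp": "regexp", "nowildcards": "literal"}


def split_match_mode(expression):
    """Split 'keyword [mode] "pattern" ...' into (keyword, mode, patterns)."""
    # shlex removes the quotes, so a quoted pattern spelled like a mode
    # keyword would be misread; a real parser would keep that distinction.
    tokens = shlex.split(expression)
    if len(tokens) < 2:
        raise ValueError("expected a keyword followed by at least one pattern")
    keyword, rest = tokens[0], tokens[1:]
    mode = "auto"  # no mode keyword: keep the guessing behaviour
    if rest[0] in MODE_KEYWORDS:
        mode = MODE_KEYWORDS[rest[0]]
        rest = rest[1:]
    if not rest:
        raise ValueError("mode keyword must be followed by a pattern")
    return keyword, mode, rest
```

For example, `split_match_mode('resname regexp "R[AB]"')` yields the keyword `resname`, the mode `regexp`, and the single pattern `R[AB]`, while an expression without a mode keyword falls back to `auto`.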

Associated revisions

Revision a9beaee9 (diff)
Added by Teemu Murtola over 8 years ago

Add syntax to force selection string matching mode.

The selection syntax for string keyword matching no longer depends on
whether regular expression support is available. Instead, an error is
now given if the string looks like a regexp (the logic for the deduction
is not changed), but regexp support is not available. Added syntax to
force the string matching to use either literal, wildcard, or regexp matching.

This change allows removing a few more direct prints to stderr (related
to #655).

Closes #938.

Change-Id: I7b998050c8b00b5f1229ed23a0a15685c514010f


#1 Updated by Roland Schulz over 8 years ago

I like option 2 better. I think using two different quotes is not very intuitive. I would suggest resname ~ "R[AB]" if it isn't too difficult. But using "regexp" or "like" instead of "~" is OK too. The only advantage of "~" is that it is shorter.

#2 Updated by Teemu Murtola over 8 years ago

Thanks for the idea. I think that resname ~ "R[AB]" should be easy enough to implement, and a non-alphanumeric character makes it syntactically a bit less ambiguous. Syntax for forcing plain string matching could then perhaps be resname = "R*" "RA", although there's some possibility for confusion here (it requires less code to use a single-char symbol than ==, but that one could be used as well)...

There are now four possible cases that should be somehow distinguished:
  • Plain string matching.
  • Regexp matching.
  • Wildcard matching.
  • Automatic detection between regexp and wildcard matching.

The above syntax with either ~ or = to force the first two would leave the syntax resname "str" for either wildcard matching, or for automatic detection. Using it for automatic detection would leave current behavior unchanged, but then there would be no way of forcing wildcard matching. Not sure though if that would be a big problem. Adding yet another symbol (like * or ?) for the wildcard match could also be a bit confusing...
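The four cases can be sketched as an explicit mode passed to the matcher, so that the result never depends silently on whether regexp support exists. This is an illustrative sketch only: `MatchMode`, `string_matches`, and the `regex_available` flag are hypothetical names, the auto-detection rule is an assumption, and Python's `re` module is used for the wildcard translation purely for convenience.

```python
import re
from enum import Enum


class MatchMode(Enum):
    LITERAL = "literal"    # plain string comparison
    REGEXP = "regexp"      # forced regular expression
    WILDCARD = "wildcard"  # only ? and * are special
    AUTO = "auto"          # guess between regexp and wildcard


def string_matches(value, pattern, mode=MatchMode.AUTO, regex_available=True):
    """Return True if value matches pattern under the given mode."""
    if mode is MatchMode.AUTO:
        # Assumed heuristic: anything other than alphanumerics, ? and *
        # makes the pattern look like a regexp.
        is_regex = any(not c.isalnum() and c not in "?*" for c in pattern)
        mode = MatchMode.REGEXP if is_regex else MatchMode.WILDCARD
    if mode is MatchMode.LITERAL:
        return value == pattern
    if mode is MatchMode.WILDCARD:
        # Translate only ? and *; every other character matches itself.
        translated = "".join(
            "." if c == "?" else ".*" if c == "*" else re.escape(c)
            for c in pattern
        )
        return re.fullmatch(translated, value) is not None
    # Regexp mode: fail loudly instead of changing behaviour silently.
    if not regex_available:
        raise ValueError(f"regexp support is required to match '{pattern}'")
    return re.fullmatch(pattern, value) is not None
```

If the bare form resname "str" maps to `MatchMode.AUTO`, the wildcard mode is reachable only through the guess, which is the trade-off noted above.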

#3 Updated by Teemu Murtola over 8 years ago

  • Assignee set to Teemu Murtola

Will implement this after I have finished doing modifications to the parser for #655 (which in turn requires #985).

#4 Updated by Teemu Murtola over 8 years ago

  • Status changed from New to Closed

#5 Updated by Teemu Murtola almost 7 years ago

  • Project changed from Next-generation analysis tools to GROMACS
  • Category set to selections
