Regular expressions in selections
Currently, string keywords in selections (such as "resname") can be matched against regular expressions using POSIX regex if that is available. If it cannot be found during configuration, this support is silently dropped, and this changes the behavior of selections. The selection code also tries to guess whether the input string is a regex or a simple pattern using ? and * wildcards only, and does the matching based on this guess.
It would be better if the same selection syntax always worked the same way and gave an error if regexp support is required but not available. This would require a mechanism to specify whether something needs to be matched using regexps or not. There are (at least) two alternatives:
- Use different types of quotes like VMD does (single quoted strings are matched literally, doubly quoted as regexps, or something similar).
- Use special keyword(s) that switch/force the matching mode, e.g.,
resname regexp "R[AB]",
resname nowildcards "R*".
Neither is particularly difficult to implement (the second can be done without touching the actual selection parser code, the first requires some simple code to pass the type of quotes as part of strings), so opinions would be welcome.
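A minimal sketch of the second alternative, assuming the values following a string keyword arrive as a list of already-tokenized strings. The function name split_mode and the token-list representation are hypothetical, not part of the actual selection code; only the keywords "regexp" and "nowildcards" come from the examples above.

```python
# Hypothetical mode keywords, taken from the examples in the description.
MODE_KEYWORDS = {"regexp", "nowildcards"}


def split_mode(tokens):
    """Split the tokens after a string keyword into (mode, patterns).

    mode is None when no forcing keyword is present, in which case the
    caller would fall back to its default matching behavior.
    """
    if tokens and tokens[0] in MODE_KEYWORDS:
        return tokens[0], tokens[1:]
    return None, tokens


# resname regexp "R[AB]"
print(split_mode(["regexp", "R[AB]"]))
# resname "R*"
print(split_mode(["R*"]))
```

Because the mode keyword is handled as an ordinary leading value, nothing in the parser grammar has to change in this sketch. The cost is that a literal value equal to one of the mode keywords becomes ambiguous.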
Add syntax to force selection string matching mode.
The selection syntax for string keyword matching no longer depends on
whether regular expression support is available. Instead, an error is
now given if the string looks like a regexp (the logic for the deduction
is not changed), but regexp support is not available. Added syntax to
force the string matching to use either literal, wildcard, or regexp
This change allows removing a few more direct prints to stderr (related
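The behavior described in this commit message can be sketched roughly as follows. The heuristic in looks_like_regexp (anything other than alphanumerics, ? and * suggests a regexp) is an assumption based on the description above, not the actual deduction logic; SelectionError and choose_matcher are hypothetical names.

```python
class SelectionError(Exception):
    """Hypothetical error type for invalid selections."""


def looks_like_regexp(pattern):
    # Assumed heuristic: a pattern made only of alphanumerics, '?' and '*'
    # is a simple wildcard pattern; anything else is treated as a regexp.
    return any(not (ch.isalnum() or ch in "?*") for ch in pattern)


def choose_matcher(pattern, regexp_available):
    """Return the matching mode for pattern, or raise if it is unsupported."""
    if looks_like_regexp(pattern):
        if not regexp_available:
            # Fail loudly instead of silently changing how the string matches.
            raise SelectionError(
                "'%s' looks like a regular expression, "
                "but regexp support is not available" % pattern
            )
        return "regexp"
    return "wildcard"
```

The point of the sketch is that the result of the deduction no longer depends on the build configuration; only whether the selection is accepted does.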
#2 Updated by Teemu Murtola over 7 years ago
Thanks for the idea. I think that resname ~ "R[AB]" should be easy enough to implement, and a non-alphanumeric character makes it syntactically a bit less ambiguous. Syntax for forcing plain string matching could then perhaps be resname = "R*" "RA", although there's some possibility for confusion here (it requires less code to use a single-char symbol than ==, but that one could be used as well)...
There are four possible matching modes:
- Plain string matching.
- Regexp matching.
- Wildcard matching.
- Automatic detection between regexp and wildcard matching.
The above syntax with either ~ or = to force the first two would leave the syntax resname "str" for either wildcard matching, or for automatic detection. Using it for automatic detection would leave current behavior unchanged, but then there would be no way of forcing wildcard matching. Not sure though if that would be a big problem. Adding yet another symbol (like ?) for the wildcard match could also be a bit confusing...
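To make the four modes concrete, here is a rough sketch of how they could differ for the same pattern. All names (MatchMode, SYMBOL_TO_MODE, wildcard_to_regex, matches) are hypothetical. The sketch assumes that regexps must match the whole string and that automatic detection treats any character other than alphanumerics, ? and * as a sign of a regexp; neither is confirmed by the discussion above.

```python
import re
from enum import Enum


class MatchMode(Enum):
    EXACT = "exact"        # plain string matching
    REGEXP = "regexp"
    WILDCARD = "wildcard"  # only '?' and '*' are special
    AUTO = "auto"          # guess between regexp and wildcard


# Hypothetical mapping from the symbol after the keyword to a mode;
# None stands for the bare form: resname "str".
SYMBOL_TO_MODE = {"=": MatchMode.EXACT, "~": MatchMode.REGEXP, None: MatchMode.AUTO}


def wildcard_to_regex(pattern):
    """Translate a '?'/'*' wildcard pattern into an equivalent regexp."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def matches(value, pattern, mode):
    """Return True if value matches pattern under the given mode."""
    if mode is MatchMode.AUTO:
        # Assumed heuristic for the automatic detection.
        simple = all(ch.isalnum() or ch in "?*" for ch in pattern)
        mode = MatchMode.WILDCARD if simple else MatchMode.REGEXP
    if mode is MatchMode.EXACT:
        return value == pattern
    if mode is MatchMode.WILDCARD:
        pattern = wildcard_to_regex(pattern)
    return re.fullmatch(pattern, value) is not None
```

With this sketch, matches("RA", "R*", MatchMode.EXACT) is false because = compares the strings literally, while the bare form resolves "R*" to wildcard matching and "R[AB]" to regexp matching. Forced wildcard matching is reachable only through the function argument here, which mirrors the open question of whether it needs its own symbol.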