2 Mar 2010 23:32
Re: Matching multiple regex patterns simultaneously
On 03/02/2010 09:39 PM, Andrey Fedorov wrote:
> So a couple of libraries (Django being the most popular that comes to
> mind) try to match a string against several regex expressions. I'm
> wondering if there exists a library to "merge" multiple compiled regex
> expressions into a single lookup. This could be exposed in a interface like:
>
> http://gist.github.com/319905
>
>
> So for an example:
>
> rd = ReDict()
>
> rd['^foo$'] = 1
> rd['^bar*$'] = 2
> rd['^bar$'] = 3
>
> assert rd['foo'] == [1]
> assert rd['barrrr'] == [2]
> assert rd['bar'] == [2,3]
>
> The naive implementation I link is obviously inefficient. What would be
> the easiest way to go about compiling a set of regex-es together, so
> that they can be matched against a string at the same time? Are there
> any standard libraries that do this I'm not aware of?
>
> Cheers,
> Andrey
>

You can do something like this:

    >>> r=re.compile('(?P<a>^foo$)|(?P<b>(?P<c>^bar)r*$)')
    >>> r.match('barrrr').groupdict()
    {'a': None, 'c': 'bar', 'b': 'barrrr'}
    >>> r.match('bar').groupdict()
    {'a': None, 'c': 'bar', 'b': 'bar'}
    >>> r.match('foo').groups()
    ('foo', None, None)

Ok, it's not 100% the same (it does not match 'ba'), but I think this
should cover most cases where you want something like this.

Hmm, well. You should resolve it to a form where the subexpressions do
not overlap:

    (?P<a>^foo$)|(?P<b>^ba$)|(?P<c>^bar$)|(?P<d>^bar+$)

-panzi
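
A minimal sketch of how the non-overlapping form could back the lookup from the original question. The names `combined`, `values` and `lookup` are made up for illustration, and the group-to-value table is filled in by hand; nothing here builds it automatically. One adjustment: the last alternative is written as `^barr+$`, because `^bar+$` would also match 'bar'. (With ordered alternation the `c` branch wins for 'bar' either way, so the result is the same.)

```python
import re

# One alternative per disjoint case; each top-level named group stands for
# the set of original patterns that match strings of that shape.
combined = re.compile(
    r'(?P<a>^foo$)|(?P<b>^ba$)|(?P<c>^bar$)|(?P<d>^barr+$)'
)

# Hand-written table (an assumption of this sketch): group name -> values of
# every original pattern ('^foo$' -> 1, '^bar*$' -> 2, '^bar$' -> 3) that
# matches strings captured by that group.
values = {
    'a': [1],      # 'foo' matches only '^foo$'
    'b': [2],      # 'ba' matches only '^bar*$'
    'c': [2, 3],   # 'bar' matches '^bar*$' and '^bar$'
    'd': [2],      # 'barr', 'barrr', ... match only '^bar*$'
}

def lookup(s):
    """Return the values of all original patterns that match s."""
    m = combined.match(s)
    if m is None:
        return []
    # Only top-level named groups exist, so lastgroup names the branch taken.
    return values[m.lastgroup]
```

With this, `lookup('bar')` gives `[2, 3]` and `lookup('barrrr')` gives `[2]`, as in the asserts from the question. Working out the disjoint cases by hand does not scale to many patterns, so treat this as an illustration of the idea rather than a general solution.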