2 Jul 2010 12:18
Re: regexp
Eugene Van den Bulke <eugene.vandenbulke <at> gmail.com>
2010-07-02 10:18:38 GMT
2010-07-02 10:18:38 GMT
On Fri, Jul 2, 2010 at 8:13 PM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>>>> types = doc.xpath('.//a[ <at> href]/ <at> href') ...
>
> Note that this is redundant, './/a/ <at> href' is enough.
I am discovering XPath as well as you can tell :P
> Personally, I wouldn't even use XPath regular expressions here. I'd rather
> do something like this:
>
> from lxml import html
> import re
>
> parse_type_value = re.compile(r'type=([A-Z]*)').findall
>
> root = html.parse(the_file).getroot()
>
> for el, attr, link, pos in root.iterlinks():
> if 'type=' in link:
> print el.tag, parse_type_value(link)
>
> Note that this will give you all links, not only those in <a> href's. If you
> really only want those, the XPath expression above will do just fine.
>
> Stefan
Thanks !
--
--
EuGeNe -- I lend my books on COlivri http://www.colivri.org/user/eugene, do you?
RSS Feed