Eugene Van den Bulke | 2 Jul 2010 12:18
Picon
Gravatar

Re: regexp

On Fri, Jul 2, 2010 at 8:13 PM, Stefan Behnel <stefan_ml <at> behnel.de> wrote:
>>>> types = doc.xpath('.//a[ <at> href]/ <at> href') ...
>
> Note that this is redundant, './/a/ <at> href' is enough.

I am discovering XPath as well as you can tell :P

> Personally, I wouldn't even use XPath regular expressions here. I'd rather
> do something like this:
>
>    from lxml import html
>    import re
>
>    parse_type_value = re.compile(r'type=([A-Z]*)').findall
>
>    root = html.parse(the_file).getroot()
>
>    for el, attr, link, pos in root.iterlinks():
>        if 'type=' in link:
>             print el.tag, parse_type_value(link)
>
> Note that this will give you all links, not only those in <a> href's. If you
> really only want those, the XPath expression above will do just fine.
>
> Stefan

Thanks !

--

-- 
EuGeNe -- I lend my books on COlivri http://www.colivri.org/user/eugene, do you?

Gmane