Stefan Behnel | 8 Feb 14:41

Re: Copying children including text nodes


Martin Aspeli, 06.02.2010 12:46:
> I have two trees that were parsed with the HTML parser. The source tree is:
> 
> <html>
> <head>
> <body>
>      Foo
>      <p>Bar</p>
>      Baz
> </body>
> </html>
> 
> The target is:
> 
> <html>
> <head>
> <body>
>      <div id="target">Placeholder</div>
> </body>
> </html>
> 
> Now, I want to replace the whole of <div id="target"> tag (so, the tag 
> and its children) with the *contents* of the <body> tag in the source 
> tree. I obviously don't want the body tag itself.

parent.replace() doesn't currently support sequence insertion, but I would
expect this to work:

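    prev = div_element.getprevious()
    pos = target_body.index(div_element)
    text = source_body.text or ''    # .text is None when there is no text
    # note: div_element.tail (text following the div) is removed with it
    target_body[pos:pos+1] = source_body[:]
    if prev is None:
        # the div was the first child, so leading text lives in body.text
        target_body.text = (target_body.text or '') + text
    else:
        # otherwise leading text lives in the tail of the preceding sibling
        prev.tail = (prev.tail or '') + text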
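
For reference, the same idea as a self-contained sketch run against the two documents from the question. The helper name `replace_with_contents` is made up for illustration and is not part of lxml. It also carries over any text that follows the replaced element, which the snippet above does not attempt.

```python
from lxml import html

SOURCE = "<html><head></head><body>Foo<p>Bar</p>Baz</body></html>"
TARGET = ('<html><head></head><body>'
          '<div id="target">Placeholder</div>'
          '</body></html>')


def replace_with_contents(element, source):
    """Replace `element` with the text and children of `source`.

    Hypothetical helper: the children are moved out of `source`, not copied.
    """
    parent = element.getparent()
    prev = element.getprevious()
    pos = parent.index(element)
    children = source[:]            # plain list of the source children
    lead = source.text or ""        # text before the first source child
    trail = element.tail or ""      # text after the element being replaced

    # Keep the text that followed the replaced element.
    if children:
        children[-1].tail = (children[-1].tail or "") + trail
    else:
        lead += trail

    # Slice assignment moves the children into the target tree.
    parent[pos:pos + 1] = children

    # Leading text belongs to parent.text or to the previous sibling's tail.
    if lead:
        if prev is None:
            parent.text = (parent.text or "") + lead
        else:
            prev.tail = (prev.tail or "") + lead


source_body = html.document_fromstring(SOURCE).body
target_body = html.document_fromstring(TARGET).body
div_element = target_body.xpath('div[@id="target"]')[0]

replace_with_contents(div_element, source_body)
print(html.tostring(target_body, encoding="unicode"))
```

After the call, the source body has no children left, since they now live in the target tree.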

> Performance is important. Also, I don't care about the source tree after 
> I'm done, so if "moving" rather than copying makes things faster/easier, 
> that's OK.

Moving is certainly faster than copying, as copying does at least the same
amount of work, plus the memory allocations. If copying was required, you
could always do a deepcopy of the source content before inserting it.
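
A minimal sketch of the copying variant, assuming the source tree has to stay intact. The tiny XML fragments stand in for the real parsed documents and are only there to keep the example short.

```python
import copy

from lxml import etree

# Stand-ins for the two <body> elements from the question.
source_body = etree.fromstring("<body>Foo<p>Bar</p>Baz</body>")
target_body = etree.fromstring(
    '<body><div id="target">Placeholder</div></body>')

# Deep-copy each child first; the copies keep their tail text.
copies = [copy.deepcopy(child) for child in source_body]

# Replace the first child (the div) with the copies.
target_body[:1] = copies
target_body.text = source_body.text

print(etree.tostring(source_body, encoding="unicode"))
print(etree.tostring(target_body, encoding="unicode"))
```

The source still owns its original `<p>` element afterwards, at the cost of one extra allocation per copied node.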

I can't give any further comments on performance, though. You'll need to do
your own benchmarks (although I'm always interested in the results :)
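
One possible way to set up such a benchmark, as a rough sketch only. The document size, the function names and the repeat count are arbitrary assumptions. Because moving empties the source tree, fresh trees are parsed for every measurement and kept outside the timed call.

```python
import copy
import timeit

from lxml import etree

# Arbitrary synthetic input: 1000 <p> children with tail text.
SOURCE = "<body>Foo" + "<p>Bar</p>Baz" * 1000 + "</body>"
TARGET = '<body><div id="target">Placeholder</div></body>'


def fresh_trees():
    """Parse new source and target trees for a single measurement."""
    return etree.fromstring(SOURCE), etree.fromstring(TARGET)


def move(source_body, target_body):
    # Children are moved out of the source tree.
    target_body[:1] = source_body[:]
    target_body.text = source_body.text


def copy_in(source_body, target_body):
    # Children are deep-copied, so the source tree stays intact.
    target_body[:1] = [copy.deepcopy(child) for child in source_body]
    target_body.text = source_body.text


def best_time(func, repeat=5):
    """Return the fastest of `repeat` single runs, parsing excluded."""
    timings = []
    for _ in range(repeat):
        src, tgt = fresh_trees()
        timings.append(timeit.timeit(lambda: func(src, tgt), number=1))
    return min(timings)


print("move:", best_time(move))
print("copy:", best_time(copy_in))
```

The numbers will depend on the lxml and libxml2 versions and on the shape of the real documents, so it is worth timing with representative input.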

Stefan
