4 May 12:18
Re: saving memory with iterparse()
From: Stefan Behnel <stefan_ml <at> behnel.de>
Subject: Re: saving memory with iterparse()
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-05-04 10:18:42 GMT
Hi,

Stefan Behnel wrote:
> From: <mharper3 <at> uiuc.edu>
> Thanks so much for the quick response. I did consider that the tree was being
> built in memory, but the documentation seems to suggest that is not the case.
> Specifically the language in the tutorial
> (http://codespeak.net/lxml/tutorial.html) in both the sections 'incremental
> parsing' and 'event-driven parsing' seem to suggest using iterparse to access
> without retaining the tree in memory.

It actually says:

"""
two event-driven parser interfaces, one that generates parser events while
building the tree (``iterparse``), and one that does not build the tree at all,
and instead calls feedback methods on a target object in a SAX-like fashion.
"""

but I added a new example now that shows how to save memory.

http://codespeak.net/lxml/tutorial.html#event-driven-parsing

> If you don't mind, why does the
> iterator retain the tree in memory? I would suspect otherwise from the
> 'natural' behavior of iterators/generators in general, though that may be an
> invalid assumption.
[...]
> My mistake was to assume that the
> 'used' elements would be freed without an explicit call to do so as the
> iterator progressed.

The question is: how should iterparse() know when you no longer need a subtree?
The end event for a parent always comes after the end events of all its
children and you might still access the whole subtree when you handle the
parent.

> (i.e. I would parse the entire tree into memory if I
> thought that I had enough memory to do so; otherwise I would _incrementally_
> parse it.)

The docs actually use two terms: "incremental parsing" and "event-driven
parsing". Incremental parsing is used for feeding data into the parser one chunk
at a time, while event-driven parsing means you also get back one parser event
at a time.

If you have an idea how to present this better, I take patches:

http://codespeak.net/svn/lxml/trunk/doc/tutorial.txt

Stefan
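
For readers of this thread, here is a minimal sketch of the memory-saving pattern the reply describes: handle each subtree on its "end" event, then release it explicitly with clear(). It is not the example from the tutorial. The sample document, the "item" and "name" tag names, and the collect_names() helper are assumptions made for illustration. The fallback import lets the sketch run where lxml is not installed, since the standard library's ElementTree offers a compatible iterparse().

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the standard library, whose iterparse()
    # and Element.clear() behave the same way for this sketch.
    import xml.etree.ElementTree as etree

# Hypothetical sample document: five <item> elements, each with a <name> child.
XML = (
    b"<root>"
    + b"".join(b"<item><name>n%d</name></item>" % i for i in range(5))
    + b"</root>"
)


def collect_names(source):
    """Return the text of every <item>/<name>, clearing each item after use."""
    names = []
    # "end" events fire once an element and all of its children are parsed,
    # so the whole <item> subtree is available when we handle it.
    for event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag == "item":
            names.append(elem.findtext("name"))
            # iterparse() cannot know we are done with this subtree,
            # so we say so explicitly: drop its children, text and attributes.
            elem.clear()
    return names


names = collect_names(io.BytesIO(XML))
print(names)
```

Note that clear() empties each item but leaves the now-empty element attached to the root, so the tree still grows slowly on very large inputs. The inner name elements are deliberately not cleared on their own "end" events, because the handler for item still needs to read them.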