[Fwd: Re: (no subject)]
From: Stefan Behnel <stefan_ml <at> behnel.de>
Subject: [Fwd: Re: (no subject)]
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-05-04 09:02:07 GMT
[Forwarding to the list ...]

From: <mharper3 <at> uiuc.edu>

Stefan --

Thanks so much for the quick response. I did consider that the tree was being built in memory, but the documentation seems to suggest that this is not the case. Specifically, the wording in the tutorial (http://codespeak.net/lxml/tutorial.html), in both the 'incremental parsing' and 'event-driven parsing' sections, seems to suggest that iterparse can be used to access the document without retaining the tree in memory. I see now that the documentation for iterparse says otherwise, as you pointed out.

If you don't mind my asking, why does the iterator retain the tree in memory? Given the 'natural' behavior of iterators/generators in general, I would expect otherwise, though that may be an invalid assumption. (That is, I would parse the entire tree into memory if I thought I had enough memory to do so; otherwise I would _incrementally_ parse it.)

More specifically, I don't want to ignore any part of the XML file in this particular case, so a ParserTarget is not the right solution.

Your suggestion to use clear() works for me; maybe the tutorial should state explicitly that memory is not freed unless clear() is called. The only mention in the tutorial is that iterparse "also allows to clear() or modify the content of an Element to save memory". My mistake was to assume that the 'used' elements would be freed automatically as the iterator progressed, without an explicit call to do so.

Again, thank you for your quick reply!

-- Marc
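For readers who land on this thread, the clear() pattern discussed above might look roughly like the sketch below. It is not code from the thread: the element names `record` and `value` are hypothetical, and if lxml is not installed it falls back (as an assumption) to the standard library's ElementTree, whose iterparse() and Element.clear() behave the same way for this simple case. The point it shows is that iterparse() keeps building the tree, so each finished element has to be cleared explicitly once its "end" event has been handled.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: fall back to the stdlib ElementTree, whose iterparse()
    # and Element.clear() work the same way for this simple case.
    import xml.etree.ElementTree as etree


def sum_records(source, tag="record"):
    """Stream through `tag` elements, process each one, then clear it.

    `tag` and the child name "value" are hypothetical, chosen for the example.
    Returns (number_of_records, sum_of_values).
    """
    count = total = 0
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue  # <value> children are read via their enclosing record
        # At its "end" event the record and all its children are complete.
        count += 1
        total += int(elem.findtext("value", default="0"))
        # iterparse keeps building the tree; without this call every finished
        # record would stay in memory until parsing ends. clear() drops the
        # record's children, text and attributes (an empty shell remains
        # attached to its parent).
        elem.clear()
    return count, total


# Usage: a small in-memory document standing in for a large file.
doc = b"<data>" + b"".join(
    b"<record><value>%d</value></record>" % i for i in range(1, 6)
) + b"</data>"
print(sum_records(io.BytesIO(doc)))  # (5, 15)
```

Note that the emptied record shells stay attached to the root element, so a little memory per record is still retained; for very large files one may also want to remove already-processed siblings from their parent.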