16 May 11:28
Re: Efficient methods to build a tree out of HTML structure?
From: Viksit Gaur <vik.list.nutch <at> gmail.com>
Subject: Re: Efficient methods to build a tree out of HTML structure?
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-05-16 09:28:39 GMT
Hi,

Stefan Behnel wrote:
> Hi,
>
> Viksit Gaur wrote:
>> 2) Things like iterwalk do return "start" and "end" actions - but
>> instead of first doing an iterwalk and then parsing the results, is
>> there a better way to construct the tree when iterwalk itself is running?
>
> I don't understand what you mean here. Are you modifying the tree during the
> iteration? Or do you think of some kind of pipelining?

Hmm. The problem I faced was finding a way to assign a unique ID to each
element on the page. Let's say I construct an iterwalk object. During this
phase, I would like not only to build the tree, but also to add some of my own
information to each node (such as a unique ID for each element). I'm not sure
how to do this without extending the etree.so file in which iterwalk is
implemented.

Cheers,
Viksit

>
> Stefan
>
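For what it's worth, here is a minimal sketch of one way to tag elements while the events are still streaming, so no second pass or change to etree.so is needed. It rests on several assumptions: the input is well-formed markup, storing the ID as an attribute is acceptable, and iterparse stands in for iterwalk because it builds the tree while yielding the same kind of (event, element) pairs. The function name parse_with_ids and the attribute name data-uid are made up for illustration. On a tree that has already been parsed, the same loop body should work over etree.iterwalk(root, events=("start",)).

```python
import io
from itertools import count

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Standard-library fallback; it offers the same iterparse() call.
    import xml.etree.ElementTree as etree


def parse_with_ids(source, attr="data-uid"):
    """Parse well-formed markup and tag every element with a unique ID.

    The attribute name is a hypothetical choice, not an lxml convention.
    """
    ids = count(1)
    root = None
    # Each "start" event fires once per element, as soon as its opening
    # tag has been read, so the ID is attached while the tree is built.
    for event, elem in etree.iterparse(source, events=("start",)):
        if root is None:
            root = elem  # the first "start" event belongs to the root
        elem.set(attr, str(next(ids)))  # IDs follow document order
    return root


page = b"<html><body><div><p>one</p><p>two</p></div></body></html>"
root = parse_with_ids(io.BytesIO(page))
uids = [(el.tag, el.get("data-uid")) for el in root.iter()]
```

Writing the ID into an attribute keeps it attached to the element when the tree is serialized. If the IDs should not show up in the output, a plain dict keyed by element could be filled in the same loop instead.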