16 May 04:57
Efficient methods to build a tree out of HTML structure?
From: Viksit Gaur <vik.list.nutch <at> gmail.com>
Subject: Efficient methods to build a tree out of HTML structure?
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-05-16 02:58:41 GMT
Hi all,

I was wondering: what would be the most efficient way to visit all the elements of a DOM tree, in some order, using lxml.etree? The approaches I currently see in the docs are an iterator class like ElementDepthFirstIterator, or iterwalk. Each has an issue for me:

1) The first gives a flat sequence of elements, so I lose the child/parent structure.
2) iterwalk does return "start" and "end" actions, but rather than running iterwalk first and then parsing its results, is there a better way to construct the tree while iterwalk itself is running?

Or perhaps there is some method I've missed completely?

Quick note on what I'm trying to do: graphically represent the DOM structure of a page using a library like NetworkX.

Cheers,
Viksit
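
For illustration only, here is a minimal sketch of the idea in point 2: recovering parent/child edges while the "start"/"end" events are being consumed, by keeping a stack of currently open elements. The helper name edges_from_events and the sample markup are hypothetical, not part of lxml. The fallback to the standard library's xml.etree.ElementTree.iterparse is an assumption added so the sketch runs without lxml installed; it yields the same (action, element) pairs for well-formed markup.

```python
import io

try:
    from lxml import etree  # the library discussed in this post
    HAVE_LXML = True
except ImportError:
    # Assumption: fall back to the stdlib so the sketch still runs.
    import xml.etree.ElementTree as etree
    HAVE_LXML = False


def edges_from_events(events):
    """Hypothetical helper: turn (action, element) pairs into
    (parent, child) edges using a stack of open elements."""
    stack = []  # elements whose "start" was seen but not yet their "end"
    edges = []  # collected (parent_element, child_element) pairs
    for action, elem in events:
        if action == "start":
            if stack:
                # The innermost open element is the parent of this one.
                edges.append((stack[-1], elem))
            stack.append(elem)
        elif action == "end":
            stack.pop()
    return edges


# Hypothetical sample page, kept well-formed so both parsers accept it.
MARKUP = (b"<html><head><title>t</title></head>"
          b"<body><p>a</p><p>b</p></body></html>")

if HAVE_LXML:
    root = etree.fromstring(MARKUP, etree.HTMLParser())
    events = etree.iterwalk(root, events=("start", "end"))
else:
    events = etree.iterparse(io.BytesIO(MARKUP), events=("start", "end"))

edges = edges_from_events(events)
tag_edges = [(parent.tag, child.tag) for parent, child in edges]
print(tag_edges)
```

Since tag names repeat (two p elements above), a graph library would need a unique key per element rather than the tag as the node. With lxml specifically, another option that may be simpler is to loop over root.iter() and call getparent() on each element, which gives the parent directly without tracking events.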