Gilles Lenfant | 29 Jan 19:14

Many thanks to...

the great team of lxml developers.

A few days ago I released openxmllib, a Python library that
extracts text and metadata from OpenXML documents (MS Office 2007,
Apple iWork, and some others) for full-text indexing. More features
may come in the future.

http://code.google.com/p/openxmllib/

Reading and understanding OpenXML docs gave me headaches. Fortunately,
lxml is so easy to work with, and so fast...

The words of a 60-page Word .docx document are now extracted in 0.2
seconds instead of 8 seconds on my MacBook, and I cut the code volume
by 60% since I switched from the standard XML libraries that ship with
Python 2.4.
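For readers curious what this kind of extraction involves, here is a minimal sketch using only zipfile and lxml. It is not openxmllib's API: the function name docx_words is hypothetical, and the sketch assumes the body text lives in the w:t elements of word/document.xml (the main part of a .docx package), ignoring headers, footers, footnotes and comments.

```python
import zipfile

from lxml import etree

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_words(source):
    """Return the words of a .docx file (a path or a binary file object).

    Hypothetical helper, not part of openxmllib. It reads only the main
    document part and skips headers, footers, footnotes and comments.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = etree.fromstring(xml_bytes)

    paragraphs = []
    for para in root.iter("{%s}p" % W_NS):
        # Word may split one word across several runs, so the w:t texts
        # of a paragraph are concatenated without a separator.
        text = "".join(t.text or "" for t in para.iter("{%s}t" % W_NS))
        paragraphs.append(text)

    # Paragraphs are separated by a space before splitting into words.
    return " ".join(paragraphs).split()
```

Usage would look like docx_words("report.docx"), where the file name is only an example. Concatenating runs within a paragraph, but separating paragraphs, avoids both gluing words from different paragraphs together and splitting words that Word broke into several runs.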

lxml rocks and grooves
--

-- 
Gilles Lenfant
gilles.lenfant <at> gmail.com
