Subject: Many thanks to...
Date: Tuesday 29th January 2008 18:16:50 UTC
...the great team of lxml developers.

A few days ago I released openxmllib, a Python library that extracts text and metadata from OpenXML documents (MS Office 2007, Apple iWork, and some others) for full-text indexing. More features may come in the future.

http://code.google.com/p/openxmllib/

Reading and understanding the OpenXML docs gave me headaches. Fortunately, lxml is so easy to work with and so fast... The words of a 60-page Word .docx document are now extracted in 0.2 seconds instead of 8 seconds on my MacBook, and I removed 60% of the code volume since I switched from the standard XML libs that come with Python 2.4.

lxml rocks and grooves

-- 
Gilles Lenfant
[email protected]