threading fixed :)
From: Stefan Behnel <stefan_ml <at> behnel.de>
Subject: threading fixed :)
Newsgroups: gmane.comp.python.lxml.devel
Date: 2008-05-02 17:16:34 GMT
Hi,
lxml's threading support has had a long-standing problem that shows up in
combination with the per-thread string hash table (dictionary) we use for
libxml2.
Here is a simple example of a sure crasher:
-------------------------------
import threading
import lxml.etree as et

xml = "<root><threadtag/></root>"
main_root = et.XML("<root/>")  # parsed in the main thread

def run_thread():
    thread_root = et.XML(xml)  # parsed with the thread's own dictionary
    main_root.append(thread_root[0])  # move <threadtag/> into the main tree
    del thread_root # deletes the document

thread = threading.Thread(target=run_thread)
thread.start()
thread.join()

print(et.tostring(main_root))
-------------------------------
This crashes because the thread parses the XML fragment using its own
dictionary and stores the tag name "threadtag" there. It then appends the
"threadtag" element to a tree in the main program, which uses a different
dict. Deleting "thread_root" deletes the thread's document as well, and the
(ref-counted) thread dictionary that contains the string "threadtag" is freed
when the thread terminates. The main program then crashes when it accesses
the tag name, which is no longer available, in the now corrupted document.
The solution I came up with today is actually quite simple. We have to
traverse the subtree anyway to update the document references and to fix the
namespace declarations. So it's only one step more to also fix the name
pointers by looking them up in the target dictionary and re-assigning the
names. This is only required when we really have two different dicts, which is
easy to decide. So there isn't even a performance impact if you only use a
single thread or if you do not move subtrees between threads. And the added
overhead when you need this is really small.
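To illustrate the idea, here is a minimal pure-Python sketch of that
traversal. It is only a toy model under stated assumptions, not the actual
lxml code: NameEntry, NameDict, Node and move_subtree are hypothetical names
invented for this example. NameDict stands in for a per-thread libxml2
dictionary and NameEntry for a name pointer stored in it.

```python
class NameEntry(object):
    """One interned name; stands in for a C string pointer (hypothetical)."""
    def __init__(self, text):
        self.text = text


class NameDict(object):
    """Toy stand-in for a per-thread string dictionary (hypothetical)."""
    def __init__(self):
        self._entries = {}

    def intern(self, text):
        # Return this dictionary's own entry for the name, creating it once.
        entry = self._entries.get(text)
        if entry is None:
            entry = self._entries[text] = NameEntry(text)
        return entry

    def owns(self, entry):
        # True only if this exact entry object lives in this dictionary.
        return self._entries.get(entry.text) is entry


class Node(object):
    """Toy tree node whose name points into some NameDict (hypothetical)."""
    def __init__(self, tag, name_dict):
        self.name_dict = name_dict
        self.name = name_dict.intern(tag)
        self.children = []

    def walk(self):
        # Yield this node and then all of its descendants.
        yield self
        for child in self.children:
            for node in child.walk():
                yield node


def move_subtree(node, new_parent):
    """Attach node under new_parent; return True if names were re-assigned."""
    target = new_parent.name_dict
    # The extra work is only needed when the two dictionaries really differ.
    fix_names = node.name_dict is not target
    for n in node.walk():
        if fix_names:
            # Look the name up in the target dict and re-assign the pointer.
            n.name = target.intern(n.name.text)
        n.name_dict = target
    new_parent.children.append(node)
    return fix_names


# Model of the crasher above: a "thread" tree with its own dictionary.
main_dict, thread_dict = NameDict(), NameDict()
main_root = Node("root", main_dict)
thread_root = Node("root", thread_dict)
thread_tag = Node("threadtag", thread_dict)
thread_root.children.append(thread_tag)

# Moving the element into the main tree re-interns its name in main_dict.
thread_root.children.remove(thread_tag)
fixed = move_subtree(thread_tag, main_root)
```

After the move, the name of thread_tag is owned by main_dict, so nothing in
the main tree points into thread_dict any more. In the model, this is the
property that makes it safe to free the thread's dictionary.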
I will release a new beta of 2.1 soon that includes this change. It would be
very helpful if people who currently run threaded code that exchanges tree
fragments between threads (i.e. by deep copying them) could check whether this
works for them, i.e. whether code that crashes under 2.0 once the deep copying
is removed works under 2.1. If it proves to fix the problem, I will also
backport it to 2.0.
Read: the more feedback I get, the faster this will be fixed in 2.0. :)
Stefan