_Validator error_log on multiple uses and thread safety
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Confirmed | Medium | Unassigned |
Bug Description
The _Validator base class used for all validation interfaces (DTD, XMLSchema, etc.) maintains a local error_log. This error_log is not cleared between uses, so assert_() and assertValid() can report stale error messages from earlier validations.
I think it is also not limited in size, unlike the global error log, so keeping an XMLSchema object around for a long time can cause memory use to balloon when many invalid XML documents are validated.
In addition, this log is shared between threads. The FAQ states that XMLSchema objects can be reused between threads (mysteriously, it does not mention DTD, although both use the same base class and libxml doesn't seem to say anything about thread safety for DTDs). That doesn't seem to hold in practice: multiple threads running validation concurrently can corrupt each other's error_log, since it is not thread-local.
Note the attached example script: the second exception mentions "xyz", which is the string in the first XML document, while the second document actually contains the string "abc".
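The attachment is not reproduced here. The following is only a minimal sketch of what such a script might look like, not the original. The schema, the `validation_message` helper, and the two documents are made up for illustration. What it prints depends on the lxml version in use.

```python
from lxml import etree

# Hypothetical schema: the root element <a> must contain an integer.
SCHEMA = etree.XMLSchema(etree.XML(
    b'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    b'<xs:element name="a" type="xs:integer"/>'
    b'</xs:schema>'
))


def validation_message(validator, xml_bytes):
    """Return the text of the exception raised by assertValid(), or None."""
    try:
        validator.assertValid(etree.XML(xml_bytes))
    except etree.DocumentInvalid as exc:
        return str(exc)
    return None


# Reuse the same validator object for two different invalid documents.
first = validation_message(SCHEMA, b"<a>xyz</a>")
second = validation_message(SCHEMA, b"<a>abc</a>")

print(first)
# On an affected version, this message would still refer to "xyz".
print(second)
# Number of entries currently held in the validator's own log.
log_size = len(SCHEMA.error_log)
print(log_size)
```

If the log is not cleared between uses, `log_size` keeps growing with every invalid document, which is the memory concern raised above.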
Version info:
Python : sys.version_
lxml.etree : (3, 2, 3, 0)
libxml used : (2, 9, 0)
libxml compiled : (2, 9, 0)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)
Changed in lxml: | |
importance: | Undecided → Medium |
status: | New → Confirmed |
I agree that exceptions should report errors from the proper thread.
However, making the "error_log" property thread specific might lead to funny results. You could then validate a document in one thread, pass the validator on to another for result processing, and it would report completely unrelated messages. That is no different from what you currently get when you share it. It might be the lesser of two evils, but it's still not exactly a proper fix.
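To make that concern concrete, here is a rough sketch using only the standard library. `ThreadLocalLogValidator` is a made-up stand-in, not lxml code. It assumes a validator that keeps its log in `threading.local` storage and clears it on each use.

```python
import threading


class ThreadLocalLogValidator:
    """Hypothetical stand-in for a validator with a thread-local error_log."""

    def __init__(self):
        self._local = threading.local()

    @property
    def error_log(self):
        # Each thread lazily gets its own, initially empty, log.
        if not hasattr(self._local, "log"):
            self._local.log = []
        return self._local.log

    def validate(self, text):
        log = self.error_log
        del log[:]  # clear this thread's log before each run
        if not text.isdigit():
            log.append("%r is not an integer" % text)
        return not log


validator = ThreadLocalLogValidator()
results = {}


def validate_in_worker():
    # Validation happens in the worker thread, which sees its own log.
    results["valid"] = validator.validate("abc")
    results["worker_log"] = list(validator.error_log)


worker = threading.Thread(target=validate_in_worker)
worker.start()
worker.join()

# Result processing in the main thread: the worker's error is not visible.
results["main_log"] = list(validator.error_log)
print(results)
```

The worker thread sees the error it produced, but the main thread gets an empty log. If the main thread had run its own validations earlier, it would see those messages instead.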