_Validator error_log on multiple uses and thread safety

Bug #1222132 reported by Someone on 2013-09-07
This bug affects 2 people
Affects: lxml
Importance: Medium
Assigned to: Unassigned

Bug Description

The _Validator base class, used for all validation interfaces (DTD, XMLSchema, etc.), maintains a local error_log. This error_log is not cleared between uses, so assert_() and assertValid() report stale error messages from earlier validations.

I think it is also unbounded in size, unlike the global error log, so keeping an XMLSchema object alive for a long time can cause memory to balloon when many invalid XML documents are validated.

In addition, this log is shared between threads. The FAQ states that XMLSchema objects can be reused between threads (mysteriously, it does not mention DTD, even though both use the same base class and libxml doesn't seem to say anything about thread safety for DTDs). This doesn't seem to be the case, though: threads running validation concurrently can corrupt each other's error_log, since it is not thread-local.

Note the attached example script: the second exception mentions "xyz", which is the string in the first XML document, while the second document actually contains the string "abc".
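
The attachment itself is not reproduced on this page. As a rough illustration of the behaviour described, here is a minimal sketch, assuming a one-element schema whose root must be an integer (the schema, the element name "root" and the helper validation_messages() are made up for this example and are not taken from the attached script). It validates two invalid documents with the same XMLSchema object and collects the messages found in error_log after each run:

```python
from lxml import etree

# Hypothetical schema for illustration: <root> must contain an integer.
SCHEMA = etree.XMLSchema(etree.XML(
    b'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
    b'<xsd:element name="root" type="xsd:integer"/>'
    b'</xsd:schema>'
))


def validation_messages(schema, xml_bytes):
    """Validate one document and return the messages now held in error_log."""
    doc = etree.XML(xml_bytes)
    schema.validate(doc)
    return [entry.message for entry in schema.error_log]


# Both documents are invalid, each because of a different string.
first = validation_messages(SCHEMA, b"<root>xyz</root>")
second = validation_messages(SCHEMA, b"<root>abc</root>")

for message in second:
    print(message)
```

On an affected version such as the 3.2.3 listed below, the second list would be expected to still contain the "xyz" message from the first run; on a release where error_log is reset per validation, it should only mention "abc".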

Version info:
Python : sys.version_info(major=2, minor=7, micro=5, releaselevel='final', serial=0)
lxml.etree : (3, 2, 3, 0)
libxml used : (2, 9, 0)
libxml compiled : (2, 9, 0)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

Someone (temp4746) wrote :
scoder (scoder) on 2014-02-20
Changed in lxml:
importance: Undecided → Medium
status: New → Confirmed
scoder (scoder) wrote :

I agree that exceptions should report errors from the proper thread.

However, making the "error_log" property thread-specific might lead to funny results. You could then validate a document in one thread, pass the validator on to another for result processing, and it would report completely unrelated messages. That is no different from what you currently get when you share it. It might be the lesser of two evils, but it is still not exactly a proper fix.

scoder (scoder) wrote :

I pushed a fix for the case of repeated validation here:

https://github.com/lxml/lxml/commit/4c748a9f8bb97a20f0f6948f3a426f02137f6d6d

The same applied to RelaxNG and Schematron. DTDs were ok. It'll go into lxml 3.3.x.

Note that this change does not fix the multithreading case; fixing that would be a backwards-incompatible change. I'm therefore leaving the ticket open for later reconsideration.
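
Since the multithreading case stays open, one possible workaround on the caller's side is to avoid sharing a validator between threads at all. The following is only a sketch under that assumption and not part of the fix above: PerThreadValidator and its factory argument are hypothetical names, and the class simply builds one validator per thread on first use via threading.local.

```python
import threading


class PerThreadValidator:
    """Lazily build one validator per thread from a factory callable.

    Hypothetical helper for illustration; it is not part of lxml.
    """

    def __init__(self, factory):
        # factory: zero-argument callable returning a fresh validator.
        self._factory = factory
        self._local = threading.local()

    def get(self):
        """Return the calling thread's validator, creating it if needed."""
        validator = getattr(self._local, "validator", None)
        if validator is None:
            validator = self._factory()
            self._local.validator = validator
        return validator
```

With lxml, the factory might be something like `lambda: etree.XMLSchema(schema_doc)` for an already parsed schema document `schema_doc` (an assumed variable). Each thread then calls `get()` and reads error_log from its own instance, at the cost of compiling the schema once per thread.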

scoder (scoder) wrote :

"error_log" resetting is fixed in lxml 3.3.2.
