exception in start handler of HTML parser target ignored

Bug #1497051 reported by Steve Randall on 2015-09-18
This bug affects 1 person
Affects: lxml
Importance: Undecided
Assigned to: Unassigned

Bug Description

When using a custom target with the lxml.etree.HTMLParser, an exception from the 'start' handler is ignored instead of terminating the parse. This does not affect other handlers, nor does it affect the XML parser.

Python : sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)
lxml.etree : (3, 4, 4, 0)
libxml used : (2, 9, 2)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

scoder (scoder) wrote :

Could you provide a test case?

Steve Randall (srandall52-o) wrote :

Yes, it's more complicated than I thought.

XMLParser stops parsing at the first error, calls the target's close, then re-raises the exception.

HTMLParser keeps parsing to the end of the document, calls the target's close, then re-raises the *last* exception instead of the first one, which hides the real problem.
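
A rough sketch of a test case for the difference described above. This is not the reporter's original code: RaisingTarget, run and compare are made-up names. The target raises from 'start' for every element, so the tag named in the exception message shows whether a parser reported the first or the last failure. What it prints depends on the installed lxml and libxml2 versions, so no expected output is given.

```python
class RaisingTarget:
    """Hypothetical parser target whose start handler always raises."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        # Record the event first so we can see how far the parser got.
        self.events.append(("start", tag))
        raise ValueError("start failed for <%s>" % tag)

    def end(self, tag):
        self.events.append(("end", tag))

    def data(self, data):
        pass

    def close(self):
        self.events.append(("close",))
        return "closed"


def run(parser_name, markup):
    """Parse markup with etree.<parser_name>; return (exception, events)."""
    from lxml import etree  # imported lazily; needs lxml installed

    target = RaisingTarget()
    parser = getattr(etree, parser_name)(target=target)
    try:
        etree.fromstring(markup, parser)
    except Exception as exc:
        return exc, target.events
    return None, target.events


def compare(markup="<html><body><p>text</p></body></html>"):
    """Run the same markup through both parsers and print what happened."""
    for name in ("XMLParser", "HTMLParser"):
        exc, events = run(name, markup)
        print(name, "raised:", repr(exc))
        print(name, "events:", events)
```

Calling compare() prints, for each parser, the exception that reached the caller and the list of handler calls the target saw before close.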

I think it's now clear how I can work around this.
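
One possible workaround, sketched under assumptions (the comment above does not say which approach was actually used): wrap the real target so that the first exception is remembered, later events are skipped, and close re-raises that first exception. Since the parser is reported to re-raise the last exception after calling close, raising the remembered one from close should make it the one the caller sees. FirstErrorTarget, PickyTarget and parse_html are hypothetical names.

```python
class FirstErrorTarget:
    """Hypothetical wrapper: keep the first handler exception, skip the rest."""

    def __init__(self, target):
        self._target = target
        self.first_error = None

    def _call(self, name, *args):
        # Once a handler has failed, ignore every later event.
        if self.first_error is not None:
            return
        handler = getattr(self._target, name, None)
        if handler is None:
            return
        try:
            handler(*args)
        except Exception as exc:
            self.first_error = exc
            raise

    def start(self, tag, attrib):
        self._call("start", tag, attrib)

    def end(self, tag):
        self._call("end", tag)

    def data(self, data):
        self._call("data", data)

    def comment(self, text):
        self._call("comment", text)

    def close(self):
        # Let the wrapped target clean up, then surface the first failure.
        result = self._target.close()
        if self.first_error is not None:
            raise self.first_error
        return result


class PickyTarget:
    """Example target: rejects <b> and <c> with different exception types."""

    def start(self, tag, attrib):
        if tag == "b":
            raise ValueError("no <b> allowed")
        if tag == "c":
            raise KeyError("no <c> allowed")

    def close(self):
        return "done"


def parse_html(markup, target):
    """Parse with lxml's HTMLParser through the wrapper (needs lxml)."""
    from lxml import etree

    parser = etree.HTMLParser(target=FirstErrorTarget(target))
    return etree.fromstring(markup, parser)
```

The wrapper itself does not depend on lxml, so its behaviour can be checked by calling start and close by hand before trying it with a real parser.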
