lxml

Bug #1240696
Comment #10

Dan Lecocq (q-dan) wrote on 2013-10-21:

No worries! In fact, it's in a second `<head>` element inside an `<html>` element. Pretty malformed, to be sure. Perhaps it's libxml2 that isn't robust against badly encoded content, since lxml doesn't deal directly with internal charset declarations.

If so, that's unfortunate, because libxml2 is not a happy place to spend time debugging. I'd like to reproduce this using the `libxml2` Python bindings directly, which would let us legitimately file a bug with them and, at the same time, show that the problem isn't in lxml. I took a stab at it but couldn't reproduce it; perhaps I'll try again.
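A side-by-side reproduction of that kind might be sketched as follows. It is hypothetical: the markup built by `build_test_document` is an assumption that only mimics the structure described here (a second `<head>` inside `<html>` whose charset declaration disagrees with the bytes' real encoding), because the original document isn't shown. `lxml.html.document_fromstring` and `lxml.html.tostring` are lxml's API. `htmlReadMemory`, the `HTML_PARSE_*` constants, `serialize()` and `freeDoc()` come from libxml2's generated Python bindings; whether those bindings accept `bytes` for the buffer may depend on the build, so treat that call as an assumption to verify.

```python
# Hypothetical repro sketch: the test document below is an assumed stand-in
# for the real page, mimicking only "a second <head> with a charset
# declaration that doesn't match the actual encoding".


def build_test_document(declared_charset="utf-8", actual_encoding="latin-1"):
    """Return HTML bytes whose <meta> charset disagrees with the real encoding."""
    html = (
        "<html>"
        "<head><title>first head</title></head>"
        '<head><meta http-equiv="Content-Type" '
        f'content="text/html; charset={declared_charset}"></head>'
        "<body><p>caf\u00e9</p></body>"
        "</html>"
    )
    return html.encode(actual_encoding)


def parse_with_lxml(data):
    """Parse via lxml's HTML parser (which delegates to libxml2)."""
    from lxml import html as lxml_html

    root = lxml_html.document_fromstring(data)
    return lxml_html.tostring(root, encoding="unicode")


def parse_with_libxml2_bindings(data):
    """Parse the same bytes with libxml2's own Python bindings, bypassing lxml.

    Assumption: the bindings accept this buffer type on your build.
    """
    import libxml2

    options = (
        libxml2.HTML_PARSE_RECOVER
        | libxml2.HTML_PARSE_NOERROR
        | libxml2.HTML_PARSE_NOWARNING
    )
    doc = libxml2.htmlReadMemory(data, len(data), None, None, options)
    try:
        return doc.serialize()
    finally:
        doc.freeDoc()  # libxml2 documents must be freed explicitly


if __name__ == "__main__":
    data = build_test_document()
    for name, parse in (
        ("lxml", parse_with_lxml),
        ("libxml2 bindings", parse_with_libxml2_bindings),
    ):
        try:
            output = parse(data)
        except ImportError as exc:
            print(f"{name}: not installed ({exc})")
            continue
        print(f"{name}: {output!r}")
```

If both paths misbehave on the same bytes, that is evidence the problem lies in libxml2 rather than lxml; if only the lxml path does, the bug is more likely on the lxml side.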