No worries! In fact, it's in a second `<head>` element inside of an `<html>` element. Pretty malformed, to be sure. My guess is that it's libxml2 that isn't robust against mis-encoded content, because lxml doesn't itself handle the internal charset declarations.
If so, that's unfortunate, because libxml2 is not a happy place to spend time debugging. I'd like to reproduce this using the `libxml2` Python interface directly, which would let us legitimately file a bug with them and, at the same time, prove that it's not in lxml. I took a stab at it and couldn't, but perhaps I'll try again.
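For what it's worth, here is a rough sketch of the kind of reproducer I have in mind. To be clear about what's assumed: the document below is a made-up stand-in (not the real page), with a charset `<meta>` in a second `<head>` that disagrees with the actual UTF-8 bytes; `build_malformed_html` and `parse_with_libxml2` are hypothetical helper names; and I'm assuming the `libxml2` Python bindings are installed and that `htmlReadMemory` accepts the bytes as given. It has not reproduced the problem for me.

```python
def build_malformed_html():
    """Build a hypothetical malformed document (an assumption, not the real page).

    It has a second <head> inside <html>, and that second <head> declares
    iso-8859-1 even though the body text is actually encoded as UTF-8.
    """
    body_text = "caf\u00e9".encode("utf-8")
    return (
        b"<html><head><title>t</title></head><body>"
        b"<head>"
        b'<meta http-equiv="Content-Type" '
        b'content="text/html; charset=iso-8859-1">'
        b"</head>"
        b"<p>" + body_text + b"</p></body></html>"
    )


def parse_with_libxml2(data):
    """Parse with the libxml2 bindings directly, bypassing lxml.

    Returns the serialized tree, or None if the bindings are not installed.
    """
    try:
        import libxml2  # third-party bindings; may not be available
    except ImportError:
        return None
    options = (
        libxml2.HTML_PARSE_RECOVER
        | libxml2.HTML_PARSE_NOERROR
        | libxml2.HTML_PARSE_NOWARNING
    )
    # No URL and no explicit encoding, so the parser has to rely on
    # whatever charset declaration it finds in the document itself.
    doc = libxml2.htmlReadMemory(data, len(data), None, None, options)
    try:
        return doc.serialize()
    finally:
        doc.freeDoc()


if __name__ == "__main__":
    print(parse_with_libxml2(build_malformed_html()))
```

If the same misbehavior shows up in that output with lxml out of the picture, that would be the evidence needed for an upstream report.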