Quote in doctype systemliteral
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | Medium | Olli Pottonen |
Bug Description
The XML 1.0 standard specifies that the system literal in a document type declaration must be of the form
('"' [^"]* '"') | ("'" [^']* "'")
[http://
That is, it is any string that starts and ends with a double quote, or starts and ends with an apostrophe, and does not contain its own delimiter in between.
In particular, it may be a string which starts with an apostrophe, contains a double quote, and ends with an apostrophe.
That is, both of these are valid:
<!DOCTYPE a PUBLIC 'foo' '"'><a/>
<!DOCTYPE a SYSTEM '"'><a/>
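The production can be checked mechanically. The following is a minimal sketch that transcribes the grammar rule quoted above into a Python regular expression; the names `SYSTEM_LITERAL` and `is_system_literal` are made up for illustration and are not part of lxml.

```python
import re

# Transcription of the production:
#   ('"' [^"]* '"') | ("'" [^']* "'")
SYSTEM_LITERAL = re.compile("\"[^\"]*\"|'[^']*'")


def is_system_literal(text):
    """Return True if text (including its delimiters) matches the production."""
    return SYSTEM_LITERAL.fullmatch(text) is not None


# A double quote delimited by apostrophes is accepted.
print(is_system_literal("'\"'"))
# Three double quotes in a row are not a valid literal.
print(is_system_literal('"""'))
```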
However, both cases break lxml:
>>> import lxml.etree
>>> doc = lxml.etree.
>>> doc.docinfo.doctype
u'<!DOCTYPE a PUBLIC "foo" """>'
>>> lxml.etree.
'<!DOCTYPE a PUBLIC "foo" """>\n<a/>'
>>>
>>> doc = lxml.etree.
>>> doc.docinfo.doctype
u'<!DOCTYPE a SYSTEM """>'
>>> lxml.etree.
'<!DOCTYPE a SYSTEM """>\n<a/>'
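In both outputs the system literal is always wrapped in double quotes, so a literal that itself contains a double quote produces a malformed declaration. One way a serializer could avoid this is to pick a delimiter that the literal does not contain. This is only a sketch of that idea in plain Python; `quote_system_literal` and `build_doctype` are hypothetical helpers, not lxml's API and not necessarily what the proposed fix does.

```python
def quote_system_literal(system_url):
    """Wrap a system identifier in a delimiter that it does not contain."""
    if '"' not in system_url:
        return '"' + system_url + '"'
    if "'" not in system_url:
        return "'" + system_url + "'"
    # The SystemLiteral production has no escape mechanism, so a value
    # containing both quote characters cannot be written at all.
    raise ValueError("system literal cannot contain both ' and \"")


def build_doctype(root_name, public_id, system_url):
    """Build a DOCTYPE string (hypothetical helper, for illustration only)."""
    quoted = quote_system_literal(system_url)
    if public_id is not None:
        # A public identifier can never contain a double quote,
        # so double quotes are always a safe delimiter for it.
        return '<!DOCTYPE %s PUBLIC "%s" %s>' % (root_name, public_id, quoted)
    return '<!DOCTYPE %s SYSTEM %s>' % (root_name, quoted)


print(build_doctype("a", "foo", '"'))
print(build_doctype("a", None, '"'))
```

With this rule, both documents from the report would round-trip with the system literal delimited by apostrophes.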
Proposed fix: https://github.com/opottone/lxml/commit/711c4eccf90f727d87d3cbeb9c28fb326cf2acbd