Quote in doctype systemliteral

Bug #1421927 reported by Olli Pottonen on 2015-02-14
Affects: lxml
Importance: Medium
Assigned to: Olli Pottonen

Bug Description

The XML 1.0 standard specifies that the system literal in a document type declaration must be of the form

('"' [^"]* '"') | ("'" [^']* "'")

[http://www.w3.org/TR/REC-xml/#sec-prolog-dtd]

That is, it is anything that starts and ends with a double quote, or starts and ends with an apostrophe (single quote).
In particular, it may be a string that starts with an apostrophe, contains a double quote, and ends with an apostrophe.
So both of these are valid:

<!DOCTYPE a PUBLIC 'foo' '"'><a/>
<!DOCTYPE a SYSTEM '"'><a/>
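The validity claim can be checked mechanically. Below is a minimal sketch that transcribes the SystemLiteral production into a Python regular expression; the helper name is_system_literal is made up for this illustration and is not part of lxml.

```python
import re

# Direct transcription of the production quoted above:
#   ('"' [^"]* '"') | ("'" [^']* "'")
SYSTEM_LITERAL = re.compile("\"[^\"]*\"|'[^']*'")


def is_system_literal(text):
    """Return True if text, including its delimiters, matches the production.

    Hypothetical helper for illustration only.
    """
    return SYSTEM_LITERAL.fullmatch(text) is not None


# An apostrophe-delimited literal may contain a double quote.
print(is_system_literal("'\"'"))   # True
# Three double quotes in a row are not a valid literal.
print(is_system_literal('"""'))    # False
```

The second call matters for the output shown further down: a literal written as three double quotes does not match the production.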

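However, both cases break lxml, which always wraps the system literal in double quotes when serializing and so emits a malformed doctype: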
>>> import lxml.etree
>>> doc = lxml.etree.XML('''<!DOCTYPE a PUBLIC 'foo' '"'><a/>''').getroottree()
>>> doc.docinfo.doctype
u'<!DOCTYPE a PUBLIC "foo" """>'
>>> lxml.etree.tostring(doc)
'<!DOCTYPE a PUBLIC "foo" """>\n<a/>'
>>>
>>> doc = lxml.etree.XML('''<!DOCTYPE a SYSTEM '"'><a/>''').getroottree()
>>> doc.docinfo.doctype
u'<!DOCTYPE a SYSTEM """>'
>>> lxml.etree.tostring(doc)
'<!DOCTYPE a SYSTEM """>\n<a/>'
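For comparison, here is a minimal sketch of how a serializer could pick the delimiter based on the literal's content. This is an illustration only, not the fix that went into lxml; the function names quote_system_literal and build_doctype are hypothetical.

```python
def quote_system_literal(value):
    """Wrap value in a quote character that it does not contain.

    Hypothetical helper; prefers double quotes, falls back to apostrophes.
    """
    if '"' not in value:
        return '"' + value + '"'
    if "'" not in value:
        return "'" + value + "'"
    # Neither alternative of the production can delimit such a value.
    raise ValueError("system literal contains both quote characters")


def build_doctype(root_name, system_url, public_id=None):
    """Build a doctype declaration string (illustrative, not lxml's API)."""
    system = quote_system_literal(system_url)
    if public_id is not None:
        # XML 1.0 PubidChar excludes the double quote, so double quotes
        # are always a safe delimiter for a valid public ID.
        return '<!DOCTYPE %s PUBLIC "%s" %s>' % (root_name, public_id, system)
    return '<!DOCTYPE %s SYSTEM %s>' % (root_name, system)


print(build_doctype("a", '"'))                   # <!DOCTYPE a SYSTEM '"'>
print(build_doctype("a", '"', public_id="foo"))  # <!DOCTYPE a PUBLIC "foo" '"'>
```

With this approach both declarations from the report round-trip to well-formed output, and a value containing both quote characters is rejected rather than written out malformed.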

scoder (scoder) wrote :
Changed in lxml:
importance: Undecided → Medium
milestone: none → 3.4
status: New → In Progress
scoder (scoder) wrote :
Changed in lxml:
assignee: nobody → Olli Pottonen (olli-pottonen)
milestone: 3.4 → 3.5
status: In Progress → Fix Committed
scoder (scoder) wrote :

Fixed in lxml 3.5.0.

Changed in lxml:
status: Fix Committed → Fix Released