Quote in doctype systemliteral

Bug #1421927 reported by Olli Pottonen
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Medium
Assigned to: Olli Pottonen

Bug Description

The XML 1.0 standard specifies that the system literal in a document type declaration must be of the form

('"' [^"]* '"') | ("'" [^']* "'")

[http://www.w3.org/TR/REC-xml/#sec-prolog-dtd]

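That is, a system literal either starts and ends with an apostrophe and contains no other apostrophe, or starts and ends with a double quote and contains no other double quote.
In particular, it may be a string that starts with an apostrophe, contains a double quote, and ends with an apostrophe.
So both of these are valid: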

<!DOCTYPE a PUBLIC 'foo' '"'><a/>
<!DOCTYPE a SYSTEM '"'><a/>
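The grammar rule quoted above can be checked mechanically. The following is a minimal sketch using only the standard library; the names `SYSTEM_LITERAL` and `is_system_literal` are made up for illustration and are not part of lxml.

```python
import re

# SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
# Hypothetical helper names; this only mirrors the grammar rule from the XML 1.0 spec.
SYSTEM_LITERAL = re.compile("\"[^\"]*\"|'[^']*'")


def is_system_literal(text):
    """Return True if text, including its delimiters, matches SystemLiteral."""
    return SYSTEM_LITERAL.fullmatch(text) is not None


print(is_system_literal("'\"'"))   # apostrophe-delimited, contains a double quote: True
print(is_system_literal('"""'))    # double-quote-delimited, contains a double quote: False
```

The second call corresponds to the `"""` that appears in the serialized output reported in this bug.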

However, both cases break lxml:
>>> import lxml.etree
>>> doc = lxml.etree.XML('''<!DOCTYPE a PUBLIC 'foo' '"'><a/>''').getroottree()
>>> doc.docinfo.doctype
u'<!DOCTYPE a PUBLIC "foo" """>'
>>> lxml.etree.tostring(doc)
'<!DOCTYPE a PUBLIC "foo" """>\n<a/>'
>>>
>>> doc = lxml.etree.XML('''<!DOCTYPE a SYSTEM '"'><a/>''').getroottree()
>>> doc.docinfo.doctype
u'<!DOCTYPE a SYSTEM """>'
>>> lxml.etree.tostring(doc)
'<!DOCTYPE a SYSTEM """>\n<a/>'
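In both outputs above the system literal is wrapped in double quotes even though it contains a double quote, so the result is no longer well-formed. One way a serializer could avoid this is to pick the delimiter based on the literal's content. The sketch below is an assumption about how that might look, not the actual fix that went into lxml; `quote_system_literal` and `build_doctype` are hypothetical names. The public identifier is always wrapped in double quotes here, which is safe because the XML 1.0 `PubidChar` production does not allow a double quote.

```python
def quote_system_literal(system_url):
    """Wrap a system literal in a delimiter that does not occur inside it."""
    if '"' not in system_url:
        return '"' + system_url + '"'
    if "'" not in system_url:
        return "'" + system_url + "'"
    # A string containing both quote characters cannot be a SystemLiteral at all.
    raise ValueError("system literal contains both kinds of quotes")


def build_doctype(root_name, system_url, public_id=None):
    """Build a DOCTYPE declaration string (illustrative helper, not lxml API)."""
    system_part = quote_system_literal(system_url)
    if public_id is not None:
        return '<!DOCTYPE %s PUBLIC "%s" %s>' % (root_name, public_id, system_part)
    return '<!DOCTYPE %s SYSTEM %s>' % (root_name, system_part)


print(build_doctype("a", '"', public_id="foo"))  # <!DOCTYPE a PUBLIC "foo" '"'>
print(build_doctype("a", '"'))                   # <!DOCTYPE a SYSTEM '"'>
```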

scoder (scoder) wrote :
Changed in lxml:
importance: Undecided → Medium
milestone: none → 3.4
status: New → In Progress
scoder (scoder) wrote :
Changed in lxml:
assignee: nobody → Olli Pottonen (olli-pottonen)
milestone: 3.4 → 3.5
status: In Progress → Fix Committed
scoder (scoder) wrote :

Fixed in lxml 3.5.0.

Changed in lxml:
status: Fix Committed → Fix Released