Serializing ElementTree duplicates ns
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Invalid | Undecided | Unassigned |
Bug Description
Happens on both versions:
Python : sys.version_
lxml.etree : (4, 8, 0, 0)
Python : sys.version_
lxml.etree : (4, 7, 1, 0)
Host libraries:
libxml used : (2, 9, 13)
libxml compiled : (2, 9, 13)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
The following test code is, I think, self-explanatory. First, the working example:
>>> b = b'<?xml version="1.0" encoding=
>>> lxml.etree.
b'<html xmlns="http://
Then we add a DOCTYPE and the serialized tree contains the same xmlns twice:
>>> b = b'<?xml version="1.0" encoding=
>>> lxml.etree.
b'<html xmlns="http://
Which is invalid XML and can’t be read: lxml.etree.
Cheers,
Jens
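A minimal sketch of why the duplicated output cannot be read back: an XML parser rejects an element that carries the same attribute name twice. This uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml (an assumption, chosen so the snippet runs without lxml installed). The helper name `is_well_formed` is hypothetical, and the input string has the shape of the serialized output reported in this thread.

```python
import xml.etree.ElementTree as ET

# Serialized output of the shape reported in this thread: the same
# default-namespace declaration appears twice on one element.
duplicated = (
    b'<html xmlns="http://www.w3.org/1999/xhtml" '
    b'xmlns="http://www.w3.org/1999/xhtml"></html>'
)

# The same element with a single declaration, for comparison.
single = b'<html xmlns="http://www.w3.org/1999/xhtml"></html>'


def is_well_formed(data: bytes) -> bool:
    """Return True if the standard-library XML parser accepts `data`."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(duplicated))
print(is_well_formed(single))
```

The first call reports the document as not well-formed and the second accepts it, which matches the observation that the duplicated serialization cannot be parsed again.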
I think it depends on the DOCTYPE itself:
>>> b = b'<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml"></html>'
>>> lxml.etree.tostring(lxml.html.fromstring(b))
b'<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"></html>'
and
>>> b = b'<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE html PUBLIC -//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml"></html>'
>>> lxml.etree.tostring(lxml.html.fromstring(b))
b'<html xmlns="http://www.w3.org/1999/xhtml"/>'
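One way to narrow this down is to parse the same bytes with an XML parser instead of an HTML parser. An XML parser treats `xmlns` as a namespace declaration rather than an ordinary attribute, so there is only one place for it to come from when the tree is serialized. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in (an assumption, so it runs without lxml). It is a diagnostic idea only, not the maintainers' explanation for the Invalid status. With lxml, the analogous XML-parser entry point would be `lxml.etree.fromstring`, which is not exercised here.

```python
import xml.etree.ElementTree as ET

# The XHTML 1.0 Transitional document from the comment in this thread.
doc = (
    b'<?xml version="1.0" encoding="UTF-8" ?>'
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    b'<html xmlns="http://www.w3.org/1999/xhtml"></html>'
)

# Parse as XML: the namespace becomes part of the tag name and
# "xmlns" does not show up in the attribute dictionary.
root = ET.fromstring(doc)
print(root.tag)
print(root.attrib)

# Serialize again and count how many namespace declarations are written.
serialized = ET.tostring(root)
print(serialized.count(b"xmlns"))
```

If the XML-parsed tree serializes with a single declaration while the HTML-parsed tree does not, that points at the combination of HTML parsing and XML serialization rather than at the serializer alone.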