Element's serialised namespace can be different to in-memory ns after being inserted into an el

Bug #1424232 reported by Hal Blackburn
This bug affects 2 people
Affects    Status       Importance    Assigned to    Milestone
lxml       Confirmed    Medium        Unassigned

Bug Description

I've come across a situation where the namespace of an Element is correct in memory (for example, when read via etree.QName(elem).namespace), but when the element is serialised to a string, the written XML has a different (incorrect) namespace.

There's an attached script which reproduces the issue, but here's a brief description of how this situation occurs:

You have two XML trees:
<a xmlns="x:/foo"/>

<f:c xmlns="x:/bar" xmlns:f="x:/foo"/>

Note that {x:/foo}a and {x:/foo}c are in the same namespace, but c has a different default ns to a.

When c is inserted as a child of a, it loses its "f": "x:/foo" entry in the nsmap but retains the None: "x:/bar" entry (as expected). When serialised, the c element is therefore written without a prefix but with xmlns="x:/bar", so it is effectively moved into the x:/bar namespace. In memory, however, c.tag still reports "{x:/foo}c" as expected.
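
One way to check for this mismatch is to serialise the tree, parse it back, and compare the tag of the re-parsed element with the in-memory tag. The following is a minimal sketch, not part of the attached script. The helper name roundtrip_tag is hypothetical (it is not an lxml function), and it assumes the element can be found again by its position in document order. The stdlib fallback import is only there so the sketch still runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib fallback so the sketch still runs without lxml;
    # the behaviour described in this report is specific to lxml.
    import xml.etree.ElementTree as etree


def roundtrip_tag(root, elem):
    """Return the tag elem has after root is serialised and re-parsed.

    Hypothetical helper: elem is located by its position in document order.
    """
    position = next(i for i, e in enumerate(root.iter()) if e is elem)
    reparsed = etree.fromstring(etree.tostring(root))
    return list(reparsed.iter())[position].tag


a = etree.XML('<a xmlns="x:/foo"/>')
c = etree.XML('<f:c xmlns="x:/bar" xmlns:f="x:/foo"/>')
a.insert(0, c)

# True means the serialised namespace differs from the in-memory one.
mismatch = roundtrip_tag(a, c) != c.tag
print("in memory:", c.tag)
print("after round trip:", roundtrip_tag(a, c))
print("mismatch:", mismatch)
```

Whether a mismatch is reported depends on the lxml version in use, so no expected output is given here.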

Interactive examples:

$ python
Python 3.4.2 (default, Oct 19 2014, 17:52:17)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> a = etree.XML('<a xmlns="x:/foo"/>')
>>> c = etree.XML('<f:c xmlns="x:/bar" xmlns:f="x:/foo"/>')
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"/>
>>> print(etree.tostring(c, encoding="unicode"))
<f:c xmlns="x:/bar" xmlns:f="x:/foo"/>
>>> c.nsmap
{'f': 'x:/foo', None: 'x:/bar'}
>>> a.insert(0, c)
>>> c.nsmap
{None: 'x:/bar'}
>>> c.tag
'{x:/foo}c'
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><c xmlns="x:/bar"/></a>

Note that this also happens in the same way if a and c start in the same document and c is inserted into a again:

>>> a = etree.XML('<a xmlns="x:/foo"><f:c xmlns="x:/bar" xmlns:f="x:/foo"/></a>')
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><f:c xmlns="x:/bar" xmlns:f="x:/foo"/></a>
>>> c = list(a)[0]
>>> a.insert(0, c)
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><c xmlns="x:/bar"/></a>

Strangely, if you insert c twice, it somewhat fixes itself by giving x:/foo an auto-generated prefix (ns0) in the nsmap:

>>> a = etree.XML('<a xmlns="x:/foo"><f:c xmlns="x:/bar" xmlns:f="x:/foo"/></a>')
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><f:c xmlns="x:/bar" xmlns:f="x:/foo"/></a>
>>> c = list(a)[0]
>>> a.insert(0, c)
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><c xmlns="x:/bar"/></a>
>>> a.insert(0, c)
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><ns0:c xmlns="x:/bar" xmlns:ns0="x:/foo"/></a>

My versions:
Python : sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)
lxml.etree : (3, 4, 2, 0)
libxml used : (2, 9, 0)
libxml compiled : (2, 9, 0)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

Hal Blackburn (hal-blackburn) wrote :
summary: Element's serialised namespace can be different to in-memory ns after
- being inserting into an el
+ being inserted into an el
description: updated
Funky Future (funky-future) wrote :

The bug stems from an nsmap property that is not cleaned up:

>>> x = etree.fromstring('<div xmlns="foo"/>')
>>> x.tag = 'div'
>>> assert etree.fromstring(etree.tostring(x)).tag == 'div'
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AssertionError

>>> x.nsmap
{None: 'foo'}

scoder (scoder) wrote :

Looks like a bug to me. If the element redefines the (potentially empty) prefix that its namespace URI is mapped to by the parent nodes, then it should not use that prefix.

Changed in lxml:
importance: Undecided → Medium
status: New → Confirmed
scoder (scoder) wrote :

PR definitely welcome.

Mark A. Gibbs (indi) wrote :

Is this issue caused by the same bug?

$ python3
Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> a = etree.fromstring("""<a xmlns="A"/>""")
>>> b = etree.fromstring("""<b/>""")
>>> a.append(b)
>>> etree.tostring(a)
b'<a xmlns="A"><b/></a>'
>>> etree.tostring(b)
b'<b xmlns="A"/>'

Note the inserted element appears to "adopt" the namespace of the parent when being serialized. It seems to take on the parent's namespace map without taking into account that the prefix (or lack thereof) is reused:

# Continuing from above...
>>> a.tag
'{A}a'
>>> b.tag
'b'
>>> a.nsmap
{None: 'A'}
>>> b.nsmap
{None: 'A'}
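
The tag/nsmap pairs quoted in this thread share a pattern: the element's own namespace is either missing from the nsmap that lxml reports for it, or the element has no namespace while a default namespace is in scope. The following is a rough sketch of that check on plain values. The function names are hypothetical, and this is only a heuristic on what tag and nsmap report, not a description of how the serialiser actually picks a prefix.

```python
def split_tag(tag):
    """Split a Clark-notation tag '{uri}local' into (uri, local).

    uri is None for a tag without a namespace.
    """
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag


def tag_matches_nsmap(tag, nsmap):
    """Heuristic: can this tag be written with the prefixes in nsmap?"""
    uri, _ = split_tag(tag)
    if uri is None:
        # An un-namespaced tag conflicts with an in-scope default namespace.
        return None not in nsmap
    # A namespaced tag needs some in-scope prefix bound to its URI.
    return uri in nsmap.values()


# Values copied from the sessions in this report.
print(tag_matches_nsmap("{x:/foo}c", {"f": "x:/foo", None: "x:/bar"}))  # before insert
print(tag_matches_nsmap("{x:/foo}c", {None: "x:/bar"}))                 # after insert
print(tag_matches_nsmap("b", {None: "A"}))                              # appended <b/>
```

With lxml, the same check could be applied to a live element as tag_matches_nsmap(elem.tag, elem.nsmap).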
