Element's serialised namespace can be different to in-memory ns after being inserted into an el

Bug #1424232 reported by Hal Blackburn on 2015-02-21
This bug affects 2 people
Affects Status Importance Assigned to Milestone
lxml
Undecided
Unassigned

Bug Description

I've come across a situation where the namespace of an Element is correct when read in memory (using, say, etree.QName(elem).namespace), but when the elem is serialised to a string, the written XML has a different (incorrect) namespace.

There's an attached script which reproduces the issue, but here's a brief description of how this situation occurs:

You have two XML trees:
<a xmlns="x:/foo"/>

<f:c xmlns="x:/bar" xmlns:f="x:/foo"/>

Note that {x:/foo}a and {x:/foo}c are in the same namespace, but c has a different default ns to a.

When c is inserted as a child of a, it loses its "f": "x:/foo" entry in the nsmap but retains the None: "x:/bar" entry (as expected). When serialised, the c element is therefore written without a prefix but with xmlns="x:/bar", so it is effectively moved into the x:/bar ns. In memory, however, c.tag still reports "{x:/foo}c" as expected.

Interactive examples:

$ python
Python 3.4.2 (default, Oct 19 2014, 17:52:17)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.51)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> a = etree.XML('<a xmlns="x:/foo"/>')
>>> c = etree.XML('<f:c xmlns="x:/bar" xmlns:f="x:/foo"/>')
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"/>
>>> print(etree.tostring(c, encoding="unicode"))
<f:c xmlns="x:/bar" xmlns:f="x:/foo"/>
>>> c.nsmap
{'f': 'x:/foo', None: 'x:/bar'}
>>> a.insert(0, c)
>>> c.nsmap
{None: 'x:/bar'}
>>> c.tag
'{x:/foo}c'
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><c xmlns="x:/bar"/></a>
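
A minimal sketch of a round-trip check for this kind of mismatch: it serialises a tree, parses the result again, and compares each element's tag before and after. The helper name tag_mismatches is hypothetical (it is not part of lxml), and the fallback to the standard library's ElementTree is only there so the sketch runs where lxml is not installed. On an affected lxml version it would be expected to report the c element; on a version without the problem it should return an empty list.

```python
try:
    from lxml import etree  # the library this report is about
except ImportError:
    # Assumption: fall back to the stdlib API so the sketch still runs.
    import xml.etree.ElementTree as etree


def tag_mismatches(root):
    """Return (in_memory_tag, reparsed_tag) pairs that differ after a round trip.

    Hypothetical helper, not an lxml function. Pass the root of a tree
    (an element without tail text).
    """
    reparsed = etree.fromstring(etree.tostring(root))
    # Only compare real elements; comments and PIs have non-string tags.
    before = [el.tag for el in root.iter() if isinstance(el.tag, str)]
    after = [el.tag for el in reparsed.iter() if isinstance(el.tag, str)]
    return [(b, a) for b, a in zip(before, after) if b != a]


# Same steps as the interactive session above.
a = etree.fromstring('<a xmlns="x:/foo"/>')
c = etree.fromstring('<f:c xmlns="x:/bar" xmlns:f="x:/foo"/>')
a.insert(0, c)

# Any pair printed here is an element whose serialised namespace
# differs from the one reported by .tag in memory.
print(tag_mismatches(a))
```

A check like this could be run in a test suite after any code that moves elements between parents, since the in-memory tag alone does not reveal the problem.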

Note that this also happens in the same way if a and c start in the same document and c is inserted into a again:

>>> a = etree.XML('<a xmlns="x:/foo"><f:c xmlns="x:/bar" xmlns:f="x:/foo"/></a>')
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><f:c xmlns="x:/bar" xmlns:f="x:/foo"/></a>
>>> c = list(a)[0]
>>> a.insert(0, c)
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><c xmlns="x:/bar"/></a>

Strangely, if you insert c twice, it somewhat fixes itself: x:/foo is given a generated prefix (ns0) in the nsmap:

>>> a = etree.XML('<a xmlns="x:/foo"><f:c xmlns="x:/bar" xmlns:f="x:/foo"/></a>')
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><f:c xmlns="x:/bar" xmlns:f="x:/foo"/></a>
>>> c = list(a)[0]
>>> a.insert(0, c)
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><c xmlns="x:/bar"/></a>
>>> a.insert(0, c)
>>> print(etree.tostring(a, encoding="unicode"))
<a xmlns="x:/foo"><ns0:c xmlns="x:/bar" xmlns:ns0="x:/foo"/></a>

My versions:
Python : sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)
lxml.etree : (3, 4, 2, 0)
libxml used : (2, 9, 0)
libxml compiled : (2, 9, 0)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

Funky Future (funky-future) wrote :

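The bug stems from an nsmap property that is not cleaned up. Here the tag is reset to a name without a namespace, yet the old default-namespace entry remains in nsmap: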

>>> x = etree.fromstring('<div xmlns="foo"/>')
>>> x.tag = 'div'
>>> assert etree.fromstring(etree.tostring(x)).tag == 'div'
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AssertionError

>>> x.nsmap
{None: 'foo'}
