element with xmlns attribute is not rendered properly

Bug #1839185 reported by Tedd Terry
This bug affects 2 people
Affects                  Status        Importance  Assigned to  Milestone
beautifulsoup4 (Ubuntu)  Fix Released  Undecided   Unassigned
lxml (Ubuntu)            Invalid       Undecided   Unassigned

Bug Description

lxml 4.4.0 introduces an issue with the following XML:

<?xml version="1.0" encoding="utf-8"?>
<NAMM_PO version="2009.2" xmlns="http://namm.com/PO/2009.2">
<Id>TEST_ID</Id>
<NAMM_PO>

This is the output XML:

<?xml version="1.0" encoding="utf-8"?>
<NAMM_PO version="2009.2" xmlns:="http://namm.com/PO/2009.2">
<Id>TEST_ID</Id>
<NAMM_PO/></NAMM_PO>

Note the missing closing tag of the NAMM_PO element, the additional NAMM_PO element, and the addition of a colon to the xmlns attribute.

Version info:

Python : sys.version_info(major=2, minor=7, micro=10, releaselevel='final', serial=0)
lxml.etree : (4, 4, 0, 0)
libxml used : (2, 9, 9)
libxml compiled : (2, 9, 9)
libxslt used : (1, 1, 33)
libxslt compiled : (1, 1, 33)

To repro:

Open and unarchive attached 'lxml_4_4_0_bug.zip'. Create a virtual environment, install requirements, and run 'python repro.py' to produce the above output.

scoder (scoder) wrote :

The input is not well-formed XML, so I guess you used the "recover" option to parse it at all.
Just reject invalid input instead.
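
A minimal sketch of the "reject invalid input" check, using only Python's standard-library `xml.etree.ElementTree` rather than lxml (an assumption made to keep the example dependency-free; the helper name `is_well_formed` is hypothetical). A strict parser refuses the trimmed-down input from the description because the last line opens a second NAMM_PO element instead of closing the first:

```python
import xml.etree.ElementTree as ET

# Input as posted in the bug description: the final line is an opening
# tag, so the root element is never closed.
BROKEN = b"""<?xml version="1.0" encoding="utf-8"?>
<NAMM_PO version="2009.2" xmlns="http://namm.com/PO/2009.2">
<Id>TEST_ID</Id>
<NAMM_PO>
"""

# Same document with the final line turned into a proper closing tag.
FIXED = b"""<?xml version="1.0" encoding="utf-8"?>
<NAMM_PO version="2009.2" xmlns="http://namm.com/PO/2009.2">
<Id>TEST_ID</Id>
</NAMM_PO>
"""


def is_well_formed(data):
    """Return True if a strict (non-recovering) parser accepts the input."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(BROKEN))
print(is_well_formed(FIXED))
```

Running such a check before handing data to a recovering parser makes it clear whether odd output comes from the input or from the libraries involved.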

Changed in lxml:
status: New → Invalid

Tedd Terry (tterry) wrote :

Sorry, my mistake, I messed up the input while trying to trim it down for a repro case.

There is definitely a bug (or unexpected change in behavior) that reproduces with lxml 4.4.0 and not lxml 4.3.5 or earlier.

Please find updated well formed XML with the same issue which caused XML parsed with lxml 4.4.0 to be rejected by a .NET service further down the line. Again, note the colon in the declaration of the xmlns attribute of the NAMM_PO element:

Input XML:
<?xml version="1.0" encoding="utf-8"?>
<NAMM_PO version="2009.2" xmlns="http://namm.com/PO/2009.2">
<Id>TEST_ID</Id>
</NAMM_PO>

Output XML (lxml 4.4.0):
<?xml version="1.0" encoding="utf-8"?>
<NAMM_PO version="2009.2" xmlns:="http://namm.com/PO/2009.2">
<Id>TEST_ID</Id>
</NAMM_PO>

Output XML (lxml 4.3.5):
<?xml version="1.0" encoding="utf-8"?>
<NAMM_PO version="2009.2" xmlns="http://namm.com/PO/2009.2">
<Id>TEST_ID</Id>
</NAMM_PO>
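
As a rough illustration of how the two outputs differ, the sketch below flags a namespace declaration with an empty prefix (`xmlns:=`) in serialized text and reads the root element's namespace from the 4.3.5 output. It uses only the standard library; the helper names `has_empty_prefix_declaration` and `root_namespace` are hypothetical, and the regular expression is a simple text heuristic, not a full namespace validator:

```python
import re
import xml.etree.ElementTree as ET

OUTPUT_4_4_0 = """<?xml version="1.0" encoding="utf-8"?>
<NAMM_PO version="2009.2" xmlns:="http://namm.com/PO/2009.2">
<Id>TEST_ID</Id>
</NAMM_PO>
"""

OUTPUT_4_3_5 = """<?xml version="1.0" encoding="utf-8"?>
<NAMM_PO version="2009.2" xmlns="http://namm.com/PO/2009.2">
<Id>TEST_ID</Id>
</NAMM_PO>
"""

# "xmlns:" followed directly by "=" means a prefix declaration with no prefix.
EMPTY_PREFIX = re.compile(r"\bxmlns:\s*=")


def has_empty_prefix_declaration(serialized):
    """Heuristic: True if the text contains an 'xmlns:=' style declaration."""
    return EMPTY_PREFIX.search(serialized) is not None


def root_namespace(serialized):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(serialized.encode("utf-8")).tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


print(has_empty_prefix_declaration(OUTPUT_4_4_0))
print(has_empty_prefix_declaration(OUTPUT_4_3_5))
print(root_namespace(OUTPUT_4_3_5))
```

A check like this could run on the serialized document before it is sent to a downstream consumer, so the bad declaration is caught locally.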

Tedd Terry (tterry) wrote :
Changed in lxml:
status: Invalid → New

scoder (scoder) wrote :

Sorry, but I'm not going to dig through BeautifulSoup to find out how (and why) it is generating this output. I'm pretty sure it's not lxml doing that.

Tedd Terry (tterry) wrote :

You don't have to dig too far... here's where lxml is invoked in the library: https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/builder/_lxml.py

This bug doesn't reproduce if I use lxml 4.3.5 with BeautifulSoup so it sounds like an lxml issue to me.

scoder (scoder) wrote :

It's BS that generates the output here, though. It shouldn't write out "xmlns:=…". If you can find the reason why it does that, we'll know if the problem is in lxml or BS.

Gleb (punkerpunker) wrote :

I'm experiencing the exact same problem. It started for me when upgrading from lxml==4.3.5 to lxml==4.7.1.
The BeautifulSoup version remained unchanged. Since only the minor version (the "y" in x.y.z) of lxml changed, the upgrade shouldn't break backward compatibility, yet it does (if we assume that BS is the root of the issue). Otherwise, it sounds like an lxml issue.

Will try to dig deeper, anyway

Paride Legovini (paride) wrote :

Thanks, marked as Incomplete waiting for further information.

Changed in beautifulsoup4 (Ubuntu):
status: New → Incomplete

Gleb (punkerpunker) wrote :

One thing to mention: the issue seems to be resolved when using bs4==4.10.0 (it can still be reproduced with bs4==4.7.1).

scoder (scoder) wrote :

> issue is resolved when using bs4==4.10.0

Thanks for investigating. That means that it's no longer an issue then, right?
Closing this as a third-party issue.

Changed in lxml:
status: New → Invalid

Gleb (punkerpunker) wrote :

I think so, thank you for your help.

Launchpad Janitor (janitor) wrote :

[Expired for beautifulsoup4 (Ubuntu) because there has been no activity for 60 days.]

Changed in beautifulsoup4 (Ubuntu):
status: Incomplete → Expired

Paride Legovini (paride) wrote :

Jammy ships bs4 4.10.0:

 beautifulsoup4 | 4.10.0-2 | jammy | source

so I'm marking the beautifulsoup4 devel task as Fix Released.

Changed in lxml:
status: Invalid → Fix Released
Changed in beautifulsoup4 (Ubuntu):
status: Expired → Fix Released
affects: lxml → lxml (Ubuntu)
Changed in lxml (Ubuntu):
status: Fix Released → Invalid