When multiple prefixes are defined for a single namespace URI, a _SaxParserTarget can't know which prefix was originally used for a given element
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Triaged | Low | Unassigned |
Bug Description
This report comes from a bug reported against my project, Beautiful Soup: https:/
Here's the output of running the attached script:
===
Python : sys.version_
lxml.etree : (4, 6, 2, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
lxml
<package xmlns:opf="http://
<metadata>
<dc:identifier opf:scheme=
<dc:identifier opf2:scheme=
</metadata>
</package>
-------
Beautiful Soup
<?xml version="1.0" encoding="utf-8"?>
<package xmlns:dc="http://
<metadata>
<dc:identifier opf2:scheme=
<dc:identifier opf2:scheme=
</metadata>
</package>
===
The markup in the attached script defines two different prefixes for the namespace URI "http://
The "_SaxParserTarg
In this case, lxml gives me the attribute name '{http://
I don't consider this a serious problem, but I wanted to bring it to your attention; maybe I've missed something in the _SaxParserTarget interface that would make it an easy fix.
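The sketch below illustrates the behaviour described above. It uses the standard library's `xml.etree.ElementTree` rather than lxml, because lxml's parser-target interface (`start`/`end`/`data`/`close`) follows ElementTree's. The markup and the `http://example.com/...` namespace URIs are hypothetical stand-ins, since the original script and URIs are truncated here. The point to notice is that both `opf:scheme` and `opf2:scheme` reach the target under the same `{uri}scheme` key, and nothing in the call records which prefix was written.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two prefixes (opf, opf2) bound to the same made-up
# namespace URI, mirroring the structure described in the report.
MARKUP = """<package xmlns:opf="http://example.com/opf"
         xmlns:opf2="http://example.com/opf"
         xmlns:dc="http://example.com/dc">
  <metadata>
    <dc:identifier opf:scheme="ISBN">1</dc:identifier>
    <dc:identifier opf2:scheme="UUID">2</dc:identifier>
  </metadata>
</package>"""


class RecordingTarget:
    """Minimal parser target that records each start tag and its attributes."""

    def __init__(self):
        self.events = []

    def start(self, tag, attrib):
        # Tag and attribute names arrive as "{namespace-uri}localname";
        # the prefix used in the source markup is not passed in.
        self.events.append((tag, dict(attrib)))

    def end(self, tag):
        pass

    def data(self, text):
        pass

    def close(self):
        return self.events


parser = ET.XMLParser(target=RecordingTarget())
parser.feed(MARKUP)
events = parser.close()

for tag, attrib in events:
    print(tag, attrib)
```

Both `dc:identifier` events carry an attribute keyed `{http://example.com/opf}scheme`. A consumer that wants to re-serialize the document therefore has to choose one prefix for that URI, which matches Beautiful Soup writing `opf2:` for both elements.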
Thanks for the report. I don't think there is an easy way to improve this. The interface uses ElementTree's qualified names (`{namespace-uri}localname`). Resolving the prefixes here is intentional, because it makes namespaces _easier_ for users to deal with. Doing this obviously loses some parser-available data. However, prefixes are not part of the XML information set, so no document information is lost. Only round-trips suffer from this issue.
So I admit that it's an issue. It seems very rare, though, and there is no easy way to address it.
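The sketch below illustrates the point that prefixes are outside the information set and that only round-trips suffer. It again uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml, and the `urn:example:x` namespace is hypothetical. Two documents that differ only in their prefix (`p` versus `q`) parse to the same tag and attributes. Serializing either one produces the same output with a generated prefix, so the original prefix is not restored. The prefix declarations can still be observed as separate `start-ns` events from `iterparse`. However, they are not attached to the qualified names the tree (or a parser target) receives.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical documents that differ only in the prefix bound to one URI.
DOC_P = '<a xmlns:p="urn:example:x" p:k="1"/>'
DOC_Q = '<a xmlns:q="urn:example:x" q:k="1"/>'

root_p = ET.fromstring(DOC_P)
root_q = ET.fromstring(DOC_Q)

# Same tag and same "{uri}k" attribute key: the prefix has been resolved away.
same_tree = (root_p.tag, root_p.attrib) == (root_q.tag, root_q.attrib)

# Serialization picks its own prefix, so neither "p" nor "q" comes back.
out_p = ET.tostring(root_p, encoding="unicode")
out_q = ET.tostring(root_q, encoding="unicode")

# The declarations are still reported as separate (prefix, uri) events,
# independent of the element and attribute names.
ns_events = [
    data for _event, data in ET.iterparse(io.StringIO(DOC_P), events=("start-ns",))
]

print(same_tree, out_p == out_q, ns_events)
```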
Sounds like a "won't fix".