When an XML file has multiple aliases for a single namespace URI, the last alias encountered is the only one used
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Confirmed | Low | Unassigned |
Bug Description
Original mailing list thread:
https:/
Consider markup like this:
<package xmlns:opf="http://
The "http://
This isn't invalid -- the XML document means the same thing as it did before -- but many processing tools rely on looking for specific namespace aliases rather than URIs. lxml is able to preserve the aliases, so it may be possible to do the same when Beautiful Soup uses the lxml parser, assuming the user doesn't mess with the aliases after parsing the document.
This probably requires tagging every Tag object with the alias it came in with, not just the namespace URI it came in with -- hopefully lxml makes this possible.
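As a rough illustration of the idea, here is a minimal sketch that records the alias each element arrived with alongside its namespace URI. It is not Beautiful Soup's code and it does not use lxml: it swaps in the standard library's expat bindings so that it runs without extra dependencies. The document, the URI "http://example.com/ns", and the helper names collect_tags and last_alias_wins are all hypothetical. The second helper shows one way the reported symptom could arise, namely a URI-to-alias lookup in which a later alias overwrites an earlier one.

```python
import xml.parsers.expat

# Hypothetical document: two aliases ("a" and "b") bound to the same namespace URI.
DOC = (
    '<root xmlns:a="http://example.com/ns" xmlns:b="http://example.com/ns">'
    '<a:first/><b:second/>'
    '</root>'
)


def collect_tags(xml_text):
    """Return (prefix, local_name, namespace_uri) for each element, in document order."""
    scopes = [{}]  # stack of prefix -> URI mappings, one entry per open element
    tags = []

    def start(name, attrs):
        # Inherit the enclosing scope, then apply this element's xmlns declarations.
        scope = dict(scopes[-1])
        for key, value in attrs.items():
            if key == "xmlns":
                scope[None] = value
            elif key.startswith("xmlns:"):
                scope[key[len("xmlns:"):]] = value
        scopes.append(scope)
        # Split the raw qualified name; an unprefixed name gets prefix None.
        prefix, _, local = name.rpartition(":")
        prefix = prefix or None
        tags.append((prefix, local, scope.get(prefix)))

    def end(name):
        scopes.pop()

    # No namespace_separator is passed, so expat reports raw qualified names.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)
    return tags


def last_alias_wins(tags):
    """Build a URI -> prefix lookup; a later alias overwrites an earlier one."""
    inverted = {}
    for prefix, _local, uri in tags:
        if uri is not None:
            inverted[uri] = prefix
    return inverted
```

With the per-element tuples, "first" keeps alias "a" and "second" keeps alias "b", whereas the inverted lookup retains only "b" for the shared URI.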
Changed in beautifulsoup:
status: New → Confirmed
We also have a narrower but much more serious problem, which occurs when a namespace's prefix is the empty string rather than None. Attributes of such a tag are output as ":foo" rather than "foo". Only attribute names are affected, not tag names.
This is fixed in revision 595. I'm leaving this issue open because it describes a real, though much less serious, problem.
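The empty-prefix problem can be sketched in a few lines. This is only an assumption about how such a bug can arise, not the code changed in revision 595, which is not shown here; both function names are hypothetical.

```python
def naive_qualified_name(prefix, name):
    # Treats every non-None prefix as real, so an empty string yields a stray colon.
    if prefix is None:
        return name
    return prefix + ":" + name


def qualified_name(prefix, name):
    # Treats None and "" alike: no prefix means no colon.
    if not prefix:
        return name
    return prefix + ":" + name
```

Calling naive_qualified_name("", "foo") reproduces the ":foo" output described above, while qualified_name("", "foo") returns "foo" and still prefixes names whose alias is non-empty.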