Ampersands should always be escaped, even if they look like entities

Bug #1182183 reported by Leonard Richardson
This bug affects 1 person
Affects: Beautiful Soup
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

This test made sense for Beautiful Soup 3, but not 4:

----
        self.assertEqual(
            self.sub.substitute_xml("&Aacute;T&T"),
            "&Aacute;T&amp;T")
----

Depending on how BS3 parsed a document, entities like "&Aacute;" might or might not be turned into the corresponding Unicode characters. If you saw "&Aacute;" in a document, there was no way to tell whether the original document said "&Aacute;" or "&amp;Aacute;". So we had code that only turned "&" into "&amp;" if it looked like the "&" was not the beginning of an entity.
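The BS3-era heuristic described above can be sketched roughly as follows. This is only an illustration, not Beautiful Soup's actual code: the helper name escape_bare_ampersands and the regular expression are assumptions made up for this example.

```python
import re

# Hypothetical pattern (not taken from Beautiful Soup): match an "&" that is
# NOT followed by something shaped like an entity reference, i.e. a decimal
# reference (&#225;), a hex reference (&#xE1;), or a name ending in ";".
BARE_AMPERSAND = re.compile(r"&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)")

def escape_bare_ampersands(text):
    """Escape only the ampersands that do not look like the start of an entity."""
    return BARE_AMPERSAND.sub("&amp;", text)

# The entity-like "&Aacute;" is left alone; the bare "&" in "T&T" is escaped.
print(escape_bare_ampersands("&Aacute;T&T"))  # &Aacute;T&amp;T
```

The weakness of this approach is the ambiguity the description points out: a literal "&Aacute;" that was meant as text is indistinguishable from an entity reference, so it passes through unescaped.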

In Beautiful Soup 4, entities are always turned into the corresponding Unicode characters. So there's no reason not to turn "&" into "&amp;".

The one wrinkle is entities that aren't HTML entities, like '&foo;'. You could argue that '&foo;' should come out the same way it went in, instead of being turned into "&amp;foo;". But that's not the way it works now, and no one has complained.
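A minimal sketch of the "always escape" behavior, including the '&foo;' wrinkle. The standard library's html.unescape stands in here for the parser's entity conversion, and escape_all_ampersands is a hypothetical helper; neither is Beautiful Soup's own implementation.

```python
import html

def escape_all_ampersands(text):
    """Escape every ampersand, whether or not it looks like an entity."""
    return text.replace("&", "&amp;")

# Stand-in for parsing: known entities become Unicode characters up front,
# so any "&" still present afterwards is literal text.
parsed = html.unescape("&Aacute;T&amp;T")   # the string "ÁT&T"
print(escape_all_ampersands(parsed))        # ÁT&amp;T

# "&foo;" is not an HTML entity, so it survives unescaping unchanged
# and is then escaped on output rather than round-tripped.
unknown = html.unescape("&foo;")
print(escape_all_ampersands(unknown))       # &amp;foo;
```

Because conversion happens before output, there is nothing left to disambiguate at escape time, which is why a plain replacement is enough in this sketch.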

Leonard Richardson (leonardr) wrote :

Fixed in revision 301.

Changed in beautifulsoup:
status: New → Fix Committed
Changed in beautifulsoup:
status: Fix Committed → Fix Released