strings generator improperly adds semicolon to unescaped ampersand
Bug #1685044 reported by Mike Ottum
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Won't Fix | Undecided | Unassigned |
Bug Description
In the example below, the `strings` generator replaces the character sequence `&b` with `&b;`, presumably because the ampersand is unescaped. Although such HTML is invalid, character sequences of this sort are common in HTML in the wild, and it would be best for BeautifulSoup to leave the ampersand alone.
Actual behavior:
```
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(
>>> [s for s in soup.strings]
[u'example.
```
Desired behavior:
```
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(
>>> [s for s in soup.strings]
[u'example.
```
Changed in beautifulsoup:
status: Confirmed → Won't Fix
Should have mentioned, this is with BeautifulSoup 4.5.3 and Python 2.7.10.
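Since the status is Won't Fix, one possible workaround is to escape bare ampersands before the markup reaches the parser, so that `&b` arrives as `&amp;b` and should decode back to a literal `&b`. The following is only a minimal sketch using the standard library; the helper name `escape_bare_ampersands` and the sample input are hypothetical and are not taken from the report or from Beautiful Soup itself.

```python
import re

# An ampersand that does NOT begin something shaped like a character
# reference: named (&amp;), decimal (&#38;) or hexadecimal (&#x26;).
# Assumption: anything shaped like "&name;" is left untouched, even when
# "name" is not a real HTML entity.
_BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(markup):
    """Return markup with every bare '&' rewritten as '&amp;'."""
    return _BARE_AMPERSAND.sub("&amp;", markup)


# Hypothetical input, not the markup from the original report.
sample = "<p>a&b and &amp; stay distinct</p>"
print(escape_bare_ampersands(sample))
```

The cleaned string could then be handed to the parser in place of the raw markup, for example `BeautifulSoup(escape_bare_ampersands(markup), "html.parser")`. Whether this gives the desired `strings` output on 4.5.3 with Python 2.7.10 has not been verified here.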