SVG inlined in HTML confuses beautifulsoup
Bug #1873640 reported by paul@hammant.org
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Won't Fix | Undecided | Unassigned |
Bug Description
https:/
I'm trying to extract my strings for translations. With respect to this source:
<tspan font-family=
y="8">Hand wash in very hot soapy
</tspan>
Beautifulsoup thinks that is:
Hand wash in very hot soapy\n', '
Yup, a trailing newline (perhaps correctly?), and a ', ' sequence that's not in the source at all.
Can you provide the Python code you're using to extract this text?
Here's my best guess at a recreation:
import requests
from bs4 import BeautifulSoup
markup = requests.get("https://cv-masks.github.io/ragmask-max.html").content
soup = BeautifulSoup(markup, 'html.parser')
svg = soup.find('svg', width="162.7954")
[x for x in svg.strings]
The result is:
['\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', '\n', ' Produced by OmniGraffle 7.15\n ', '2020-04-15 11:40:49 +0000', '\n', '\n', '\n', 'Canvas 3', '\n', '\n', 'Layer 1', '\n', '\n', '\n', '\n', '\n', '22\n ', '\n', '\n', '\n', '\n', '\n', 'Hand wash in very hot soapy\n ', '\n', 'water and dry before use\n ', '\n', '\n', '\n', '\n', '\n']
This is the closest I could get to your observed output.
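One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis: the ', ' in the report may simply be the separator Python prints between adjacent items when a list of strings is displayed, not text that Beautiful Soup invented. The markup below is a small hypothetical stand-in for the inline SVG (its attribute values are made up, and it is not the page from the report). It compares .strings, which keeps whitespace-only nodes and trailing newlines, with .stripped_strings, which drops them.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the inline SVG; attribute values are invented
# for illustration and do not come from the page in the report.
markup = """
<html><body>
<svg width="10" height="10">
  <text>
    <tspan x="0" y="8">Hand wash in very hot soapy
    </tspan>
    <tspan x="0" y="18">water and dry before use
    </tspan>
  </text>
</svg>
</body></html>
"""

soup = BeautifulSoup(markup, "html.parser")
svg = soup.find("svg")

# .strings yields every text node, including whitespace-only ones.
raw = list(svg.strings)

# .stripped_strings strips each string and skips the ones that end up empty.
clean = list(svg.stripped_strings)

# Displaying the raw list shows "', '" between items: that is list formatting,
# not characters taken from the document.
print(repr(raw))
print(clean)
```

If only the visible text is needed for translation, iterating over stripped_strings (or stripping each string by hand) avoids both the trailing newline and the whitespace-only entries.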