beautifulsoup 4-4.5.0 : parsing error on 'link' tags
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Won't Fix | Undecided | Unassigned |
Bug Description
Using BeautifulSoup 4-4.6.0 in Python 3.6, I'm getting unexpected behaviour when parsing 'link' tags.
#Expected Result:
python3.5 -m pip freeze
beautifulsoup4=
Python 3.5.2 (v3.5.2:
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> x = '''<line>
>>> BeautifulSoup(x, 'html.parser')
<line>www.
>>> y = '''<link>
>>> BeautifulSoup(y, 'html.parser')
<link>www.
>>>
#Unexpected Result:
python3.6 -m pip freeze
beautifulsoup4=
Python 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:04:09)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> x='''<line>
>>> BeautifulSoup(
<line>www.
>>> y='''<link>
>>> BeautifulSoup(
<link/>
>>>
I don't have html5lib or lxml installed (fairly certain).
I'd appreciate any help on this. Let me know if I've made a mistake with the above.
Thanks for the report. The behavior you're seeing is by design. In revision 446, I changed the way Beautiful Soup processes HTML empty-element tags, so that any contents are moved outside the tag. This was necessary to fix bug 1676935 and make the processing of empty-element tags consistent across parsers.
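To make the effect concrete, here is a rough standard-library sketch of the idea described above: a void (empty-element) tag such as 'link' is closed as soon as it opens, so any text that follows ends up after the tag rather than inside it. This is not Beautiful Soup's actual code. The class `VoidAwareSerializer`, the helper `reserialize`, the abbreviated `VOID_TAGS` set, and the placeholder host `www.example.com` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Abbreviated, illustrative subset of HTML void (empty-element) tags.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class VoidAwareSerializer(HTMLParser):
    """Hypothetical helper: re-serializes markup, closing void tags at once.

    Attributes are ignored to keep the sketch short.
    """

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            # A void tag cannot hold contents, so emit it already closed.
            self.out.append("<%s/>" % tag)
        else:
            self.out.append("<%s>" % tag)

    def handle_endtag(self, tag):
        # An explicit end tag for a void element is redundant; drop it.
        if tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is emitted wherever it occurs, which for a void tag
        # means after the already-closed tag.
        self.out.append(data)


def reserialize(markup):
    parser = VoidAwareSerializer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(reserialize("<line>www.example.com</line>"))  # unknown tag: contents stay inside
print(reserialize("<link>www.example.com</link>"))  # void tag: contents move outside
```

In this model, 'line' keeps its text because it is not a void tag, while 'link' is closed first and the text becomes a sibling that follows it. If the markup is really XML (an RSS feed, for example), where 'link' legitimately has contents, an XML parser is likely the better fit than an HTML one.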