beautifulsoup 4-4.5.0 : parsing error on 'link' tags

Bug #1741631 reported by Ronald MacEachern
This bug affects 1 person
Affects          Status      Importance   Assigned to   Milestone
Beautiful Soup   Won't Fix   Undecided    Unassigned

Bug Description

Using beautifulsoup4 4.6.0 with Python 3.6, I'm getting unexpected behaviour when parsing 'link' tags.

#Expected Result:

python3.5 -m pip freeze
beautifulsoup4==4.5.1

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 26 2016, 10:47:25)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> x = '''<line>www.google.com</line>'''
>>> BeautifulSoup(x, 'html.parser')
<line>www.google.com</line>
>>> y = '''<link>www.google.com</link>'''
>>> BeautifulSoup(y, 'html.parser')
<link>www.google.com</link>
>>>

#Unexpected Result:

python3.6 -m pip freeze
beautifulsoup4==4.6.0

Python 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:04:09)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup
>>> x='''<line>www.google.ca</line>'''
>>> BeautifulSoup(x,'html.parser')
<line>www.google.ca</line>
>>> y='''<link>www.google.ca</link>'''
>>> BeautifulSoup(y,'html.parser')
<link/>www.google.ca
>>>
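One way to see where the difference comes from: Beautiful Soup's 'html.parser' builder sits on top of the standard library's html.parser.HTMLParser, and that tokenizer reports the same start/data/end events for <line> and <link>. The sketch below records those raw events without using Beautiful Soup at all; EventRecorder and trace are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Record the raw events html.parser emits, before any tree is built."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))  # attributes ignored for brevity

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def trace(markup):
    """Feed markup to a fresh recorder and return the event list."""
    recorder = EventRecorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.events


if __name__ == "__main__":
    for markup in ("<line>www.google.ca</line>", "<link>www.google.ca</link>"):
        print(trace(markup))
```

Both inputs produce a start event, a data event and an end event, so the text is still "inside" <link> at the tokenizer level; the change in 4.6.0 happens later, when the tree is built.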

I don't have html5lib or lxml installed (fairly certain).
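To confirm which optional parsers are importable, a standard-library check is enough. This is a sketch; installed_parsers is a hypothetical helper, and it only reports on the interpreter that runs it, so it should be run with the same python3.6 used in the session above.

```python
import importlib.util

# Third-party parsers Beautiful Soup can use besides the built-in html.parser.
OPTIONAL_PARSERS = ("lxml", "html5lib")


def installed_parsers(names=OPTIONAL_PARSERS):
    """Map each top-level module name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in installed_parsers().items():
        print(f"{name}: {'installed' if present else 'not installed'}")
```

Because both sessions pass 'html.parser' explicitly, Beautiful Soup uses the standard-library parser either way, so whether lxml or html5lib is installed should not affect these particular results.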

I'd appreciate any help on this; let me know if I've made a mistake with the above.

Leonard Richardson (leonardr) wrote:

Thanks for the report. The behavior you're seeing is by design. In revision 446, I changed the way Beautiful Soup processes HTML empty-element tags, so that any contents are moved outside the tag. This was necessary to fix bug 1676935 and make the processing of empty-element tags consistent across parsers.
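The behaviour described here can be illustrated with a toy model: if <link> is treated as an empty-element (void) tag, the start tag is emitted already closed, the text that followed it becomes a sibling, and the stray </link> has nothing left to close. The sketch below reproduces the 4.6.0 output on the reporter's input. ToySerializer, render and VOID_ELEMENTS are hypothetical names, and this is a simplified stand-in built on the standard-library parser, not Beautiful Soup's actual code from revision 446.

```python
from html.parser import HTMLParser

# HTML void elements: tags that can never contain content.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class ToySerializer(HTMLParser):
    """Hypothetical model: close void elements as soon as they open, so any
    text that follows them ends up outside the tag. Attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        # A void element is written out already closed; it gets no children.
        self.out.append(f"<{tag}/>" if tag in VOID_ELEMENTS else f"<{tag}>")

    def handle_endtag(self, tag):
        # An end tag for a void element has nothing to close, so drop it.
        if tag not in VOID_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)


def render(markup):
    """Parse markup with the toy model and return the re-serialized string."""
    parser = ToySerializer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(render("<line>www.google.ca</line>"))
    print(render("<link>www.google.ca</link>"))
```

If the input is really XML (an RSS feed, for example, where <link> carries a URL as text), Beautiful Soup's "xml" parser, which requires lxml, is the better fit, since XML has no void elements.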

Changed in beautifulsoup:
status: New → Won't Fix