bs4 DataLossWarning

Bug #727014 reported by Zach Williams on 2011-03-01
This bug affects 1 person
Affects: Beautiful Soup
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm not sure if this is a bug or not -- maybe I'm doing something wrong? I keep receiving the following warnings the first time I use bs4 in a session.

Quick test case:

>>> import urllib2
>>> from bs4 import BeautifulSoup as bs
>>> site = 'http://www.crummy.com/'
>>> url = urllib2.urlopen(site)
>>> soup = bs(url.read())

Then I receive the following warnings:

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/bs4/builder/_html5lib.py:60: DataLossWarning: namespaceHTMLElements not supported yet
  DataLossWarning)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/bs4/builder/_html5lib.py:77: DataLossWarning: BeautifulSoup cannot represent elements in any namespace
  warnings.warn("BeautifulSoup cannot represent elements in any namespace", DataLossWarning)

Should I be doing something other than passing url.read() into BeautifulSoup? Everything works, and the warnings never appear again, but they always pop up the first time I use BeautifulSoup.
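
The "only on first use" behaviour matches Python's default warning filter, which reports a given warning once per source location and then stays quiet. Here is a minimal sketch of that, using only the standard library. The `DataLossWarning` class below is a local stand-in and `build_tree` is a hypothetical function; neither is bs4's actual code.

```python
import warnings


class DataLossWarning(UserWarning):
    """Local stand-in for the category named in the output above (an assumption)."""


def build_tree():
    # Hypothetical function that warns from one fixed source line on every call.
    warnings.warn("cannot represent elements in any namespace", DataLossWarning)


def count_reported(action, calls=3):
    """Call build_tree() `calls` times and count the warnings that get reported."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(calls):
            build_tree()
    return len(caught)


print(count_reported("default"))  # "default": reported once per source location
print(count_reported("always"))   # "always": reported on every call
```

So repeated calls to BeautifulSoup in the same session would still trigger the warning internally; the filter just stops printing it after the first time.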

Leonard Richardson (leonardr) wrote :

html5lib supports namespaced elements (like <namespace:tag>), and Beautiful Soup doesn't yet. These warnings are mostly a reminder to myself that I need to add namespace support. Unless you're actually parsing code that has namespaced tags, there won't be any real data loss.
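
If the markup has no namespaced tags and the reminder is just noise, one option is to ignore that one warning category around the parse call. This is a sketch under assumptions: `DataLossWarning` here is a local stand-in (the thread shows the class name but not its import path), and `parse` is a hypothetical stub in place of the real parser call.

```python
import warnings


class DataLossWarning(UserWarning):
    """Local stand-in for the category shown in the warning output (an assumption)."""


def parse(markup):
    # Hypothetical parser stub: warns the way the tree builder does, then returns.
    warnings.warn("namespaceHTMLElements not supported yet", DataLossWarning)
    return markup.strip()


def parse_quietly(markup):
    """Run parse() with DataLossWarning ignored; other warnings stay visible."""
    with warnings.catch_warnings():
        # The filter is scoped to this block and restored on exit.
        warnings.simplefilter("ignore", DataLossWarning)
        return parse(markup)


print(parse_quietly("  <p>hello</p>  "))
```

Filtering by category rather than ignoring everything keeps unrelated warnings, such as deprecation notices, visible.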

Changed in beautifulsoup:
status: New → Confirmed

Zach Williams (hey-zachwill) wrote :

Nice. Thanks for the quick reply, man.

Leonard Richardson (leonardr) wrote :

BS4 beta 8 supports namespaced elements and attributes in a very basic way, so I've removed the warning.

Changed in beautifulsoup:
status: Confirmed → Fix Released