Documentation of multi-valued attributes unclear

Bug #1970767 reported by Kevin Cole
This bug affects 1 person
Affects: Beautiful Soup
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

The documentation on "multi_valued_attributes=None" is unclear to me
https://beautiful-soup-4.readthedocs.io/en/latest/#multi-valued-attributes

* Python 3.8.10
* BeautifulSoup 4.8.2-1 (python3-bs4)

Using the same string as source, the documentation seems to suggest that there should be two different results, depending upon the use of the multi_valued_attributes argument. However, I get identical results:

  >>> rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a></p>')
  >>> rel_soup.a['rel']
  ['index']
  >>> rel_soup.a['rel'] = ['index', 'contents']
  >>> rel_soup.a['rel']
  ['index', 'contents']
  >>> print(rel_soup.p)
  <p>Back to the <a rel="index contents">homepage</a></p>

Adding in the multi_valued_attributes argument:

  >>> rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a></p>', "html", multi_valued_attributes=None)
  >>> rel_soup.a['rel']
  'index'
  >>> rel_soup.a['rel'] = ['index', 'contents']
  >>> rel_soup.a['rel']
  ['index', 'contents']
  >>> print(rel_soup.p)
  <p>Back to the <a rel="index contents">homepage</a></p>

Kevin Cole (kjcole)
description: updated
Kevin Cole (kjcole) wrote:

Apparently, it does not work with the rel attribute. Using the class attribute in both examples would illustrate the contrast better:

Without the multi_valued_attributes argument:

>>> no_list_soup = BeautifulSoup('<p class="body strikeout"></p>')
>>> no_list_soup.p['class']
['body', 'strikeout']

With the multi_valued_attributes argument:

>>> no_list_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html', multi_valued_attributes=None)
>>> no_list_soup.p['class']
'body strikeout'

Leonard Richardson (leonardr) wrote:

In your original description of this issue, the final results are identical, but the intermediate output isn't. In your first example, the value of the rel attribute starts out as a list:

  >>> rel_soup.a['rel']
  ['index']

In the second example, the value of the rel attribute starts out as a string:

  >>> rel_soup.a['rel']
  'index'

That's the same as in your second example with 'class', except the list version starts off with one item in it.

Once you set an attribute value to a list, it will stay a list until you output to a string, no matter how the document was originally parsed. multi_valued_attributes only affects the parsing process. I slightly reworded the documentation to make it clearer.

I think the best I can do to avoid this kind of confusion in the future is to make the 'rel' example start out with multiple values, making it more obvious that the values get turned into a list. This change along with the rewording is in revision 0cdcc79.
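
For anyone who wants to check the parse-time-only behavior described above, here is a minimal sketch. It assumes bs4 4.8.0 or later and uses the built-in "html.parser" so that no extra parser is required. The variable names are illustrative only and are not taken from the documentation.

```python
from bs4 import BeautifulSoup

# Start with two values in rel so the list/string contrast is visible.
markup = '<p>Back to the <a rel="index contents">homepage</a></p>'

# Default parsing: rel on <a> is treated as multi-valued and split into a list.
default_soup = BeautifulSoup(markup, "html.parser")
default_rel = default_soup.a["rel"]

# multi_valued_attributes=None: the value is left as a single string.
string_soup = BeautifulSoup(markup, "html.parser", multi_valued_attributes=None)
string_rel = string_soup.a["rel"]

# The argument only affects parsing. A list assigned afterwards stays a list
# until the tree is converted back to a string.
string_soup.a["rel"] = ["index", "contents"]
assigned_rel = string_soup.a["rel"]
rendered = str(string_soup.p)

print(repr(default_rel))
print(repr(string_rel))
print(repr(assigned_rel))
print(rendered)
```

The first two printed values show the difference made at parse time. The last two show that assignment and output behave the same regardless of how the document was parsed.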

Changed in beautifulsoup:
status: New → Fix Released
summary: - Documentation unclear
+ Documentation of multi-valued attributes unclear