Provide convenience methods to add/remove class keywords

Bug #2052943 reported by Chris Papademetrious
This bug affects 1 person
Affects: Beautiful Soup
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

This is an enhancement request.

It would be nice to have convenience methods to add/remove keywords from the "class" attribute.

For example,

====
elt.add_class('foo')
elt.add_class(['bar', 'baz'])

elt.remove_class('foo')
elt.remove_class(['bar', 'baz'])
====

The methods should accept string or list-of-string values. Existing keywords should not be duplicated. remove_class() should delete the "class" attribute if it becomes empty.

I could propose a merge request for this, if you're open to it.

These are the helper functions we currently use:

====
def add_class(tag, these_classes):
    # Accept a single string or a list of strings.
    if isinstance(these_classes, str):
        these_classes = [these_classes]
    for this_class in these_classes:
        if tag.get('class', None) is None:
            tag['class'] = []
        # Skip empty or whitespace-only names and keywords already present.
        if this_class.strip() and this_class not in tag['class']:
            tag['class'].append(this_class)
    return tag

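def remove_class(tag, these_classes):
    # Accept a single string or a list of strings.
    if isinstance(these_classes, str):
        these_classes = [these_classes]
    if tag.get('class', None):
        tag['class'] = [x for x in tag['class'] if x not in these_classes]
    # Delete the attribute if it is now empty; get() avoids a KeyError
    # when the tag never had a "class" attribute.
    if tag.get('class', None) is not None and not tag['class']:
        del tag['class']
    return tag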
====

We also have a helper function to test if a keyword exists in "class":

====
def has_class(tag, this_class):
    return bool(tag.get('class', []) and this_class in tag['class'])
====

but this would be provided by self-testing from #2052936, if that comes to be:

====
if elt.matches(class_ = ...):
    # ...
====

Leonard Richardson (leonardr) wrote :

This request is asking to create a distinction between the "class" attribute and other attributes that I don't think is appropriate. I can think of a better way to offer functionality like this, but I don't know how useful it'd be.

The main time Beautiful Soup treats the "class" attribute as special is when working around the fact that "class" is also a Python reserved word. The rest of the time, "class" is treated as part of a family of attributes that Beautiful Soup calls multi-valued attributes. The HTML spec calls these "CDATA list attributes," with "class" being the most common.

There are lots of CDATA list attributes: "accesskey" and "dropzone" work the same way as "class", and for certain tags, attributes like "rel" or "headers" may also work like that. For XML documents, you can set up the CDATA list attributes however you want. For this reason I'm *very* reluctant to add methods with "class" in the name.

A dictionary listing HTML's CDATA list attributes is here (it configures Beautiful Soup's default behavior when given an HTML document): https://git.launchpad.net/beautifulsoup/tree/bs4/builder/__init__.py?h=4.13#n522

The crucial moment in Beautiful Soup where we treat CDATA list attributes differently from regular attributes is here:
https://git.launchpad.net/beautifulsoup/tree/bs4/builder/__init__.py?h=4.13#n394

The attribute value comes out of there as either a string (for normal attributes) or a list of strings (for CDATA list attributes).
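
As a rough illustration of that split (a simplified sketch, not Beautiful Soup's actual code; the function name `parse_attribute_value` and the abbreviated `MULTI_VALUED` set are made up for this example), whitespace-separated values become a list only for attributes registered as multi-valued:

```python
# Hypothetical, abbreviated registry for illustration only. The real
# dictionary linked above is keyed by tag name and lists more attributes.
MULTI_VALUED = {"class", "accesskey", "dropzone"}


def parse_attribute_value(name, raw_value):
    """Return a list of strings for multi-valued attributes, else the raw string."""
    if name in MULTI_VALUED:
        # str.split() with no arguments splits on any run of whitespace.
        return raw_value.split()
    return raw_value


print(parse_attribute_value("class", "foo  bar baz"))  # ['foo', 'bar', 'baz']
print(parse_attribute_value("id", "foo bar"))          # foo bar
```

Any helper that adds or removes values therefore has to know which of the two shapes it is dealing with.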

I think the best way to do what you want would be to define a subclass of `list` or `set`, add helper methods to that class, and have Beautiful Soup instantiate that class to hold the values of a CDATA list attribute. That way the functionality isn't specific to the "class" attribute; it'd be additional complexity available to the attribute value itself, if the attribute value happened to be of this kind.

Here's my summary of the functionality you want on this class:

1. Check for duplicates on insertion, like set does.
2. Treat insertion of a list as insertion of every item in the list, like list.extend does.
3. Preserve the original value order, like list does.
4. Empty values are treated as the absence of a value.

#3 is the current behavior and I'd want it preserved. I don't think there's a method you could add to this class that would give you #4 in a backwards compatible way (currently an empty list for a CDATA list attribute becomes an empty attribute value, such as class=""). You could do #2, but unless you're okay with using both append() and extend(), your interface would deviate from what `list` offers.

As for #1, you could definitely do it, but again backwards compatibility is the issue. I've seen some really weird stuff and I'm almost positive somebody's workflow depends on sticking the same CSS class into a tag multiple times.

So I'm not wild about the "subclass of `list`" idea either: there are too many backwards compatibility pitfalls and deviations from how Python's built-in data structures work.
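
For concreteness, the list-subclass idea might look roughly like the sketch below. The class name `UniqueValueList` and its `discard` method are hypothetical and not part of Beautiful Soup. The sketch covers items 1 to 3 of the summary and shows the deviation from `list` mentioned above: `append()` also accepts a list.

```python
class UniqueValueList(list):
    """Hypothetical list subclass for multi-valued attribute values."""

    def append(self, value):
        # Item 2: accept a single string or an iterable of strings.
        values = [value] if isinstance(value, str) else list(value)
        for v in values:
            # Item 1: skip duplicates (and empty or whitespace-only strings).
            # Item 3: list.append keeps the original insertion order.
            if v.strip() and v not in self:
                super().append(v)

    def extend(self, values):
        # Same rules as append(), so both spellings behave identically.
        self.append(values)

    def discard(self, value):
        """Remove one value or several; values not present are ignored."""
        values = {value} if isinstance(value, str) else set(value)
        self[:] = [v for v in self if v not in values]


classes = UniqueValueList()
classes.append("foo")
classes.append(["bar", "foo", "baz"])
classes.discard("bar")
print(classes)  # ['foo', 'baz']
```

Item 4 is not covered: the list holds no reference to the tag that owns it, so an emptied list cannot delete its own attribute. The constructor is also left untouched, so `UniqueValueList(["a", "a"])` would still keep both copies.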

However, there has been a trend in recent years where I enable advanced use cases by allowing users to customize which classes Beautiful Soup instantiates in different circumstances. See for example the "element_cl...


Chris Papademetrious (chrispitude) wrote :

Your four-item summary of the requested functionality is spot-on.

Indeed, these methods could (and should) support other attributes, such as:

====
def add_class(self, these_classes, attname='class'):
    ...

def remove_class(self, these_classes, attname='class'):
    ...
====

For example, if I am working with non-HTML content (such as DITA XML source), then that will have its own conventions for list-of-strings attributes. DITA, for instance, has profiling condition attributes that are also multi-valued:

====
dita_tag.add_class('expert', attname='audience')
====

I am completely fine with not using "class" in the name. (I am just used to XML::Twig's methods.) The terminology used in the BS4 documentation is "multi-valued attributes", so we should probably use something consistent with that. Some ideas are:

====
tag.add_value('foo')
tag.remove_value(['foo', 'bar'])

tag.add_multivalue('foo')
tag.remove_multivalue(['foo', 'bar'])
====

If the attribute name argument is required (instead of defaulting to 'class'), it would make the methods' purposes clearer in context:

====
tag.add_value('foo', attname='class')
tag.remove_value(['foo', 'bar'], attname='class')

tag.add_multivalue('foo', attname='class')
tag.remove_multivalue(['foo', 'bar'], attname='class')
====

or

====
tag.add_value('class', 'foo')
tag.remove_value('class', ['foo', 'bar'])

tag.add_multivalue('class', 'foo')
tag.remove_multivalue('class', ['foo', 'bar'])
====
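
A minimal sketch of the last variant (attribute name first), written as free functions rather than methods. The names `add_value` and `remove_value` are only the proposals from this thread, not an existing Beautiful Soup API, and the code assumes the attribute is either absent or already holds a list of strings. It relies only on `get()`, item assignment, and `del`, so a plain dict can stand in for a tag here:

```python
def add_value(tag, attname, values):
    """Add one value or several to a list-valued attribute, skipping duplicates."""
    values = [values] if isinstance(values, str) else list(values)
    # A missing attribute is treated as an empty list.
    current = list(tag.get(attname) or [])
    for value in values:
        if value.strip() and value not in current:
            current.append(value)
    if current:
        tag[attname] = current
    return tag


def remove_value(tag, attname, values):
    """Remove one value or several; delete the attribute if nothing is left."""
    values = {values} if isinstance(values, str) else set(values)
    remaining = [v for v in tag.get(attname) or [] if v not in values]
    if remaining:
        tag[attname] = remaining
    elif tag.get(attname) is not None:
        del tag[attname]
    return tag


tag = {}  # stand-in for a tag; only get/set/del by key are used
add_value(tag, "class", "foo")
add_value(tag, "class", ["bar", "foo"])
print(tag)  # {'class': ['foo', 'bar']}
remove_value(tag, "class", ["foo", "bar"])
print(tag)  # {}
```

Because the functions operate on the tag rather than on the attribute value, adding to an attribute that does not exist yet and deleting an emptied attribute both come out naturally.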

What would the subclassed-list UI look like? How would it handle the addition of a value to an attribute that doesn't exist yet? Could it handle addition/removal of both single values and lists of values?
