Comment 6 for bug 2047713

Leonard Richardson (leonardr) wrote : Re: enhance find*() methods to filter through all object types

Take a look at the merge proposal: https://code.launchpad.net/~leonardr/beautifulsoup/+git/beautifulsoup/+merge/459082

I'd still want to play around with the terminology, and to make the base class usable as the parse_only argument to the BeautifulSoup constructor, but I'm pretty happy with it overall. It would let you write code like this:

from bs4 import BeautifulSoup, NavigableString
from bs4.strainer import ElementMatcher

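def non_whitespace(element):
    # Accept every tag, and every string that is not whitespace-only.
    return not (isinstance(element, NavigableString) and element.text.isspace())

# Wrap the function so it can be passed to the find*() methods.
match = ElementMatcher(non_whitespace)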

html_doc = """
<p>
  <b>bold</b>
  <i>italic</i>
  and
  <u>underline</u>
  <br />
</p>
"""
soup = BeautifulSoup(html_doc, 'lxml')

# get the first non-whitespace thing in <p>
this_thing = soup.find('p').find(match, recursive=False)

# print all following non-whitespace sibling elements in <p>
while this_thing:
    next_thing = this_thing.find_next_sibling(match)
    print(f"{this_thing!r} is followed by {next_thing!r}")
    this_thing = next_thing
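
For comparison, here is a rough sketch of the same walk without the proposed ElementMatcher class, using only the existing .children and .next_siblings generators. The first_match helper is hypothetical (it is not part of Beautiful Soup), and the sketch uses the built-in html.parser instead of lxml so that it has no extra dependency; treat it as an illustration of what the proposal would replace, not as the recommended approach.

```python
from bs4 import BeautifulSoup, NavigableString

def non_whitespace(element):
    # Accept every tag, and every string that is not whitespace-only.
    # NavigableString is a str subclass, so isspace() can be called directly.
    return not (isinstance(element, NavigableString) and element.isspace())

def first_match(elements, predicate):
    # Hypothetical helper: first item in an iterable that satisfies
    # the predicate, or None if nothing matches.
    return next((e for e in elements if predicate(e)), None)

html_doc = """
<p>
  <b>bold</b>
  <i>italic</i>
  and
  <u>underline</u>
  <br />
</p>
"""
soup = BeautifulSoup(html_doc, "html.parser")

# First non-whitespace direct child of <p>.
this_thing = first_match(soup.find("p").children, non_whitespace)

# Collect it and every following non-whitespace sibling.
found = []
while this_thing is not None:
    found.append(this_thing)
    this_thing = first_match(this_thing.next_siblings, non_whitespace)

for thing in found:
    print(repr(thing))
```

This works today, but the filtering logic has to live outside the find*() methods, which is the gap the merge proposal is meant to close.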