Comment 24 for bug 1768330

Chris Papademetrious (chrispitude) wrote :

The call to find_parent() could be replaced with next() over element.parents, which is a more lightweight way to find the closest (lowest-level) enclosing block element. The rest of the code is unchanged:

====
from bs4 import NavigableString

def my_all_strings(soup, block_elements=True):
    strings = []
    last_block_container = None
    for element in soup.descendants:
        # determine if we have entered a new string context or not
        if isinstance(element, NavigableString):
            if block_elements is True:
                # separate *every* string (current behavior)
                new_container = True
            elif block_elements:
                # must be a list; use block-element semantics
                try:
                    # element.parents yields the nearest ancestor first, so
                    # next() returns the closest enclosing block element
                    this_block_container = next(parent for parent in element.parents if parent.name in block_elements)
                except StopIteration:
                    # no enclosing block element at all
                    this_block_container = None
                new_container = (this_block_container is not last_block_container)
                last_block_container = this_block_container
            else:
                # return one big string
                new_container = False

            if new_container or not strings:
                # start a new string
                strings.append("")

            strings[-1] += element.text
    return strings
====
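
For illustration only, here is a minimal standalone sketch of the same next() idiom that runs without Beautiful Soup. Node and closest_block are hypothetical names invented for this example, not part of any library. Node only mimics the nearest-first ordering of element.parents. The sketch also uses a variant of the lookup, assuming that a default of None passed to next() is an acceptable substitute for catching StopIteration:

```python
class Node:
    """Hypothetical stand-in for a parse-tree element (not a Beautiful Soup class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    @property
    def parents(self):
        # Lazily yield ancestors, nearest first.
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


def closest_block(node, block_elements):
    # next() stops at the first matching ancestor; the None default is
    # returned when the generator is exhausted without a match.
    return next((p for p in node.parents if p.name in block_elements), None)


# Small tree: body > div > p > b > (text node)
body = Node("body")
div = Node("div", body)
p = Node("p", div)
b = Node("b", p)
text = Node(None, b)

nearest = closest_block(text, ["p", "div"])   # the <p>, not the outer <div>
outer = closest_block(text, ["div"])          # the <div>
missing = closest_block(text, ["table"])      # None: no such ancestor
```

Because the generator is lazy, the walk up the tree stops as soon as a matching ancestor is found, so the closest block element wins even when an outer one is also in the list.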