I might have a solution to this. The idea is to keep accumulating NavigableString fragments into a "current string item" as long as we're inside the same lowest-level containing block element. If we move into a new block element, we start a new string item and accumulate into that instead.

The behavior is controlled by a "block_elements" argument that specifies the granularity of block context inference: True separates every string, a list of tag names merges the strings within each block element, and any falsy value returns one big string.

If I have the following input document:

====
from bs4 import BeautifulSoup, NavigableString

html_doc = """

sentence one.

sentence two.

Hello World!

Test

"""
soup = BeautifulSoup(html_doc, 'lxml')
====

and I evaluate the following function:

====
def my_all_strings(soup, block_elements=True):
    strings = []
    last_block_container = None
    for element in soup.descendants:
        # determine if we have entered a new string context or not
        if isinstance(element, NavigableString):
            if (block_elements is True):
                # separate *every* string (current behavior)
                new_container = True
            elif (block_elements):
                # must be a list; use block-element semantics
                this_block_container = element.find_parent(block_elements)
                new_container = (this_block_container is not last_block_container)
                last_block_container = this_block_container
            else:
                # return one big string
                new_container = False

            if new_container or not strings:
                # start a new string
                strings.append("")

            strings[-1] += element.text

    return strings

block_elements = ['address', 'article', 'aside', 'blockquote', 'canvas',
                  'dd', 'div', 'dl', 'dt', 'fieldset', 'figcaption',
                  'figure', 'footer', 'form', 'h1', 'h2', 'h3', 'h4', 'h5',
                  'h6', 'header', 'hr', 'li', 'main', 'nav', 'noscript',
                  'ol', 'p', 'pre', 'section', 'table', 'tfoot', 'ul',
                  'video']

print(f"{'default:':>32s} {repr(my_all_strings(soup))}")
print(f"{'block_elements = True:':>32s} {repr(my_all_strings(soup, block_elements=True))}")
print(f"{'block_elements = :':>32s} {repr(my_all_strings(soup, block_elements=block_elements))}")
print(f"{'block_elements = []:':>32s} {repr(my_all_strings(soup, block_elements=[]))}")
print(f"{'block_elements = False:':>32s} {repr(my_all_strings(soup, block_elements=False))}")
print(f"{'block_elements = None:':>32s} {repr(my_all_strings(soup, block_elements=None))}")
====

I get this:

====
                        default: ['\n', 'sentence one.', 'sentence two.', '\n', 'Hello W', 'orl', 'd!', 'Test', '\n', '\n']
          block_elements = True: ['\n', 'sentence one.', 'sentence two.', '\n', 'Hello W', 'orl', 'd!', 'Test', '\n', '\n']
              block_elements = : ['\n', 'sentence one.', 'sentence two.', '\n', 'Hello World!', 'Test', '\n\n']
            block_elements = []: ['\nsentence one.sentence two.\nHello World!Test\n\n']
         block_elements = False: ['\nsentence one.sentence two.\nHello World!Test\n\n']
          block_elements = None: ['\nsentence one.sentence two.\nHello World!Test\n\n']
====

My first version was more compact (~6 lines), but the logic was obscured by ternary operators and sneaky short-circuits. This version is easier to read and should execute just as fast.

block_elements can default to True, which matches the current behavior. If you're agreeable to the approach, I could try to submit a merge request that uses it in the _all_strings method for Tag objects.
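
For the merge request, the same grouping could probably be done lazily rather than by building a list, since _all_strings (as far as I know) yields its strings one at a time. Below is a minimal sketch of just the merging step, written without bs4 so it runs on its own. The name merge_fragments and the (container, text) pair representation are hypothetical; each container stands in for whatever element.find_parent(block_elements) would return for that fragment, and the pairing used in the demo is an assumption chosen to match the output above.

```python
def merge_fragments(pairs):
    """Yield one merged string per run of consecutive fragments
    that share the same container.

    `pairs` is an iterable of (container, text) tuples. `container` is
    None for text outside any block element. Containers are compared by
    identity, mirroring the `is not` check in my_all_strings.
    """
    last = None
    parts = []
    for container, text in pairs:
        # A different container closes the current run (if there is one).
        if container is not last and parts:
            yield "".join(parts)
            parts = []
        parts.append(text)
        last = container
    # Flush whatever is left once the input is exhausted.
    if parts:
        yield "".join(parts)


# Hypothetical stand-ins for four distinct block elements.
p1, p2, p3, p4 = object(), object(), object(), object()

pairs = [
    (None, "\n"),
    (p1, "sentence one."),
    (p2, "sentence two."),
    (None, "\n"),
    (p3, "Hello W"), (p3, "orl"), (p3, "d!"),
    (p4, "Test"),
    (None, "\n"), (None, "\n"),
]

result = list(merge_fragments(pairs))
print(result)
# ['\n', 'sentence one.', 'sentence two.', '\n', 'Hello World!', 'Test', '\n\n']
```

Because it is a generator, nothing is joined until the caller asks for the next string, which should fit how the strings are consumed today. The block_elements=True and falsy cases are left out here; they would simply bypass the container comparison.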