Comment 26 for bug 1768330

Chris Papademetrious (chrispitude) wrote:

Eep, I forgot about preserving newlines in <pre> blocks:

====
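from bs4 import BeautifulSoup

# my_get_text() and block_elements are the helper and element list from
# earlier in this discussion; they are not part of bs4 itself.
html_doc = """
<body>
  <p>line 1</p>
  <pre>line 2

line 3

line 4</pre>
  <p>line 5</p>
</body>
"""

soup = BeautifulSoup(html_doc, "lxml")
print(f"###{my_get_text(soup, block_elements=block_elements)}###")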
====

This prints:

====
###line 1
line 2
line 3
line 4
line 5###
====

The blank lines inside the <pre> are lost. So if we want to preserve newlines inside block elements, we'll need to write a manual concatenation loop that looks at the end of the previous string and the beginning of the next one. It's a solvable problem; we just need to decide on the desired behavior and then implement it. My guess is that we should insert a newline between two adjacent block strings only where non-newline characters would otherwise run together.
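A minimal sketch of that rule, assuming the block-level strings have already been collected into a list in document order. `join_block_strings` is a hypothetical helper for illustration, not a bs4 API: it adds a separating newline only when the text so far does not end with one and the next piece does not start with one, so newlines that are already present (such as the blank lines in the <pre>) pass through untouched.

```python
def join_block_strings(strings):
    """Concatenate block-level text pieces, inserting "\n" only where two
    non-newline characters would otherwise touch.

    Hypothetical helper illustrating the proposed rule; not part of bs4.
    """
    result = ""
    for s in strings:
        if not s:
            continue  # empty pieces contribute nothing
        if result and not result.endswith("\n") and not s.startswith("\n"):
            # Both sides of the boundary are non-newline characters,
            # so separate the two blocks with a single newline.
            result += "\n"
        result += s
    return result


# Pieces mirroring the example above: the <pre> text keeps its own newlines.
pieces = ["line 1", "line 2\n\nline 3\n\nline 4", "line 5"]
print(f"###{join_block_strings(pieces)}###")
```

With these pieces the blank lines between line 2, line 3, and line 4 survive, while line 1 and line 5 each get exactly one separating newline.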