2023-11-22 17:37:19 |
Chris Papademetrious |
description |
This is a wishlist item.
Beautiful Soup has a wrap() method that wraps a single element in a tag. Super!
There are various Beautiful Soup requests for wrapping all elements contained *inside* a parent element (wrapping the inside instead of the outside):
https://stackoverflow.com/questions/20789798/how-to-use-beautifulsoup-to-wrap-body-contents-with-div-container
https://stackoverflow.com/questions/22632355/wrap-the-contents-of-a-tag-with-beautifulsoup
https://stackoverflow.com/questions/26448605/how-to-wrap-multiple-tags-under-a-new-tag-in-beautifulsoup
There are even more requests to wrap sequences of elements that match a given criterion in a parent element:
https://stackoverflow.com/questions/17605801/wrap-all-next-elements-in-beautifulsoup
https://stackoverflow.com/questions/73902333/wrap-groupings-of-tags-with-python-beautifulsoup
https://stackoverflow.com/questions/73913938/how-to-wrap-a-new-tag-around-multiple-tags-with-beautifulsoup
https://stackoverflow.com/questions/32274222/wrap-multiple-tags-with-beautifulsoup
https://stackoverflow.com/questions/59033884/wrap-multiple-list-items-in-a-new-tag-ul-ol-using-beautiful-soup
https://stackoverflow.com/questions/45009059/how-to-wrap-with-adjacent-tag-with-beautiful-soup
Most of the latter requests are about rebuilding hierarchical structure from flat HTML content using heading (<h1> through <h6>) elements:
####
html_doc = """
<body>
<h1>ABC Topic</h1>
<p/>
<h2>AB Subtopic</h2>
<p/>
<h3>AB Subsubtopic</h3>
<p/>
<h2>C Subtopic</h2>
<p/>
<h1>XYZ Topic</h1>
<p/>
<h2>XY Subtopic</h2>
<p/>
<h2>Z Subtopic</h2>
</body>
"""
####
It would be great if Beautiful Soup had some kind of clever wrap_children() method to wrap sequences of elements meeting some kind of criteria.
To wrap all contents, the child element criteria would simply be True.
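For reference, the wrap-everything case can already be approximated with existing Beautiful Soup calls. The following is only a minimal sketch of that workaround, not the proposed method; the sample markup and the choice of a `div` wrapper are assumptions made for illustration:

```python
from bs4 import BeautifulSoup

# Sample markup, assumed for illustration only.
soup = BeautifulSoup("<body><h1>Title</h1><p>Text</p></body>", "html.parser")

# Create the wrapper, then move every existing child of <body> into it.
wrapper = soup.new_tag("div")
for child in list(soup.body.contents):  # copy: contents shrinks as children move
    wrapper.append(child)               # append() detaches the child from <body>
soup.body.append(wrapper)

print(soup.body)
```

A built-in method would hide the copy-and-move loop, which is the part people tend to get wrong.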
For more complex cases, the criteria could be a tag list or a function -- the usual Soupy ways. With this, you could build structured HTML from flat HTML using a simple bottom-up loop:
####
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
# an h6 section starts at an h6 and runs until the next h1-h6
# an h5 section starts at an h5 and runs until the next h1-h5
# an h4 section starts at an h4 and runs until the next h1-h4
# ...etc...
for h in reversed(range(1, 6+1)):
    soup.body.wrap_children(***MAGIC***, 'article')
print(soup.prettify())
####
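To make the idea concrete, here is a rough sketch of how such behavior could be emulated today as a standalone function. Everything named here is hypothetical: `wrap_children()` is not a Beautiful Soup API, and `heading_level()`, `section_criteria()`, and the `candidate`/`accumulated` keyword convention are assumptions for illustration. Only `new_tag()`, `insert_before()`, and `append()` are existing Beautiful Soup calls.

```python
import re
from bs4 import BeautifulSoup
from bs4.element import Tag

HEADING = re.compile(r"^h([1-6])$")

def heading_level(node):
    """Return 1-6 for an <h1>-<h6> tag, otherwise None (hypothetical helper)."""
    if isinstance(node, Tag):
        match = HEADING.match(node.name)
        if match:
            return int(match.group(1))
    return None

def wrap_children(soup, parent, criteria, wrapper_name):
    """Hypothetical stand-in for the requested method: wrap each run of
    consecutive children that criteria(candidate=..., accumulated=...) accepts."""
    runs, current = [], []
    for child in list(parent.contents):
        # A rejected candidate closes the run that is being accumulated.
        if current and not criteria(candidate=child, accumulated=current):
            runs.append(current)
            current = []
        # The candidate either extends the open run or is tested as a new start.
        if current or criteria(candidate=child, accumulated=current):
            current.append(child)
    if current:
        runs.append(current)
    for run in runs:
        wrapper = soup.new_tag(wrapper_name)
        run[0].insert_before(wrapper)  # place the wrapper where the run begins
        for node in run:
            wrapper.append(node)       # moving a node detaches it from parent

def section_criteria(level):
    """Build a criteria function for sections opened by an h<level> heading."""
    def criteria(candidate, accumulated):
        found = heading_level(candidate)
        if not accumulated:
            return found == level             # a run starts only at this heading level
        return found is None or found > level  # same or higher-rank heading ends it
    return criteria

# Compact sample input, assumed for illustration only.
html_doc = (
    "<body>"
    "<h1>ABC Topic</h1><p></p>"
    "<h2>AB Subtopic</h2><p></p>"
    "<h3>AB Subsubtopic</h3><p></p>"
    "<h2>C Subtopic</h2><p></p>"
    "<h1>XYZ Topic</h1><p></p>"
    "</body>"
)
soup = BeautifulSoup(html_doc, "html.parser")
for level in reversed(range(1, 6 + 1)):  # bottom-up: h6 sections first, h1 last
    wrap_children(soup, soup.body, section_criteria(level), "article")
print(soup.prettify())
```

In this sketch, a criteria function that always returns True would gather every child into a single wrapper, which covers the simpler wrap-all-contents case as well.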
In addition to any user-specified arguments, a criteria function would also need (1) the current candidate object and (2) the objects accumulated so far (if any), so that it can decide whether to extend or close the current sequence. These could be passed to the function using a documented **kwargs convention ("candidate", "accumulated"). |
|