Provide a method to wrap some/all children of an element
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Triaged | Wishlist | Unassigned |
Bug Description
This is a wishlist item.
Beautiful Soup has a wrap() method that wraps a single element in a tag. Super!
There are various Beautiful Soup requests for wrapping all elements contained *inside* a parent element (wrapping the inside instead of the outside):
https:/
There are even more requests to wrap sequences of elements that match given criteria in a new parent element:
https:/
https:/
https:/
https:/
Most of the latter requests are about rebuilding hierarchical structure from flat HTML content using heading (<h1> through <h6>) elements:
####
html_doc = """
<body>
<h1>ABC Topic</h1>
<p/>
<h2>AB Subtopic</h2>
<p/>
<h3>AB Subsubtopic</h3>
<p/>
<h2>C Subtopic</h2>
<p/>
<h1>XYZ Topic</h1>
<p/>
<h2>XY Subtopic</h2>
<p/>
<h2>Z Subtopic</h2>
</body>
"""
####
It would be great if Beautiful Soup had some kind of clever wrap_children() method to wrap sequences of elements meeting some kind of criteria.
To wrap all contents, the child element criteria would simply be True.
For more complex cases, the criteria could be a tag list or a function -- the usual Soupy ways. With this, you could build structured HTML from flat HTML using a simple bottom-up loop:
####
from bs4 import BeautifulSoup
soup = BeautifulSoup(
# h6 sections start at h6, stop at not(h1-h6)
# h5 sections start at h5, stop at not(h1-h5)
# h4 sections start at h4, stop at not(h1-h4)
# ...etc...
for h in reversed(range(1, 6+1)):
soup.
print(soup.
####
In addition to any user-specified arguments, the function would also somehow need (1) the current candidate object and (2) the current set of accumulated objects (if any), so that the proper decisions could be made. These could be passed to the function using a documented **kwargs convention ("candidate", "accumulated").
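As a rough illustration of the idea, here is a minimal sketch built only on existing Beautiful Soup calls (`new_tag`, `insert_before`, `append`). The names `wrap_children`, `section_criteria`, and `make_wrapper` are hypothetical and not part of Beautiful Soup; the criteria function receives the `candidate` and `accumulated` keyword arguments described above and returns True when the candidate belongs in the current group. The sample markup is a shortened stand-in for the flat document shown earlier.

```python
from bs4 import BeautifulSoup


def wrap_children(parent, make_wrapper, criteria):
    """Hypothetical helper: group direct children of `parent` and wrap each group.

    `criteria(candidate=..., accumulated=...)` returns True if the candidate
    should join the group accumulated so far (an empty list means "no group
    is open yet").
    """
    groups = []
    accumulated = []
    # Work on a copy, because wrapping mutates parent.contents.
    for candidate in list(parent.contents):
        if criteria(candidate=candidate, accumulated=accumulated):
            accumulated.append(candidate)
            continue
        if accumulated:
            groups.append(accumulated)
            accumulated = []
            # The rejected candidate may itself open a new group.
            if criteria(candidate=candidate, accumulated=accumulated):
                accumulated.append(candidate)
    if accumulated:
        groups.append(accumulated)

    for group in groups:
        wrapper = make_wrapper()
        group[0].insert_before(wrapper)  # wrapper takes the first item's place
        for element in group:
            wrapper.append(element)      # append() reparents the element
    return len(groups)


def section_criteria(level):
    """Hypothetical criteria: an h<level> section runs until the next h1..h<level>."""
    start = f"h{level}"
    stop = {f"h{n}" for n in range(1, level + 1)}

    def criteria(candidate, accumulated):
        name = getattr(candidate, "name", None)  # strings have no tag name
        if not accumulated:
            return name == start
        return name not in stop

    return criteria


html_doc = (
    "<body><h1>A</h1><p>a</p><h2>B</h2><p>b</p>"
    "<h2>C</h2><h1>X</h1><p>x</p></body>"
)
soup = BeautifulSoup(html_doc, "html.parser")
# Bottom-up: wrap the deepest heading level first, then work upward.
for level in reversed(range(1, 6 + 1)):
    wrap_children(soup.body, lambda: soup.new_tag("section"), section_criteria(level))
print(soup.body.prettify())
```

Working bottom-up matters here: by the time the h1 pass runs, the h2 sections are already wrapped, so they are swept into the enclosing h1 section as ordinary children.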
description: updated
Changed in beautifulsoup:
importance: Undecided → Wishlist
Changed in beautifulsoup:
status: New → Triaged
I'm trying to... wrap my head around the request here.
The basic wrap_children() idea seems simple enough. You're inserting a tag in between a parent and its children.
<a>
  <b1>
  <b2>
->
<a>
  <new>
    <b1>
    <b2>
My main question is whether this functionality is part of jQuery, because if so I want to reuse the name. The term "wrap" itself comes from jQuery, and it looks like the jQuery equivalent of this is wrapInner:
https://api.jquery.com/wrapInner/
So I'd probably call the method wrap_inner, although wrap_children sounds more "Beautiful Soup"-ish.
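For reference, the basic operation can already be approximated with existing calls. The following is only a sketch; `wrap_inner` here is a hypothetical standalone function, not a method Beautiful Soup provides.

```python
from bs4 import BeautifulSoup


def wrap_inner(tag, wrapper):
    """Hypothetical helper: insert `wrapper` between `tag` and its children."""
    # Copy the child list first: append() reparents each child,
    # which mutates tag.contents while we loop.
    for child in list(tag.contents):
        wrapper.append(child)
    tag.append(wrapper)
    return wrapper


soup = BeautifulSoup("<a><b1></b1><b2></b2></a>", "html.parser")
wrap_inner(soup.a, soup.new_tag("new"))
print(soup)
```

After the call, `<new>` is the only child of `<a>` and holds `<b1>` and `<b2>` in their original order.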
Anyway, where I start to lose the plot is the idea of doing this selectively. That seems like a new level of complexity being added to the core Beautiful Soup methods. I'm mainly looking at https://stackoverflow.com/questions/73902333/wrap-groupings-of-tags-with-python-beautifulsoup since that expresses the problem clearly for me.
We want to go from this:
<h1>Heading for Sec 1</h1>
<p>some text sec 1</p>
<p>some text sec 1</p>
<h1>Heading for Sec 2</h1>
<p>some text sec 2</p>
<p>some text sec 2</p>
To this:
<div>
<h1>Heading for Sec 1</h1>
<p>some text sec 1</p>
<p>some text sec 1</p>
</div>
<div>
<h1>Heading for Sec 2</h1>
<p>some text sec 2</p>
<p>some text sec 2</p>
</div>
Assuming there's a <div> or <body> that encompasses all the markup, there could be a method on that tag which does that. And the arguments to that method would be some way of telling Beautiful Soup how to group the tags together. But this wouldn't be like anything else in Beautiful Soup, because we're dividing the children of a tag into groups and then operating on each group, inside the method call.
When I think about accomplishing this task, I envision selecting some text in a text editor and then right-clicking on the selection to wrap it. In programming terms, I'd create an object that represents a contiguous selection and then call a method on that object. Applying this to Beautiful Soup, I'd want to keep any iterative logic (such as "do this to each group") outside of the method calls.
Let's hypothesize a method which works like find() but which returns the thing you were searching for, *plus* a ResultSet of everything that the iterator found up to that point. Then you could write code like this:
next_h1 = body.find('h1')
while next_h1:
    selection, next_h1 = next_h1.until_next_sibling("h1")
    selection.wrap(soup.new_tag("div"))
The sleight-of-hand here is, what does it mean to call wrap() on a ResultSet? I think it means:
* Reparent every item in the ResultSet to the new tag, effectively making the ResultSet that tag's .contents.
* Place the new tag at the same position in the tree where the _first_ item in the ResultSet was originally found.
This would work even if the ResultSet didn't represent a contiguous selection, though the most likely usages of it would be operating on a contiguous selection.
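To make the proposal concrete, here is a sketch that emulates both pieces with plain functions on top of the current API. `until_next_sibling` and `wrap_selection` are hypothetical stand-ins for the hypothesized method and for wrap() on a ResultSet; a plain list plays the role of the ResultSet, and the markup is a shortened version of the Stack Overflow example.

```python
from bs4 import BeautifulSoup


def until_next_sibling(start, name):
    """Hypothetical stand-in: return (selection, next_match).

    `selection` holds `start` plus every following sibling up to, but not
    including, the next sibling tag called `name`; `next_match` is that
    sibling, or None if there is none.
    """
    selection = [start]
    for sibling in start.next_siblings:
        if getattr(sibling, "name", None) == name:
            return selection, sibling
        selection.append(sibling)
    return selection, None


def wrap_selection(selection, wrapper):
    """Hypothetical stand-in for wrap() on a ResultSet."""
    # Place the wrapper where the first item was found...
    selection[0].insert_before(wrapper)
    # ...then reparent every item, so the selection becomes wrapper.contents.
    for item in selection:
        wrapper.append(item)
    return wrapper


soup = BeautifulSoup(
    "<body><h1>Heading for Sec 1</h1><p>some text sec 1</p>"
    "<h1>Heading for Sec 2</h1><p>some text sec 2</p></body>",
    "html.parser",
)
next_h1 = soup.body.find("h1")
while next_h1:
    selection, next_h1 = until_next_sibling(next_h1, "h1")
    wrap_selection(selection, soup.new_tag("div"))
print(soup.body)
```

The iteration stays outside the two helpers, which matches the "select, then wrap" model: each call only knows about one selection.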
What do you think of this? To put it more concretely, can you sketch out the ***MAGIC*** that you had in your example code? Because that's the core of the issue, I think.