Add a way to make an object's generators generate the object itself at the start of the iteration

Bug #2067634 reported by Leonard Richardson
Affects: Beautiful Soup
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

Split out from bug #2052936.

There's a request to make it possible for a PageElement's search methods to consider the PageElement itself as well as its siblings, descendants, etc.

There are three possible ways of doing this:

1. Add a boolean argument called something like 'consider_self' to all of the find* methods. Since the actual implementation happens pretty far down in the call stack, this probably means doing one of the other implementations as well.

find_parents() and find_parent() take an argument called 'include_self' which could be a model for this. I really don't like the name 'include_self', though, because it implies the element in question will _always_ be included in the results, rather than merely considered for inclusion.

2. Add new generator properties to PageElement. We already have two such properties: self_and_descendants and self_and_parents. These can be combined with the new PageElement.filter() method to perform the actual searches.

3. Add a method PageElement.self_and() which takes a generator and yields the PageElement itself, then whatever the generator was going to yield.
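A minimal sketch of what strategy #3 might look like, written as a free function rather than as a real PageElement method. The name self_and comes from the description above; the type hints and the plain strings standing in for page elements are assumptions made for illustration, not Beautiful Soup code.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def self_and(element: T, others: Iterable[T]) -> Iterator[T]:
    """Yield `element` first, then everything `others` would have yielded."""
    yield element
    yield from others


# Plain strings stand in for page elements here (an assumption for the demo).
siblings = iter(["<b>", "<i>"])
result = list(self_and("<a>", siblings))
print(result)
```

Nothing in the function checks where `others` came from, so it will happily accept any iterable at all.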

I really don't want to go to #1 because this isn't a very common request and I'd be changing a whole lot of method signatures.

#3 doesn't look as good as I initially thought, because the method is _too_ flexible. You could pass in a generator that has nothing to do with Beautiful Soup.

Since there are already two self_and_* generators I'm okay with going route #2 and adding some more:

self_and_parents (already exists)
self_and_descendants (already exists)
self_and_next_elements
self_and_next_siblings
self_and_previous_elements
self_and_previous_siblings

This would be implemented with a private version of the self_and() method from strategy #3. At some future point we can look at usage of these methods and reevaluate strategy #1.
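A rough sketch of how two of these properties could sit on top of a private helper. The Node class, its sibling links, and the link() helper are hypothetical stand-ins for PageElement, not the real Beautiful Soup implementation; only the property names are taken from the list above.

```python
from typing import Iterable, Iterator, Optional


class Node:
    """Hypothetical stand-in for PageElement; only sibling links are modeled."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.next_sibling: Optional["Node"] = None
        self.previous_sibling: Optional["Node"] = None

    @property
    def next_siblings(self) -> Iterator["Node"]:
        node = self.next_sibling
        while node is not None:
            yield node
            node = node.next_sibling

    @property
    def previous_siblings(self) -> Iterator["Node"]:
        node = self.previous_sibling
        while node is not None:
            yield node
            node = node.previous_sibling

    def _self_and(self, others: Iterable["Node"]) -> Iterator["Node"]:
        # Private helper: this node first, then the wrapped generator.
        yield self
        yield from others

    @property
    def self_and_next_siblings(self) -> Iterator["Node"]:
        return self._self_and(self.next_siblings)

    @property
    def self_and_previous_siblings(self) -> Iterator["Node"]:
        return self._self_and(self.previous_siblings)


def link(*nodes: Node) -> tuple:
    """Wire the given nodes together as consecutive siblings (demo helper)."""
    for left, right in zip(nodes, nodes[1:]):
        left.next_sibling = right
        right.previous_sibling = left
    return nodes


a, b, c = link(Node("a"), Node("b"), Node("c"))
forward = [n.name for n in b.self_and_next_siblings]
backward = [n.name for n in b.self_and_previous_siblings]
# Filtering shows that the starting node is considered, not forced into results.
matches = [n.name for n in b.self_and_next_siblings if n.name == "c"]
```

Because the helper is private, callers can only reach it through properties that pass in one of the element's own generators.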

Now that I've noticed it, I also want to deprecate 'include_self' and rename it 'consider_self'. Since that argument has been there for many years I won't remove support for it entirely.
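One possible shape for the rename, sketched under assumptions: the helper resolve_consider_self() is hypothetical and does not exist in Beautiful Soup. It only illustrates accepting the old keyword with a DeprecationWarning while preferring the new one.

```python
import warnings
from typing import Optional


def resolve_consider_self(
    consider_self: bool = False, include_self: Optional[bool] = None
) -> bool:
    """Return the effective flag, honoring the deprecated 'include_self' name."""
    if include_self is not None:
        warnings.warn(
            "'include_self' is deprecated; use 'consider_self' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return include_self
    return consider_self


# The old spelling still works but emits a warning; the new one is silent.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = resolve_consider_self(include_self=True)
new_style = resolve_consider_self(consider_self=True)
```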

Changed in beautifulsoup:
status: New → Triaged
importance: Undecided → Wishlist
Mark Jones (markjones555) wrote :

It seems like you're exploring different approaches to enhance PageElement's search methods to include the element itself along with its siblings and descendants. Option #2, adding new generator properties like self_and_next_elements and self_and_next_siblings, alongside the existing self_and_descendants and self_and_parents, appears to be a pragmatic choice. This method could leverage the PageElement.filter() method effectively for comprehensive searches. Deprecating 'include_self' in favor of 'consider_self' reflects a thoughtful evolution in method naming, maintaining backward compatibility while aligning with your refined approach.
www.gbpro.pro

Leonard Richardson (leonardr) wrote :

What the heck is that, LLM spam? I can't delete that comment but I'll delete and recreate the entire issue if I have to.
