Beautiful Soup

Bug #1882067
Comment #6

ptoche (ptoche) wrote on 2020-06-09:

Hi Leonard, thanks for your reply!

My effort is likely a result of my limited understanding of what the built-in functions can do. I agree that the name `extract` suggests nothing more than removal, so that extracting the `<b>` tag from a sentence like this:

A single-line sentence with <b>useless</b> tag.

can reasonably be expected to yield:

A single-line sentence with  tag.

with two spaces between 'with' and 'tag'.

Or a sentence like this:

    A multiple-line sentence with
    <b>useless</b>
    tag.

can reasonably be expected to yield:

A multiple-line sentence with

tag.

with one empty line in the middle.
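
Both cases above can be reproduced with a short script. This is only a minimal sketch: it assumes the built-in `html.parser` backend, and the helper name `extract_bold` is hypothetical, introduced here for illustration.

```python
from bs4 import BeautifulSoup


def extract_bold(markup):
    """Parse the markup, remove the first <b> tag, and return what is left.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    soup.b.extract()  # removes the tag and its contents, nothing else
    return str(soup)


single = extract_bold("A single-line sentence with <b>useless</b> tag.")
multi = extract_bold("A multiple-line sentence with\n<b>useless</b>\ntag.")

# repr() makes the leftover whitespace visible
print(repr(single))
print(repr(multi))
```

The whitespace on either side of the removed tag belongs to the neighbouring text nodes, which is why it survives the call to `extract()`.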

However, users will not often want these blank spaces or lines. I understand that you may not want the option to remove them to live inside the `extract()` method, and it does make sense to have a separate smoothing function. I had tried to use `smooth()` before hacking `extract()`, but my attempts failed somewhere. I do not clearly recall why; perhaps I wasn't looping over all the children properly.

I'm new to the package, and my first reaction was to browse for solutions on Stack Overflow and in GitHub repos. Most of the approaches involve parsing a list and joining the strings together. As I was passing BeautifulSoup objects and tags around, I was bothered by the detour through strings and thought about hacking `extract()` instead. But let me see if I can make `smooth()` do what I was looking for.
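
One way this might look without leaving the parse tree is sketched below. It assumes Beautiful Soup 4.8.0 or later (where `smooth()` is available) and the `html.parser` backend. The helper `extract_and_tidy` is hypothetical, and collapsing every run of whitespace to a single space is my own choice here, not something `smooth()` does: `smooth()` only merges adjacent strings.

```python
import re

from bs4 import BeautifulSoup
from bs4.element import NavigableString


def extract_and_tidy(tag):
    """Remove `tag`, merge the strings left around it, collapse whitespace.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    parent = tag.parent  # remember the parent; extract() detaches the tag
    tag.extract()
    if parent is None:
        return
    # After extraction the text before and after the tag are two adjacent
    # NavigableString objects; smooth() consolidates them into one.
    parent.smooth()
    for node in list(parent.children):
        # Only touch plain strings; leave comments and similar nodes alone.
        if type(node) is NavigableString:
            collapsed = re.sub(r"\s+", " ", node)
            if collapsed != node:
                node.replace_with(NavigableString(collapsed))


soup = BeautifulSoup("A single-line sentence with <b>useless</b> tag.", "html.parser")
extract_and_tidy(soup.b)
print(repr(str(soup)))
```

Note that this also turns the newlines of the multiple-line example into single spaces, which may or may not be what is wanted.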

Thanks again!