Hi Leonard, thanks for your reply!
My patch probably stems from my limited understanding of what the built-in functions can do. I agree that the name `extract` suggests nothing more than removal, so extracting the tag from a sentence like this:
A single-line sentence with <b>useless</b> tag.
can reasonably be expected to yield:
A single-line sentence with  tag.
with two spaces between 'with' and 'tag'.
Or a sentence like this:
A multiple-line sentence with
<b>useless</b>
tag.
can reasonably be expected to yield:
A multiple-line sentence with

tag.
with one empty line in the middle.
However, users will rarely want these extra spaces and blank lines. I understand that you may not want the option to remove them to live inside the `extract()` method, and it does make sense to have a separate smoothing function. I had tried `smooth()` before hacking `extract()`, but my attempts failed somewhere. I don't clearly remember why; perhaps I wasn't looping over all the children properly. I'm new to the package, and my first reaction was to browse for solutions on Stack Overflow and in GitHub repos. Most of the approaches there involve building a list of strings and joining them together. Since I was passing BeautifulSoup objects and tags around, the detour through plain strings bothered me, which is why I thought about hacking `extract()`. But let me see if I can make `smooth()` do what I was looking for.
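For reference, here is a rough sketch of what I think the `smooth()` route would look like. It is untested against real-world pages and rests on a few assumptions of mine: `extract_and_tidy` is a hypothetical helper name (not part of the package), I use the built-in `html.parser`, and I assume that collapsing every run of whitespace into a single space is acceptable, which would not be right for content inside `<pre>`.

```python
import re

from bs4 import BeautifulSoup, NavigableString


def extract_and_tidy(soup, tag_name):
    """Hypothetical helper: extract every <tag_name> tag, then tidy the text.

    Assumption: each run of whitespace may be collapsed into one space.
    """
    # Remove the unwanted tags; the strings on either side stay separate.
    for tag in soup.find_all(tag_name):
        tag.extract()

    # Merge adjacent NavigableString objects left behind by the removal.
    soup.smooth()

    # Collapse whitespace in plain text nodes only (skip comments, CDATA, ...).
    for text in soup.find_all(string=True):
        if type(text) is NavigableString:
            text.replace_with(re.sub(r"\s+", " ", text))
    return soup


single = BeautifulSoup(
    "<p>A single-line sentence with <b>useless</b> tag.</p>", "html.parser"
)
multi = BeautifulSoup(
    "<p>A multiple-line sentence with\n<b>useless</b>\ntag.</p>", "html.parser"
)

single_result = str(extract_and_tidy(single, "b"))
multi_result = str(extract_and_tidy(multi, "b"))
print(single_result)
print(multi_result)
```

The point of calling `smooth()` before the substitution is that the two strings around the removed tag become one node, so the doubled space or blank line can be matched in a single pass without converting the whole tree to a string and re-parsing it.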
Thanks again!