Beautiful Soup

Bug #1768330
Comment #8

Leonard Richardson (leonardr) wrote on 2020-09-26:

I've marked bug 1882067 as a duplicate of this issue, although they're not directly related, because I think they come from the same place: a desire to use Beautiful Soup as a text preprocessor that can strip away "useless" markup.

In this case the concern is that some of the "useless" markup isn't so useless -- it conveys conceptual separations that are lost when you just extract all the text. In the case of 1882067, the concern is that some of the *text* is useless -- it's just whitespace and newlines that won't render in a web browser and ought to be collapsed for reading.

The challenge in both cases is distinguishing the "useless" stuff from the "useful" stuff.
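
To make the two complaints concrete, here is a minimal sketch using `get_text()` and its `separator` and `strip` arguments. The sample markup and the variable names are made up for illustration, and the whitespace-collapsing step is just one plausible approach, not something Beautiful Soup does for you.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two paragraphs, the second with extra whitespace.
markup = "<div><p>First paragraph</p><p>Second   paragraph\n</p></div>"
soup = BeautifulSoup(markup, "html.parser")

# Problem 1: plain extraction drops the paragraph boundary entirely.
plain = soup.get_text()

# A separator restores the boundary; strip=True trims each string,
# but whitespace *inside* a string is left alone.
separated = soup.get_text(separator="\n", strip=True)

# Problem 2: collapsing internal whitespace is left to the caller.
collapsed = [" ".join(s.split()) for s in soup.stripped_strings]

# The catch: the separator goes between every string, so inline markup
# such as <b> gets split apart just like block-level markup.
inline_soup = BeautifulSoup("<p>Hello <b>world</b>!</p>", "html.parser")
inline = inline_soup.get_text(separator="\n", strip=True)

print(repr(plain))
print(repr(separated))
print(collapsed)
print(repr(inline))
```

The last case is the hard part: `get_text()` has no notion of which tags separate ideas and which are inline, so any rule for telling them apart has to come from the caller.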