Beautiful Soup

Bug #1873640
Comment #2

paul@hammant.org (i-paul-h) wrote on 2020-04-19:

A little less and a little more than you need, sorry:

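from bs4 import BeautifulSoup

# myfile: an already-open file object for the HTML you want to search
contents_of_that_file = myfile.read()   # read() returns one string; str(readlines()) would be a list repr
soup = BeautifulSoup(contents_of_that_file, 'html.parser')
text = soup.find_all(string=True)       # every text node; string= is the newer name for text= (bs4 4.4+)
blacklist = [
    '[document]',
    'noscript',
    'header',
    'html',
    'meta',
    'head',
    'input',
    'script',
    'dc:date',
    'title',
    # there may be more elements you don't want, such as "style", etc.
]
for t in text:
    # skip strings whose parent tag is blacklisted (e.g. script bodies, <title>)
    if t.parent.name not in blacklist:
        if "Hand wash" in t.strip():
            print(">>" + t.strip() + "<<")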
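>" + t.strip() + "<<")">
If you want to try the same filtering without a file on disk, here is a minimal self-contained sketch. It assumes bs4 4.4 or later (for the string= argument), uses a made-up HTML sample, and wraps the loop in a hypothetical helper, find_visible_matches, which is just for illustration and not part of Beautiful Soup's API. "style" is added to the blacklist as the comment in the snippet above suggests.

```python
from bs4 import BeautifulSoup

# Same blacklist as the snippet above, plus "style" (a set for fast lookups).
BLACKLIST = {
    "[document]", "noscript", "header", "html", "meta", "head",
    "input", "script", "dc:date", "title", "style",
}


def find_visible_matches(html, needle):
    """Hypothetical helper: return stripped text nodes containing `needle`
    whose parent tag is not in BLACKLIST."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for t in soup.find_all(string=True):  # all text nodes in the document
        if t.parent.name in BLACKLIST:
            continue  # e.g. the <title> text or the <script> body
        s = t.strip()
        if needle in s:
            matches.append(s)
    return matches


# Made-up sample: "Hand wash" appears in <title>, <p> and <script>,
# but only the <p> text should be reported.
sample = """<html><head><title>Hand wash only</title></head>
<body><p>Care: Hand wash cold</p><script>var x = "Hand wash";</script>
<div>Machine wash</div></body></html>"""

for m in find_visible_matches(sample, "Hand wash"):
    print(">>" + m + "<<")
```

Returning a list instead of printing inside the loop makes the filter easy to reuse or test; the print loop at the end reproduces the ">>...<<" output of the original.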