python3-html-text binary package in Ubuntu Mantic amd64
How is html_text different from .xpath('//text()') in lxml or .get_text()
in Beautiful Soup?
* Text extracted with html_text does not contain inline styles,
JavaScript, comments, or other text that is not normally visible to
users;
* html_text normalizes whitespace, but in a smarter way than
.xpath('normalize-space()'), adding spaces around inline elements (which
are often used as block elements in HTML markup) and trying to avoid
adding extra spaces around punctuation;
* html_text can add newlines (e.g. after headers or paragraphs), so that
the output text looks more like what a browser renders.
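
The behaviour described in the list above can be illustrated with a small standard-library sketch. This is not html_text's implementation: the `VisibleTextParser` class, the `visible_text` helper, and the `HIDDEN_TAGS` and `BLOCK_TAGS` sets are hypothetical names chosen for this example, and the tag sets are assumptions rather than the library's actual lists. It only shows the general idea of dropping invisible content, collapsing whitespace, and breaking lines at block elements.

```python
import re
from html.parser import HTMLParser

# Assumption: illustrative tag sets, not the lists html_text really uses.
HIDDEN_TAGS = {"script", "style", "noscript"}
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}


class VisibleTextParser(HTMLParser):
    """Collect only text a user would normally see."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.hidden_depth = 0  # > 0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self.hidden_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")  # line break before a block element

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self.hidden_depth:
            self.hidden_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")  # line break after a block element

    def handle_data(self, data):
        if not self.hidden_depth:
            # Collapse whitespace runs so only block tags produce newlines.
            self.chunks.append(re.sub(r"\s+", " ", data))

    # Comments reach handle_comment(), whose default implementation
    # ignores them, so they never end up in the output.


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.chunks).split("\n")
    cleaned = (re.sub(r" +", " ", line).strip() for line in lines)
    return "\n".join(line for line in cleaned if line)


if __name__ == "__main__":
    sample = (
        "<h1>Title</h1><style>p{color:red}</style>"
        "<p>Hello   <b>world</b>!</p><!-- note -->"
        "<script>var x = 1;</script>"
    )
    print(visible_text(sample))
```

With the package itself installed, the equivalent call would be html_text.extract_text(html), which also applies the punctuation-spacing heuristics that this sketch leaves out.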
Publishing history
Date | Status | Target | Pocket | Component | Section | Priority | Phased updates | Version
---|---|---|---|---|---|---|---|---
2023-04-25 12:10:09 UTC | Published | Ubuntu Mantic amd64 | release | universe | python | Optional | | 0.5.2-2