Calling 'string' or 'text' methods on a script-type tag returns no results
Bug #1906226 reported by Mitar
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Beautiful Soup | Fix Released | Undecided | Unassigned | |
Bug Description
.stripped_strings on a <script> tag does not return the tag's contents, while .string does. On a <p> tag, both return the same text. Example:
>>> BeautifulSoup(
'Test.'
>>> list(BeautifulS
[]
Compare with:
>>> BeautifulSoup(
'Test.'
>>> list(BeautifulS
['Test.']
Tested on beautifulsoup4=
summary: changed from ".string vs. .stripped_strings on script tag" to "Calling 'string' or 'text' methods on a script-type tag returns no results"
Changed in beautifulsoup:
status: New → Fix Committed
Let's imagine you have markup like this:
<div>
Some text.
<script>
Some more text.
</script>
</div>
Generally speaking, users expect div.get_text() to return "Some text.", not "Some text. Some more text." People don't consider the contents of <script> tags to be "text". That was the point of the original issue, #1868861.
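As a rough illustration of the behavior users expect from div.get_text(), here is a stdlib-only sketch (not Beautiful Soup's code) that walks the markup with Python's html.parser and drops anything inside <script>, <style>, or <template>. The names NON_TEXT_TAGS, VisibleTextExtractor, and visible_text are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical: tags whose contents this sketch does not count as "text".
NON_TEXT_TAGS = {"script", "style", "template"}


class VisibleTextExtractor(HTMLParser):
    """Hypothetical extractor: collects stripped text, skipping NON_TEXT_TAGS."""

    def __init__(self):
        super().__init__()
        self.non_text_depth = 0  # > 0 while inside a script/style/template
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NON_TEXT_TAGS:
            self.non_text_depth += 1

    def handle_endtag(self, tag):
        if tag in NON_TEXT_TAGS and self.non_text_depth:
            self.non_text_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.non_text_depth:
            self.chunks.append(text)


def visible_text(markup):
    parser = VisibleTextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)


markup = """<div>
Some text.
<script>
Some more text.
</script>
</div>"""

print(visible_text(markup))  # Some text.
```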
However, when you call get_text() *on a <script> tag*, it's reasonable to assume that you _do_ consider the contents of a <script> tag to be "text"--otherwise you wouldn't bother calling the method. The way I implemented #1868861 excludes "Some more text." even when you call script.get_text(). That's the underlying cause of this issue.
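The cause can be modeled with a toy tree. In this sketch the classes are hypothetical, not Beautiful Soup's: text inside a <script> is stored as ScriptText, a subclass of str. .string returns the single child whatever its type, while stripped_strings and get_text() filter ScriptText out by type, no matter which tag they were called on. That reproduces the report: the <p> and the <script> agree on .string but not on .stripped_strings.

```python
class TextNode(str):
    """Hypothetical stand-in for an ordinary text node."""


class ScriptText(TextNode):
    """Hypothetical marker type for text found inside <script>."""


class Tag:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    @property
    def string(self):
        # Return the only child if it is a string, regardless of its type.
        if len(self.children) == 1 and isinstance(self.children[0], str):
            return self.children[0]
        return None

    def _all_strings(self):
        for child in self.children:
            if isinstance(child, Tag):
                yield from child._all_strings()
            # Old rule (sketch): script text is excluded by type,
            # even when the caller started at the <script> tag itself.
            elif not isinstance(child, ScriptText):
                yield child

    @property
    def stripped_strings(self):
        for s in self._all_strings():
            s = s.strip()
            if s:
                yield s

    def get_text(self):
        return "".join(self._all_strings())


p = Tag("p", [TextNode("Test.")])
script = Tag("script", [ScriptText("Test.")])
div = Tag("div", [TextNode("\nSome text.\n"),
                  Tag("script", [ScriptText("\nSome more text.\n")])])

print(p.string, list(p.stripped_strings))            # Test. ['Test.']
print(script.string, list(script.stripped_strings))  # Test. []
print(list(div.stripped_strings))                    # ['Some text.']
print(repr(script.get_text()))                       # ''
```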
Revision 600 puts a system in place that changes the behavior of tags like <script>, <style>, and <template> to something more like what you are looking for.
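One way to get the behavior described here, sketched with the same kind of toy tree (hypothetical names; this is not the code in revision 600): let the tag you call the method on decide which string types count as text, and apply that choice to every descendant. A <div> still ignores script text, but a <script>, <style>, or <template> treats its own contents as text.

```python
class TextNode(str):
    """Hypothetical stand-in for an ordinary text node."""


class ScriptText(TextNode):
    """Hypothetical marker type for text inside script-like tags."""


# Hypothetical set of tags whose contents are stored as ScriptText.
SCRIPT_LIKE = {"script", "style", "template"}


class Tag:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def _counts_as_text(self, s):
        # New rule (sketch): the starting tag picks which strings it cares about.
        if self.name in SCRIPT_LIKE:
            return isinstance(s, ScriptText)
        return not isinstance(s, ScriptText)

    def _all_strings(self, wanted=None):
        if wanted is None:
            wanted = self._counts_as_text  # decided once, by the starting tag
        for child in self.children:
            if isinstance(child, Tag):
                yield from child._all_strings(wanted)
            elif wanted(child):
                yield child

    @property
    def stripped_strings(self):
        for s in self._all_strings():
            s = s.strip()
            if s:
                yield s

    def get_text(self, separator=""):
        return separator.join(self._all_strings())


script = Tag("script", [ScriptText("\nSome more text.\n")])
div = Tag("div", [TextNode("\nSome text.\n"), script])

print(list(div.stripped_strings))     # ['Some text.']
print(list(script.stripped_strings))  # ['Some more text.']
print(script.get_text().strip())      # Some more text.
```

Because the filter is chosen by the starting tag rather than fixed globally, the behavior from #1868861 (a <div> skipping its script text) is preserved while script.get_text() stops returning nothing.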