Regression in bs4 4.4.0 when manipulating an HTML document
Bug #1474732 reported by Jozef Mlich
This bug report is a duplicate of:
Bug #1481520: .descendants behaves poorly on uprooted elements.
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Beautiful Soup | New | Undecided | Unassigned | |
Bug Description
Please see the Red Hat Bugzilla report for more details.
https:/
I can confirm something that looks a /lot/ like this same issue:
Traceback (most recent call last):
  File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/processor/HtmlProcessor.py", line 290, in decomposeItems
    have = soup.find_all(True, attrs=key)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/bs4/element.py", line 1255, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/bs4/element.py", line 533, in _find_all
    found = strainer.search(i)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/bs4/element.py", line 1642, in search
    found = self.search_tag(markup)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/bs4/element.py", line 1613, in search_tag
    attr_value = markup_attr_map.get(attr)
  File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/bs4/element.py", line 943, in get
    return self.attrs.get(key, default)
AttributeError: 'NoneType' object has no attribute 'get'
In this case, I'm doing some heavy permutation of the HTML tree, decomposing a number of elements. I can confirm the tree is non-None, and the same code works fine when I roll back to 4.3.2, so it looks like this is indeed a regression.
I can't pull out a nice test-case for the moment (too many dependencies), but I can see if I can get something together next weekend.
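In the meantime, the general shape of the operation is roughly as follows. This is only a simplified sketch, not a confirmed reproduction: the function name `decompose_items`, the sample markup, and the attribute filters are all hypothetical stand-ins, and only the `soup.find_all(True, attrs=key)` call is taken from the traceback above. It may or may not trigger the error on 4.4.0.

```python
from bs4 import BeautifulSoup


def decompose_items(soup, bad_attrs):
    """Decompose every tag matching any attribute filter in bad_attrs.

    Hypothetical stand-in for the real decomposeItems() in the traceback.
    """
    for key in bad_attrs:
        # Same call as in the traceback: match any tag name, filter on attrs.
        # find_all() returns a list, so the tree is searched before it is mutated.
        for item in soup.find_all(True, attrs=key):
            item.decompose()
    return soup


# Made-up markup: one unwanted block containing a second unwanted tag.
html = '<div><p class="ad">x<span id="junk">y</span></p><p>keep</p></div>'
soup = BeautifulSoup(html, "html.parser")

# The second filter runs find_all() on a tree that has already had
# elements decomposed out of it, which is the situation described above.
decompose_items(soup, [{"class": "ad"}, {"id": "junk"}])

remaining = [tag.name for tag in soup.find_all(True)]
print(remaining)
print(soup)
```

On a release without the regression, the second pass should simply find nothing left to remove and only the `div` and the untouched `p` should remain.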