Unexpected exception: TypeError: cannot unpack non-iterable NoneType object
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Won't Fix | Undecided | Unassigned |
Bug Description
I get the following stack trace when running the script below on the input below, using beautifulsoup4 version 4.9.1:
```
$ python3 bs4_repro.py crash-7b7ff74a3
/home/jvoisin/
warnings.
Traceback (most recent call last):
File "bs4_repro.py", line 14, in <module>
main()
File "bs4_repro.py", line 12, in main
BeautifulSo
File "/home/
self._feed()
File "/home/
self.
File "/home/
parser.
File "/usr/lib/
self.goahead(0)
File "/usr/lib/
k = self.parse_
File "/usr/lib/
return self.parse_
File "/usr/lib/
sectName, j = self._scan_name( i+3, i )
TypeError: cannot unpack non-iterable NoneType object
zsh: exit 1 python3 bs4_repro.py
```
Reproducing script:
```
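from bs4 import BeautifulSoup
import sys


def main():
    with open(sys.argv[1], 'rb') as f:
        buf = f.read()
    parsers = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']
    try:
        # The first byte of the input selects the parser.
        idx = int(buf[0]) % len(parsers)
    except ValueError:
        return
    # The traceback shows a BeautifulSoup(...) call here;
    # its exact arguments are an assumption.
    BeautifulSoup(buf, features=parsers[idx])


main()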
```
Input file (use `xxd -r` to transform the hexdump into a file):
```
$ xxd crash-7b7ff74a3
00000000: 0a68 745c 6e74 6f75 6368 656e 646d 6c3e .ht\ntouchendml>
00000010: 3c42 6f64 793e 7fff ffff 643e 0002 693e <Body>....d>..i>
00000020: 2d75 6c59 743c 7472 743e 3c3c 6474 3e3e -ulYt<trt><<dt>>
00000030: 3cd5 7265 3c2f 6c69 3e3c 215b 2d20 2d3c <.re</li><![- -<
00000040: 103c 2f68 6c72 653e 3c68 723e 6d6c 6f6e .</hlre><hr>mlon
00000050: 7265 7365 743e 0a3c 6474 3e3c 70ae 653e reset>.<dt><p.e>
00000060: 3c68 723e
```
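If `xxd` is not available, the same file can be rebuilt in Python. This is a minimal sketch: the hex columns are copied from the dump above, while the names `HEX_DUMP`, `data`, `selected`, and `write_input` are illustrative and not part of the original report. It also shows which parser the first byte selects in the reproducing script.

```python
from pathlib import Path

# Hex columns copied from the xxd dump (offsets and ASCII column dropped).
HEX_DUMP = """
0a68 745c 6e74 6f75 6368 656e 646d 6c3e
3c42 6f64 793e 7fff ffff 643e 0002 693e
2d75 6c59 743c 7472 743e 3c3c 6474 3e3e
3cd5 7265 3c2f 6c69 3e3c 215b 2d20 2d3c
103c 2f68 6c72 653e 3c68 723e 6d6c 6f6e
7265 7365 743e 0a3c 6474 3e3c 70ae 653e
3c68 723e
"""

# Strip all whitespace before decoding so this works on older Python 3 too.
data = bytes.fromhex("".join(HEX_DUMP.split()))

# Same selection rule as the reproducing script: first byte modulo list length.
parsers = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']
selected = parsers[data[0] % len(parsers)]


def write_input(path="crash-7b7ff74a3"):
    """Write the reconstructed bytes to disk (the file name is taken from the report)."""
    Path(path).write_bytes(data)


if __name__ == "__main__":
    write_input()
    print(len(data), "bytes written; parser selected:", selected)
```

The first byte is `0x0a`, so the script picks index 2 of the list, which is `html.parser`. That matches the traceback ending in the standard library's parser code.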
Thanks for taking the time to file this issue. The problem here is in Python 3's html.parser library. Here's a script that duplicates the error without using any Beautiful Soup code:
---
from html.parser import HTMLParser
import warnings

bad_markup = '\nht\\ntouchendml><Body>\x7fÿÿÿd>\x00\x02i>-ulYt<trt><<dt>><Õre</li><![- -<\x10</hlre><hr>mlonreset>\n<dt><p®e><hr>'


class MyParser(HTMLParser):
    def error(self, msg):
        warnings.warn(msg)


parser = MyParser()
parser.feed(bad_markup)
---
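The following is a simplified model of why the traceback ends in a `TypeError` rather than a parse error. It assumes, based on the traceback and the script above, that the failing helper reports the problem through `error()` and then returns nothing. The functions here are stand-ins written for illustration, not the standard library's code.

```python
import warnings


def scan_name(report_error):
    """Stand-in for a helper that reports failure through a callback.

    If the callback returns instead of raising, execution falls off the end
    of the function and the caller receives None.
    """
    report_error("expected name token")
    # No return statement on this path.


def parse_marked_section(report_error):
    # Mirrors the shape of the failing line in the traceback:
    # unpacking the helper's result into two names.
    name, pos = scan_name(report_error)
    return name, pos


def lenient_error(msg):
    warnings.warn(msg)  # warns and returns, like error() in the script above


def strict_error(msg):
    raise ValueError(msg)  # raising stops the helper before it can return None


def outcome(report_error):
    """Return the name of the exception raised, or 'ok' if none was."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        try:
            parse_marked_section(report_error)
        except Exception as exc:
            return type(exc).__name__
    return "ok"


print(outcome(lenient_error))  # TypeError
print(outcome(strict_error))   # ValueError
```

In this model the crash comes from pairing an `error()` that returns with a caller that expects a tuple, which is why the problem surfaces below Beautiful Soup.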
Someone else filed this issue against Python last year: https://bugs.python.org/issue37747
I've updated it with a link to this ticket and a copy of my duplication script.
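To check how a particular interpreter behaves, the duplication script can be wrapped so that it reports the outcome instead of crashing. This is a sketch only: `WarningParser`, `probe`, and `SNIPPET` are hypothetical names, and the result for the problem markup depends on the Python version, so no expected output is given for it.

```python
from html.parser import HTMLParser
import warnings


class WarningParser(HTMLParser):
    # Same override as in the duplication script: warn instead of raising.
    def error(self, msg):
        warnings.warn(msg)


def probe(markup):
    """Feed markup to html.parser; return 'ok' or the exception class name."""
    parser = WarningParser()
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            parser.feed(markup)
            parser.close()
    except Exception as exc:
        return type(exc).__name__
    return "ok"


# A short fragment taken from the crashing input: '<![' followed by a
# character that cannot start a name.
SNIPPET = "<![- -<"

if __name__ == "__main__":
    print("problem fragment:", probe(SNIPPET))
    print("ordinary markup: ", probe("<p>hello</p>"))
```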