UnboundLocalError: local variable 'match' referenced before assignment
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Beautiful Soup | Won't Fix | Undecided | Unassigned |
Bug Description
I get the stack trace below when running the Python script below on the input below, with beautifulsoup4 version 4.9.1:
```
$ python3 bs4_repro.py crash-54df48249
/home/jvoisin/
warnings.
Traceback (most recent call last):
File "bs4_repro.py", line 14, in <module>
main()
File "bs4_repro.py", line 12, in main
BeautifulSo
File "/home/
self._feed()
File "/home/
self.
File "/home/
parser.
File "/usr/lib/
self.goahead(0)
File "/usr/lib/
k = self.parse_
File "/usr/lib/
return self.parse_
File "/usr/lib/
if not match:
UnboundLocalError: local variable 'match' referenced before assignment
zsh: exit 1 python3 bs4_repro.py
$
```
Reproducing script:
```
from bs4 import BeautifulSoup
import sys

def main():
    with open(sys.argv[1], 'rb') as f:
        buf = f.read()
    parsers = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']
    try:
        idx = int(buf[0]) % len(parsers)
    except ValueError:
        return

main()
```
Input file (use `xxd -r` to transform the hexdump into a file):
```
00000000: 1a00 746d 6c2f 3c21 5b63 5f62 635f 683b ..tml/<![c_bc_h;
00000010: 3c75 6c3e 3c2f 6cc6 9e69 bb0c 0005 3e3c <ul></l..i....><
00000020: 212d 2d20 2d29 193b 0108 2d3e 3c2f 215b !-- -).;..-></![
00000030: 635f 623d 3b69 7469 3e3c 212d c_b=;iti><!-
```
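For anyone without `xxd` at hand, the hexdump can also be turned back into bytes with a few lines of Python. This is only a sketch: `parse_xxd` and `HEXDUMP` are names made up here, not part of the original report, and the parser assumes the plain `xxd` layout shown above (offset, up to eight hex groups, then the ASCII column).

```python
import re

# The hexdump from this report, copied verbatim.
HEXDUMP = """\
00000000: 1a00 746d 6c2f 3c21 5b63 5f62 635f 683b ..tml/<![c_bc_h;
00000010: 3c75 6c3e 3c2f 6cc6 9e69 bb0c 0005 3e3c <ul></l..i....><
00000020: 212d 2d20 2d29 193b 0108 2d3e 3c2f 215b !-- -).;..-></![
00000030: 635f 623d 3b69 7469 3e3c 212d c_b=;iti><!-
"""

# One hex group is one or two bytes (two or four hex digits).
_HEX_GROUP = re.compile(r"(?:[0-9a-fA-F]{2}){1,2}")


def parse_xxd(dump: str) -> bytes:
    """Rebuild raw bytes from default-format xxd output (hypothetical helper)."""
    out = bytearray()
    for line in dump.splitlines():
        if ":" not in line:
            continue
        # Drop the offset, then read at most eight hex groups; whatever
        # follows is the ASCII column and is ignored.
        _, rest = line.split(":", 1)
        groups = []
        for token in rest.split():
            if len(groups) == 8 or not _HEX_GROUP.fullmatch(token):
                break
            groups.append(token)
        out += bytes.fromhex("".join(groups))
    return bytes(out)


if __name__ == "__main__":
    data = parse_xxd(HEXDUMP)
    parsers = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']
    print(len(data), "bytes; parser selected:", parsers[data[0] % len(parsers)])
```

The first byte is 0x1a (26), and 26 % 4 is 2, so the script selects `html.parser`, which is consistent with the traceback ending in the standard library. Bytes 6 to 8 are `<![`, followed by the name `c_bc_h`, which is in neither of the keyword sets in the code quoted below. To get the file itself, write the result out with something like `open('crash-54df48249', 'wb').write(parse_xxd(HEXDUMP))`.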
It looks like the following section of /usr/lib/ is the culprit:
```
# Override this to handle MS-word extension syntax <![if word]>content<
def parse_marked_
    rawdata= self.rawdata
    assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_
    sectName, j = self._scan_name( i+3, i )
    if j < 0:
        return j
    if sectName in {"temp", "cdata", "ignore", "include", "rcdata"}:
        # look for standard ]]> ending
        match= _markedsectionc
    elif sectName in {"if", "else", "endif"}:
        # look for MS Office ]> ending
        match= _msmarkedsectio
    else:
    if not match:
        return -1
    if report:
        j = match.start(0)
    return match.end(0)
```
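Apparently the `self.error` function is not supposed to return, only to abort. When it does return, execution falls through to `if not match:` with `match` never assigned, which raises the `UnboundLocalError`.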
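To make the failure mode concrete, here is a minimal standalone sketch of the same control flow. It is not the standard library code: `find_section_end`, `ParseError`, the two regexes, and the two error callbacks are illustrative names, and the regexes are simplified guesses at the closing patterns. It only shows that when the error handler returns instead of raising, the `else` branch leaves `match` unassigned.

```python
import re

# Simplified, assumed closing patterns (not copied from the stdlib).
_standard_close = re.compile(r"\]\s*\]\s*>")   # "]]>"
_ms_close = re.compile(r"\]\s*>")              # "]>"


class ParseError(Exception):
    """Hypothetical exception used by the 'aborting' error handler."""


def find_section_end(rawdata, name, start, on_error):
    """Return the index just past the marked section's end, or -1 if not found."""
    if name in {"temp", "cdata", "ignore", "include", "rcdata"}:
        match = _standard_close.search(rawdata, start)
    elif name in {"if", "else", "endif"}:
        match = _ms_close.search(rawdata, start)
    else:
        # If on_error returns normally, 'match' is never bound.
        on_error(f"unknown status keyword {name!r} in marked section")
    if not match:
        return -1
    return match.end(0)


def silent_error(message):
    """An error handler that returns instead of aborting."""
    return None


def raising_error(message):
    """An error handler that aborts, as the code above seems to expect."""
    raise ParseError(message)


if __name__ == "__main__":
    for handler in (silent_error, raising_error):
        try:
            find_section_end("<![c_bc_h;", "c_bc_h", 3, handler)
        except Exception as exc:
            print(handler.__name__, "->", type(exc).__name__)
```

With `silent_error` the call ends in `UnboundLocalError`, as in the traceback; with `raising_error` the caller gets a `ParseError` it can handle. Either making the handler raise, or raising directly in the `else` branch, would avoid the unbound variable.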
Thanks for taking the time to investigate this issue. As with bug #1883104, this is a problem with Python's html.parser library. It was filed against Python as https://bugs.python.org/issue34480. I've added a comment to the Python bug referring to this issue.