UnboundLocalError: local variable 'match' referenced before assignment

Bug #1883264 reported by jvoisin
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| Beautiful Soup | Won't Fix | Undecided | Unassigned | |

Bug Description

Running the script below on the input below with beautifulsoup4 4.9.1 produces the following stack trace:

```
$ python3 bs4_repro.py crash-54df4824926ab452e78144c7bdeb6b35397323db78f2919a54be6a643c83e224
/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py:102: UserWarning: unknown status keyword 'c_bc_h' in marked section
  warnings.warn(msg)
Traceback (most recent call last):
  File "bs4_repro.py", line 14, in <module>
    main()
  File "bs4_repro.py", line 12, in main
    BeautifulSoup(buf, features=parsers[idx]).prettify()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 345, in __init__
    self._feed()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 431, in _feed
    self.builder.feed(self.markup)
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py", line 377, in feed
    parser.feed(markup)
  File "/usr/lib/python3.8/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.8/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.8/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.8/_markupbase.py", line 160, in parse_marked_section
    if not match:
UnboundLocalError: local variable 'match' referenced before assignment
zsh: exit 1 python3 bs4_repro.py
$
```

Reproducing script:

```
from bs4 import BeautifulSoup
import sys

def main():
    with open(sys.argv[1], 'rb') as f:
        buf = f.read()
        parsers = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']
        try:
            idx = buf[0] % len(parsers)  # first byte of the input selects the parser
        except IndexError:  # empty input, nothing to parse
            return
        BeautifulSoup(buf, features=parsers[idx]).prettify()

main()
```

Input file (use `xxd -r` to transform the hexdump into a file):

```
00000000: 1a00 746d 6c2f 3c21 5b63 5f62 635f 683b  ..tml/<![c_bc_h;
00000010: 3c75 6c3e 3c2f 6cc6 9e69 bb0c 0005 3e3c  <ul></l..i....><
00000020: 212d 2d20 2d29 193b 0108 2d3e 3c2f 215b  !-- -).;..-></![
00000030: 635f 623d 3b69 7469 3e3c 212d            c_b=;iti><!-
```
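
Where `xxd` is not available, the same bytes can be rebuilt in Python. This is a minimal sketch that copies the hex columns from the dump above. `HEXDUMP_BYTES`, `CRASH_INPUT` and `write_crash_file` are names made up for this example, and the output path is whatever you pass in.

```python
# Hex columns copied from the dump above (offsets and ASCII column dropped).
HEXDUMP_BYTES = (
    "1a00 746d 6c2f 3c21 5b63 5f62 635f 683b "
    "3c75 6c3e 3c2f 6cc6 9e69 bb0c 0005 3e3c "
    "212d 2d20 2d29 193b 0108 2d3e 3c2f 215b "
    "635f 623d 3b69 7469 3e3c 212d"
)

# bytes.fromhex() skips the spaces between the hex groups.
CRASH_INPUT = bytes.fromhex(HEXDUMP_BYTES)


def write_crash_file(path):
    """Write the reconstructed input to `path` (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(CRASH_INPUT)
```

The first byte is `0x1a` (26), and `26 % 4 == 2`, so the reproducing script picks `'html.parser'` for this input.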

It looks like the following section of /usr/lib/python3.8/_markupbase.py is the problem:

```
    # Override this to handle MS-word extension syntax <![if word]>content<![endif]>
    def parse_marked_section(self, i, report=1):
        rawdata= self.rawdata
        assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
        sectName, j = self._scan_name( i+3, i )
        if j < 0:
            return j
        if sectName in {"temp", "cdata", "ignore", "include", "rcdata"}:
            # look for standard ]]> ending
            match= _markedsectionclose.search(rawdata, i+3)
        elif sectName in {"if", "else", "endif"}:
            # look for MS Office ]> ending
            match= _msmarkedsectionclose.search(rawdata, i+3)
        else:
            self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
        if not match:
            return -1
        if report:
            j = match.start(0)
            self.unknown_decl(rawdata[i+3: j])
        return match.end(0)
```

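`self.error` is apparently not supposed to return, only to abort. Judging by the `UserWarning` in the output above, Beautiful Soup's override in `bs4/builder/_htmlparser.py` only emits a warning and returns, so execution falls through to `if not match:` with `match` never assigned.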
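
The failure mode can be shown in isolation. The following is a toy sketch, not the stdlib code: `parse_marked_section` here is a simplified stand-in for the method quoted above, and `ParseError`, `raising_error` and `warning_error` are hypothetical names. It only illustrates that the `else` branch is safe when the error handler raises, and leaves `match` unbound when the handler returns.

```python
import re
import sys

# Simplified patterns for this sketch; not the ones used by _markupbase.
_name = re.compile(r"[A-Za-z][-_.A-Za-z0-9]*")
_section_close = re.compile(r"\]\]>")


class ParseError(Exception):
    """Raised by the strict handler below."""


def raising_error(message):
    # Handler that aborts, which the parsing function silently relies on.
    raise ParseError(message)


def warning_error(message):
    # Handler that reports the problem and then returns normally.
    print("warning:", message, file=sys.stderr)


def parse_marked_section(rawdata, error):
    m = _name.match(rawdata, 3)
    keyword = m.group(0).lower() if m else ""
    if keyword in {"cdata", "ignore", "include"}:
        match = _section_close.search(rawdata, 3)
    else:
        # If error() returns, `match` is never assigned on this path.
        error("unknown status keyword %r in marked section" % keyword)
    if not match:
        return -1
    return match.end(0)
```

Calling `parse_marked_section("<![c_bc_h;", raising_error)` raises `ParseError`, while the same call with `warning_error` raises `UnboundLocalError`. Raising in the `else` branch, or initialising `match = None` before the `if`, would avoid the second outcome.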

Tags: exception
jvoisin (julien-voisin) wrote :
Leonard Richardson (leonardr) wrote :

Thanks for taking the time to investigate this issue. As with bug #1883104, this is a problem with Python's html.parser library. It was filed against Python as https://bugs.python.org/issue34480. I've added a comment to the Python bug referring to this issue.

Changed in beautifulsoup:
status: New → Won't Fix