Unexpected exception: TypeError: cannot unpack non-iterable NoneType object

Bug #1883104 reported by jvoisin
This bug affects 1 person
Affects: Beautiful Soup
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

I get the following stack trace when running the Python script below on the input file below, using beautifulsoup4 version 4.9.1:

```
$ python3 bs4_repro.py crash-7b7ff74a3ccefdf713361731ee391c24592bd6509f257b1f98193d87b35cd6c8
/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py:102: UserWarning: expected name token at '<![- -<\x10</hlre><hr>m'
  warnings.warn(msg)
Traceback (most recent call last):
  File "bs4_repro.py", line 14, in <module>
    main()
  File "bs4_repro.py", line 12, in main
    BeautifulSoup(buf, features=parsers[idx]).prettify()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 345, in __init__
    self._feed()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 431, in _feed
    self.builder.feed(self.markup)
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py", line 377, in feed
    parser.feed(markup)
  File "/usr/lib/python3.8/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.8/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.8/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.8/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
TypeError: cannot unpack non-iterable NoneType object
zsh: exit 1 python3 bs4_repro.py
```

Reproducing script:

```
from bs4 import BeautifulSoup
import sys

def main():
    with open(sys.argv[1], 'rb') as f:
        buf = f.read()  # raw bytes of the fuzzer-generated input
        parsers = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']
        try:
            idx = int(buf[0]) % len(parsers)  # first byte selects the parser
        except IndexError:  # empty input: buf[0] does not exist
            return
        BeautifulSoup(buf, features=parsers[idx]).prettify()

main()
```

Input file (use `xxd -r` to transform the hexdump into a file):

```
$ xxd crash-7b7ff74a3ccefdf713361731ee391c24592bd6509f257b1f98193d87b35cd6c8
00000000: 0a68 745c 6e74 6f75 6368 656e 646d 6c3e .ht\ntouchendml>
00000010: 3c42 6f64 793e 7fff ffff 643e 0002 693e <Body>....d>..i>
00000020: 2d75 6c59 743c 7472 743e 3c3c 6474 3e3e -ulYt<trt><<dt>>
00000030: 3cd5 7265 3c2f 6c69 3e3c 215b 2d20 2d3c <.re</li><![- -<
00000040: 103c 2f68 6c72 653e 3c68 723e 6d6c 6f6e .</hlre><hr>mlon
00000050: 7265 7365 743e 0a3c 6474 3e3c 70ae 653e reset>.<dt><p.e>
00000060: 3c68 723e
```
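
For readers without `xxd`, the same conversion can be sketched in Python. This is a minimal sketch that assumes the default `xxd` layout shown above (an offset, a colon, up to eight groups of hex digits, then an ASCII column); the name `unhexdump` is hypothetical, and the function ignores the offsets and simply concatenates the bytes in order.

```python
import re

# A group in default xxd output is one or two bytes, i.e. 2 or 4 hex digits.
_HEX_GROUP = re.compile(r"(?:[0-9a-fA-F]{2}){1,2}")


def unhexdump(dump: str, groups_per_line: int = 8) -> bytes:
    """Rebuild bytes from default-format xxd output (hypothetical helper)."""
    out = bytearray()
    for line in dump.splitlines():
        _offset, sep, rest = line.partition(":")
        if not sep:
            continue  # no "offset:" prefix, so not a hexdump line
        # Full lines carry eight groups; anything after that is ASCII.
        for token in rest.split()[:groups_per_line]:
            if not _HEX_GROUP.fullmatch(token):
                break  # reached the ASCII column on a short line
            out += bytes.fromhex(token)
    return bytes(out)


# First two lines of the dump above, as a raw string so "\n" stays literal.
sample = r"""
00000000: 0a68 745c 6e74 6f75 6368 656e 646d 6c3e .ht\ntouchendml>
00000010: 3c42 6f64 793e 7fff ffff 643e 0002 693e <Body>....d>..i>
"""
data = unhexdump(sample)
print(len(data), data[:3])
```

One limitation of this sketch: on a short final line, an ASCII column that happens to look like hex digits would be misread as data, so `xxd -r` remains the safer choice when it is available.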

Tags: exception
jvoisin (julien-voisin) wrote :
description: updated
Leonard Richardson (leonardr) wrote :

Thanks for taking the time to file this issue. The problem here is in Python 3's html.parser library. Here's a script that duplicates the error without using any Beautiful Soup code:

```
from html.parser import HTMLParser
import warnings

bad_markup = '\nht\\ntouchendml><Body>\x7fÿÿÿd>\x00\x02i>-ulYt<trt><<dt>><Õre</li><![- -<\x10</hlre><hr>mlonreset>\n<dt><p®e><hr>'

class MyParser(HTMLParser):
    def error(self, msg):
        warnings.warn(msg)

parser = MyParser()
parser.feed(bad_markup)
```
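
The traceback and the `error()` override together suggest the mechanism: the name-scanning helper reports a problem through `error()` and then falls off the end of the function, so when `error()` only warns, the caller receives `None` and the tuple unpacking fails. The toy model below illustrates that pattern only. `ToyParser`, `WarningParser`, and their methods are hypothetical stand-ins written for this illustration, not the standard library code.

```python
import warnings


class ToyParser:
    """Hypothetical stand-in for a parser base class; not the stdlib code."""

    def error(self, msg):
        # Assumed contract: error() raises, so callers never continue past it.
        raise ValueError(msg)

    def scan_name(self, text, i):
        # Return (name, end_index) when a run of letters starts at position i.
        j = i
        while j < len(text) and text[j].isalpha():
            j += 1
        if j > i:
            return text[i:j].lower(), j
        self.error("expected name token at %r" % text[i:i + 20])
        # No return statement here: if error() comes back, the result is None.

    def parse_marked_section(self, text, i):
        # Unpacking raises TypeError when scan_name() returned None.
        name, j = self.scan_name(text, i + 3)
        return name, j


class WarningParser(ToyParser):
    def error(self, msg):
        # Same idea as the override in the script above: warn, do not raise.
        warnings.warn(msg)


try:
    WarningParser().parse_marked_section("<![- -<", 0)
except TypeError as exc:
    print("TypeError:", exc)
```

With the base class, the same input raises the `ValueError` from `error()` instead, which is why the problem only shows up once `error()` is overridden to be non-fatal.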

Someone else filed this issue against Python last year: https://bugs.python.org/issue37747

I've updated it with a link to this ticket and a copy of my duplication script.
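
Until the upstream issue is resolved, code that feeds untrusted markup to `html.parser` can guard the call. The following is a minimal sketch of that idea, not a fix: `LenientParser` and `parses_cleanly` are hypothetical names, `TypeError` is the exception shown in the traceback above, and catching `AssertionError` as well is an assumption about how other Python versions may report the same condition.

```python
import warnings
from html.parser import HTMLParser


class LenientParser(HTMLParser):
    def error(self, msg):
        # Same override as the duplication script: report, do not raise.
        warnings.warn(msg)


def parses_cleanly(markup: str) -> bool:
    """Return False when html.parser aborts on the markup (hypothetical helper)."""
    parser = LenientParser()
    try:
        parser.feed(markup)
    except (TypeError, AssertionError) as exc:
        # TypeError comes from the traceback in this report; AssertionError
        # is an assumption covering interpreters that fail differently.
        warnings.warn(f"html.parser gave up on the input: {exc}")
        return False
    return True


if __name__ == "__main__":
    print(parses_cleanly("<p>ok</p>"))
    print(parses_cleanly("<![- -<\x10</hlre><hr>m"))
```

A caller could use the result to skip the document or to retry with a different parser. Note that this hides the parser failure rather than repairing it, and the outcome for the malformed input depends on the Python version in use.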

Changed in beautifulsoup:
status: New → Won't Fix