2020-06-11 12:29:25 |
jvoisin |
description |
Running the Python script below on the input below produces the following stack trace:
```
$ python3 bs4_repro.py crash-7b7ff74a3ccefdf713361731ee391c24592bd6509f257b1f98193d87b35cd6c8
/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py:102: UserWarning: expected name token at '<![- -<\x10</hlre><hr>m'
  warnings.warn(msg)
Traceback (most recent call last):
  File "bs4_repro.py", line 14, in <module>
    main()
  File "bs4_repro.py", line 12, in main
    BeautifulSoup(buf, features=parsers[idx]).prettify()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 345, in __init__
    self._feed()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 431, in _feed
    self.builder.feed(self.markup)
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py", line 377, in feed
    parser.feed(markup)
  File "/usr/lib/python3.8/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.8/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.8/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.8/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
TypeError: cannot unpack non-iterable NoneType object
zsh: exit 1 python3 bs4_repro.py
```
Reproducing script:
```
from bs4 import BeautifulSoup
import sys

def main():
    with open(sys.argv[1], 'rb') as f:
        buf = f.read()
    parsers = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']
    try:
        idx = int(buf[0]) % len(parsers)  # first input byte selects the parser
    except ValueError:
        return
    BeautifulSoup(buf, features=parsers[idx]).prettify()

main()
```
Input file (use `xxd -r` to transform the hexdump into a file):
```
$ xxd crash-7b7ff74a3ccefdf713361731ee391c24592bd6509f257b1f98193d87b35cd6c8
00000000: 0a68 745c 6e74 6f75 6368 656e 646d 6c3e .ht\ntouchendml>
00000010: 3c42 6f64 793e 7fff ffff 643e 0002 693e <Body>....d>..i>
00000020: 2d75 6c59 743c 7472 743e 3c3c 6474 3e3e -ulYt<trt><<dt>>
00000030: 3cd5 7265 3c2f 6c69 3e3c 215b 2d20 2d3c <.re</li><![- -<
00000040: 103c 2f68 6c72 653e 3c68 723e 6d6c 6f6e .</hlre><hr>mlon
00000050: 7265 7365 743e 0a3c 6474 3e3c 70ae 653e reset>.<dt><p.e>
00000060: 3c68 723e <hr>
``` |
Running the Python script below on the input below with beautifulsoup4 version 4.9.1 produces the following stack trace:
```
$ python3 bs4_repro.py crash-7b7ff74a3ccefdf713361731ee391c24592bd6509f257b1f98193d87b35cd6c8
/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py:102: UserWarning: expected name token at '<![- -<\x10</hlre><hr>m'
  warnings.warn(msg)
Traceback (most recent call last):
  File "bs4_repro.py", line 14, in <module>
    main()
  File "bs4_repro.py", line 12, in main
    BeautifulSoup(buf, features=parsers[idx]).prettify()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 345, in __init__
    self._feed()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 431, in _feed
    self.builder.feed(self.markup)
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py", line 377, in feed
    parser.feed(markup)
  File "/usr/lib/python3.8/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.8/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.8/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.8/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
TypeError: cannot unpack non-iterable NoneType object
zsh: exit 1 python3 bs4_repro.py
```
Reproducing script:
```
from bs4 import BeautifulSoup
import sys

def main():
    with open(sys.argv[1], 'rb') as f:
        buf = f.read()
    parsers = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']
    try:
        idx = int(buf[0]) % len(parsers)  # first input byte selects the parser
    except ValueError:
        return
    BeautifulSoup(buf, features=parsers[idx]).prettify()

main()
```
Input file (use `xxd -r` to transform the hexdump into a file):
```
$ xxd crash-7b7ff74a3ccefdf713361731ee391c24592bd6509f257b1f98193d87b35cd6c8
00000000: 0a68 745c 6e74 6f75 6368 656e 646d 6c3e .ht\ntouchendml>
00000010: 3c42 6f64 793e 7fff ffff 643e 0002 693e <Body>....d>..i>
00000020: 2d75 6c59 743c 7472 743e 3c3c 6474 3e3e -ulYt<trt><<dt>>
00000030: 3cd5 7265 3c2f 6c69 3e3c 215b 2d20 2d3c <.re</li><![- -<
00000040: 103c 2f68 6c72 653e 3c68 723e 6d6c 6f6e .</hlre><hr>mlon
00000050: 7265 7365 743e 0a3c 6474 3e3c 70ae 653e reset>.<dt><p.e>
00000060: 3c68 723e <hr>
``` |
|
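For anyone without `xxd` at hand, the crashing input can also be rebuilt in Python. The following is a minimal sketch, not part of the original report: `HEXDUMP` and `hexdump_to_bytes` are hypothetical names introduced here, and the parser assumes the default `xxd` layout of at most eight 4-digit hex groups per line. It also shows that the first byte (0x0a) makes the repro script pick `html.parser`, which matches the traceback.

```python
import string

# Hexdump copied from the report (raw string so the "\n" in the ASCII column stays literal).
HEXDUMP = r"""
00000000: 0a68 745c 6e74 6f75 6368 656e 646d 6c3e .ht\ntouchendml>
00000010: 3c42 6f64 793e 7fff ffff 643e 0002 693e <Body>....d>..i>
00000020: 2d75 6c59 743c 7472 743e 3c3c 6474 3e3e -ulYt<trt><<dt>>
00000030: 3cd5 7265 3c2f 6c69 3e3c 215b 2d20 2d3c <.re</li><![- -<
00000040: 103c 2f68 6c72 653e 3c68 723e 6d6c 6f6e .</hlre><hr>mlon
00000050: 7265 7365 743e 0a3c 6474 3e3c 70ae 653e reset>.<dt><p.e>
00000060: 3c68 723e
"""


def hexdump_to_bytes(dump: str) -> bytes:
    """Decode xxd-style lines, reading at most 8 hex groups per line.

    Assumption: a trailing ASCII column, if present, never looks like a
    valid hex group on a short final line.
    """
    out = bytearray()
    for line in dump.splitlines():
        if ":" not in line:
            continue  # skip blank lines
        _, _, rest = line.partition(":")
        groups = []
        for token in rest.split()[:8]:
            is_hex = all(c in string.hexdigits for c in token)
            if len(token) % 2 or not is_hex:
                break  # reached the ASCII column
            groups.append(token)
        out += bytes.fromhex("".join(groups))
    return bytes(out)


data = hexdump_to_bytes(HEXDUMP)

# Same selection logic as the repro script in the report.
parsers = ["lxml-xml", "html5lib", "html.parser", "lxml"]
selected = parsers[data[0] % len(parsers)]

print(len(data), selected)
```

To feed the result to the repro script, the bytes can be written out with `pathlib.Path("crash-input").write_bytes(data)`, where `crash-input` is an arbitrary file name.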