Beautiful Soup

Bug #1883104

Activity log for bug #1883104

Date	Who	What changed	Old value	New value	Message
2020-06-11 12:24:39	jvoisin	bug			added bug
2020-06-11 12:24:39	jvoisin	attachment added		Crashing file https://bugs.launchpad.net/bugs/1883104/+attachment/5382926/+files/crash-7b7ff74a3ccefdf713361731ee391c24592bd6509f257b1f98193d87b35cd6c8
2020-06-11 12:29:25	jvoisin	description	(same as the new value, without "on beautifulsoup4, version: 4.9.1")	(full text follows)

I'm getting the following stacktrace when running the following python script on the following input on beautifulsoup4, version: 4.9.1

```
$ python3 bs4_repro.py crash-7b7ff74a3ccefdf713361731ee391c24592bd6509f257b1f98193d87b35cd6c8
/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py:102: UserWarning: expected name token at '<![- -<\x10</hlre><hr>m'
  warnings.warn(msg)
Traceback (most recent call last):
  File "bs4_repro.py", line 14, in <module>
    main()
  File "bs4_repro.py", line 12, in main
    BeautifulSoup(buf, features=parsers[idx]).prettify()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 345, in __init__
    self._feed()
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/__init__.py", line 431, in _feed
    self.builder.feed(self.markup)
  File "/home/jvoisin/dev/pythonfuzz/ven/lib/python3.8/site-packages/bs4/builder/_htmlparser.py", line 377, in feed
    parser.feed(markup)
  File "/usr/lib/python3.8/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.8/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.8/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/lib/python3.8/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
TypeError: cannot unpack non-iterable NoneType object
zsh: exit 1 python3 bs4_repro.py
```

Reproducing script:

```
from bs4 import BeautifulSoup
import sys

def main ():
    with open(sys.argv[1], 'rb') as f:
        buf = f.read()
    parsers = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']
    try:
        idx = int(buf[0]) % len(parsers)
    except ValueError:
        return
    BeautifulSoup(buf, features=parsers[idx]).prettify()

main()
```

Input file (use `xxd -r` to transform the hexdump into a file):

```
$xxd crash-7b7ff74a3ccefdf713361731ee391c24592bd6509f257b1f98193d87b35cd6c8
00000000: 0a68 745c 6e74 6f75 6368 656e 646d 6c3e  .ht\ntouchendml>
00000010: 3c42 6f64 793e 7fff ffff 643e 0002 693e  <Body>....d>..i>
00000020: 2d75 6c59 743c 7472 743e 3c3c 6474 3e3e  -ulYt<trt><<dt>>
00000030: 3cd5 7265 3c2f 6c69 3e3c 215b 2d20 2d3c  <.re</li><![- -<
00000040: 103c 2f68 6c72 653e 3c68 723e 6d6c 6f6e  .</hlre><hr>mlon
00000050: 7265 7365 743e 0a3c 6474 3e3c 70ae 653e  reset>.<dt><p.e>
00000060: 3c68 723e                                <hr>
```
2020-06-11 20:24:32	Leonard Richardson	bug watch added		http://bugs.python.org/issue37747
2020-06-11 20:24:48	Leonard Richardson	beautifulsoup: status	New	Won't Fix
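If `xxd` is not at hand, the crashing input can be rebuilt from the hex dump in the description with a few lines of Python. The sketch below is a rough stand-in for `xxd -r` that only handles the plain layout shown there (offset, hex groups, two spaces, ASCII column); the function name `xxd_reverse` is hypothetical. It also works out which parser the reproducing script selects for this file: the first byte is `0x0a` (10), and 10 % 4 = 2 picks `'html.parser'`, which is consistent with the `_htmlparser.py` frames in the traceback.

```python
# Hex dump copied from the bug description (plain `xxd` layout).
HEXDUMP = r"""
00000000: 0a68 745c 6e74 6f75 6368 656e 646d 6c3e  .ht\ntouchendml>
00000010: 3c42 6f64 793e 7fff ffff 643e 0002 693e  <Body>....d>..i>
00000020: 2d75 6c59 743c 7472 743e 3c3c 6474 3e3e  -ulYt<trt><<dt>>
00000030: 3cd5 7265 3c2f 6c69 3e3c 215b 2d20 2d3c  <.re</li><![- -<
00000040: 103c 2f68 6c72 653e 3c68 723e 6d6c 6f6e  .</hlre><hr>mlon
00000050: 7265 7365 743e 0a3c 6474 3e3c 70ae 653e  reset>.<dt><p.e>
00000060: 3c68 723e                                <hr>
"""


def xxd_reverse(dump: str) -> bytes:
    """Hypothetical helper: rebuild raw bytes from a plain `xxd` dump."""
    out = bytearray()
    for line in dump.splitlines():
        if not line.strip():
            continue  # skip blank lines
        offset_str, rest = line.split(":", 1)
        # The hex column ends at the first double space; the ASCII column follows.
        hex_part = rest.split("  ", 1)[0]
        offset = int(offset_str, 16)
        if offset != len(out):
            raise ValueError(f"unexpected offset {offset:#x}, have {len(out):#x} bytes")
        out += bytes.fromhex(hex_part)  # fromhex ignores the spaces between groups
    return bytes(out)


# Same parser list as the reproducing script.
PARSERS = ['lxml-xml', 'html5lib', 'html.parser', 'lxml']

data = xxd_reverse(HEXDUMP)
chosen = PARSERS[data[0] % len(PARSERS)]
print(len(data), chosen)  # 100 html.parser
```

Writing `data` to a file (`open(path, "wb").write(data)`) gives an input equivalent to the one attached to the bug, assuming the dump in the description is complete.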
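The `TypeError` comes from a helper that reports failure through `error()` and then returns nothing. As the traceback and the "expected name token" warning suggest, Python 3.8's `_markupbase` scans for a name after `<![`, calls `self.error(...)` when none is found, and `parse_marked_section` unpacks the helper's result directly. Judging from the `UserWarning` raised at `_htmlparser.py:102`, Beautiful Soup's parser subclass overrides `error()` to warn rather than raise, so the helper returns `None`. The toy classes below (`StrictBase`, `LenientParser`, and `run` are all hypothetical, and the name pattern is a simplification) reproduce that pattern without depending on any particular Python version's `html.parser`.

```python
import re
import warnings

# Simplified name-token pattern (an assumption, modeled loosely on _markupbase).
_NAME = re.compile(r"[a-zA-Z][-_.a-zA-Z0-9]*\s*")


class StrictBase:
    """Hypothetical toy base class whose error() raises."""

    def error(self, message):
        raise ValueError(message)

    def scan_name(self, text, i, declstartpos):
        # Success: return (lowercased name, index just past the token).
        m = _NAME.match(text, i)
        if m:
            return m.group().strip().lower(), m.end()
        # Failure: report via error(), then fall off the end (implicit None).
        self.error("expected name token at %r" % text[declstartpos:declstartpos + 20])

    def parse_marked_section(self, text, i):
        # Like the caller in the traceback: unpack the helper's result directly.
        name, j = self.scan_name(text, i + 3, i)
        return name, j


class LenientParser(StrictBase):
    """Hypothetical subclass that downgrades errors to warnings."""

    def error(self, message):
        warnings.warn(message)


def run(parser, text):
    """Return the name of the exception raised while parsing `text`, or None."""
    try:
        parser.parse_marked_section(text, 0)
    except Exception as exc:
        return type(exc).__name__
    return None


SNIPPET = "<![- -<\x10</hlre><hr>m"  # fragment quoted in the UserWarning

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    strict_result = run(StrictBase(), SNIPPET)      # error() raised ValueError
    lenient_result = run(LenientParser(), SNIPPET)  # error() only warned; unpacking None failed

print(strict_result, lenient_result)  # ValueError TypeError
print([str(w.message) for w in caught])
```

The strict variant fails with the intended message, while the lenient one emits the warning and then crashes on the unpack, which is the same sequence the report shows.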