lxml crashes on certain pages
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
lxml | New | Undecided | Unassigned | | |
Bug Description
The following code randomly crashes the Python interpreter (both 2.7.6 and 2.7.11) on Windows 8:
from bs4 import BeautifulSoup
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    soup = BeautifulSoup(
As I stated in the summary of this bug, the crash happens only on certain pages, so I have attached an example of such a file to this report.
=======
There is no additional output on stdout or stderr, so the only information I have at the moment is the standard error report from the corresponding Windows dialog (note that the Fault Module Name is "etree.pyd"):
Problem signature:
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 0.0.0.0
Application Timestamp: 56634a05
Fault Module Name: etree.pyd
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 56470805
Exception Code: c0000005
Exception Offset: 0011e3fa
OS Version: 6.2.9200.
Locale ID: 1033
Additional Information 1: 5861
Additional Information 2: 5861822e1919d7c
Additional Information 3: dac6
Additional Information 4: dac6c2650fa14dd
Read our privacy statement online:
http://
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\
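Since the interpreter dies without printing anything, one way to at least confirm the crash programmatically is to run the reproduction in a child process and inspect its exit status. The following is only a minimal sketch: the helper names `run_child` and `crashed_with_access_violation` are hypothetical, and it assumes that Windows reports an access violation as exit status 0xC0000005 (the Exception Code shown in the dialog), possibly as a negative number depending on the Python version.

```python
import subprocess
import sys

# Exception code from the Windows dialog (c0000005 = access violation).
ACCESS_VIOLATION = 0xC0000005


def run_child(code):
    """Run a Python snippet in a separate interpreter.

    Returns (exit_code, combined stdout/stderr output as bytes).
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    output, _ = proc.communicate()
    return proc.returncode, output


def crashed_with_access_violation(exit_code):
    """Check whether an exit status matches the access-violation code.

    The status is masked to 32 bits because it may be reported as a
    signed (negative) value.
    """
    return (exit_code & 0xFFFFFFFF) == ACCESS_VIOLATION
```

Passing the reproduction script as `code` keeps the parent process alive after the child crashes, so the last value printed by the loop is still available in the captured output.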
=======
Moreover, I noticed that the following code, which calls lxml directly, doesn't crash at all:
from lxml import etree
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    tree = etree.HTML(content)
I know that there must then be some error in the BeautifulSoup library, but I think that incorrect usage of lxml should not crash the interpreter anyway.
=======
lxml versions -- 3.4.4 and 3.5.0
BeautifulSoup version -- 4.4.1 (the latest one at the time of writing)