2016-03-16 13:33:54 |
b0r3d0m |
bug |
|
|
added bug |
2016-03-16 13:33:54 |
b0r3d0m |
attachment added |
|
The page that makes lxml crash https://bugs.launchpad.net/bugs/1558076/+attachment/4601120/+files/page.html |
|
2016-03-16 13:42:43 |
b0r3d0m |
description |
The following code randomly crashes the Python interpreter (versions 2.7.6 and 2.7.11) on Windows 8:
from bs4 import BeautifulSoup
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    soup = BeautifulSoup(content, 'lxml')  # the 'html.parser' and 'html5lib' parsers work perfectly
As I stated in the summary of this bug, the crash happens only on certain pages, so I attached an example of such a file to this report.
==================================
There is no additional output on stdout/stderr, so the only information I have at the moment is the standard error report from the corresponding Windows dialog (note that the Fault Module Name is "lxml.etree.pyd"):
Problem signature:
Problem Event Name: APPCRASH
Application Name: emls_aggregator_helper.exe
Application Version: 0.0.0.0
Application Timestamp: 514e2c2e
Fault Module Name: lxml.etree.pyd
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 553ba758
Exception Code: c0000005
Exception Offset: 000ed4aa
OS Version: 6.2.9200.2.0.0.768.100
Locale ID: 1033
Additional Information 1: 5861
Additional Information 2: 5861822e1919d7c014bbb064c64908b2
Additional Information 3: dac6
Additional Information 4: dac6c2650fa14dd558bd9f448e23afd1
Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=190175
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt
==================================
Moreover, I noticed that the following code does not crash at all:
from lxml import etree
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    tree = etree.HTML(content)
I understand that this points to an error in the BeautifulSoup library, but I think that incorrect usage of lxml should not crash the interpreter anyway.
==================================
lxml versions -- 3.4.4 and 3.5.0
BeautifulSoup version -- 4.4.1 (the latest one at the time of writing) |
The following code randomly crashes the Python interpreter (versions 2.7.6 and 2.7.11) on Windows 8:
from bs4 import BeautifulSoup
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    soup = BeautifulSoup(content, 'lxml')  # the 'html.parser' and 'html5lib' parsers work perfectly
As I stated in the summary of this bug, the crash happens only on certain pages, so I attached an example of such a file to this report.
==================================
There is no additional output on stdout/stderr, so the only information I have at the moment is the standard error report from the corresponding Windows dialog (note that the Fault Module Name is "lxml.etree.pyd"):
Problem signature:
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 0.0.0.0
Application Timestamp: 514e2c2e
Fault Module Name: lxml.etree.pyd
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 553ba758
Exception Code: c0000005
Exception Offset: 000ed4aa
OS Version: 6.2.9200.2.0.0.768.100
Locale ID: 1033
Additional Information 1: 5861
Additional Information 2: 5861822e1919d7c014bbb064c64908b2
Additional Information 3: dac6
Additional Information 4: dac6c2650fa14dd558bd9f448e23afd1
Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=190175
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt
==================================
Moreover, I noticed that the following code does not crash at all:
from lxml import etree
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    tree = etree.HTML(content)
I understand that this points to an error in the BeautifulSoup library, but I think that incorrect usage of lxml should not crash the interpreter anyway.
==================================
lxml versions -- 3.4.4 and 3.5.0
BeautifulSoup version -- 4.4.1 (the latest one at the time of writing) |
|
2016-03-16 13:43:46 |
b0r3d0m |
description |
The following code randomly crashes the Python interpreter (versions 2.7.6 and 2.7.11) on Windows 8:
from bs4 import BeautifulSoup
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    soup = BeautifulSoup(content, 'lxml')  # the 'html.parser' and 'html5lib' parsers work perfectly
As I stated in the summary of this bug, the crash happens only on certain pages, so I attached an example of such a file to this report.
==================================
There is no additional output on stdout/stderr, so the only information I have at the moment is the standard error report from the corresponding Windows dialog (note that the Fault Module Name is "lxml.etree.pyd"):
Problem signature:
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 0.0.0.0
Application Timestamp: 514e2c2e
Fault Module Name: lxml.etree.pyd
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 553ba758
Exception Code: c0000005
Exception Offset: 000ed4aa
OS Version: 6.2.9200.2.0.0.768.100
Locale ID: 1033
Additional Information 1: 5861
Additional Information 2: 5861822e1919d7c014bbb064c64908b2
Additional Information 3: dac6
Additional Information 4: dac6c2650fa14dd558bd9f448e23afd1
Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=190175
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt
==================================
Moreover, I noticed that the following code does not crash at all:
from lxml import etree
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    tree = etree.HTML(content)
I understand that this points to an error in the BeautifulSoup library, but I think that incorrect usage of lxml should not crash the interpreter anyway.
==================================
lxml versions -- 3.4.4 and 3.5.0
BeautifulSoup version -- 4.4.1 (the latest one at the time of writing) |
The following code randomly crashes the Python interpreter (versions 2.7.6 and 2.7.11) on Windows 8:
from bs4 import BeautifulSoup
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    soup = BeautifulSoup(content, 'lxml')  # the 'html.parser' and 'html5lib' parsers work perfectly
As I stated in the summary of this bug, the crash happens only on certain pages, so I attached an example of such a file to this report.
==================================
There is no additional output on stdout/stderr, so the only information I have at the moment is the standard error report from the corresponding Windows dialog (note that the Fault Module Name is "etree.pyd"):
Problem signature:
Problem Event Name: APPCRASH
Application Name: python.exe
Application Version: 0.0.0.0
Application Timestamp: 56634a05
Fault Module Name: etree.pyd
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 56470805
Exception Code: c0000005
Exception Offset: 0011e3fa
OS Version: 6.2.9200.2.0.0.768.100
Locale ID: 1033
Additional Information 1: 5861
Additional Information 2: 5861822e1919d7c014bbb064c64908b2
Additional Information 3: dac6
Additional Information 4: dac6c2650fa14dd558bd9f448e23afd1
Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=190175
If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt
==================================
Moreover, I noticed that the following code does not crash at all:
from lxml import etree
with open('page.html', 'r') as f:
    content = f.read()
for i in xrange(1000000000):
    print(i)
    tree = etree.HTML(content)
I understand that this points to an error in the BeautifulSoup library, but I think that incorrect usage of lxml should not crash the interpreter anyway.
==================================
lxml versions -- 3.4.4 and 3.5.0
BeautifulSoup version -- 4.4.1 (the latest one at the time of writing) |
|
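Reproducing this report means telling a hard interpreter crash apart from an ordinary Python exception. The crash here is the access violation with exception code c0000005 reported in the Windows dialog. One way to do that is to run each parse in a separate child interpreter and inspect its exit code. Below is a minimal sketch, assuming Python 2.7 or 3. The helper names `run_isolated`, `describe` and `check_parsers`, and the `repeats` parameter, are hypothetical and not part of lxml or BeautifulSoup.

```python
import subprocess
import sys

# Hypothetical helpers (not part of lxml or bs4). A crash inside a C extension
# such as lxml.etree.pyd only kills the child process, not this script.

# Exit codes treated as a hard crash (assumption): Windows reports an access
# violation as the unsigned code 0xC0000005; on POSIX, subprocess reports a
# segfault as -11 (killed by SIGSEGV).
CRASH_CODES = (0xC0000005, -11)


def run_isolated(snippet):
    """Run a Python snippet in a fresh interpreter and return its exit code."""
    return subprocess.call([sys.executable, "-c", snippet])


def describe(returncode):
    """Classify a child exit code as ok, crashed, or an ordinary failure."""
    if returncode == 0:
        return "ok"
    if returncode in CRASH_CODES:
        return "crashed (access violation / segfault)"
    return "failed with exit code %d" % returncode


def check_parsers(path, parsers=("lxml", "html.parser", "html5lib"), repeats=1000):
    """Hypothetical driver: parse `path` `repeats` times with each parser,
    each parser in its own child interpreter."""
    results = {}
    for parser in parsers:
        snippet = (
            "from bs4 import BeautifulSoup\n"
            "with open(%r, 'r') as f:\n"
            "    content = f.read()\n"
            "for _ in range(%d):\n"
            "    BeautifulSoup(content, %r)\n"
        ) % (path, repeats, parser)
        results[parser] = describe(run_isolated(snippet))
    return results
```

For example, `check_parsers('page.html')` would report for each parser whether the child exited normally, crashed, or raised an exception. A crash with 'lxml' but not with 'html.parser' or 'html5lib' would match the behaviour described in this report. An uncaught Python exception in the child shows up as exit code 1, not as a crash.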