Comment 0 for bug 1558076

b0r3d0m (nikita-trophimov) wrote :

The following code randomly crashes the Python interpreter (both versions 2.7.6 and 2.7.11) on Windows 8:

from bs4 import BeautifulSoup

with open('page.html', 'r') as f:
    content = f.read()
    for i in xrange(1000000000):
        print(i)
        soup = BeautifulSoup(content, 'lxml')  # the 'html.parser' and 'html5lib' parsers work fine

As I stated in the summary of this bug, the crash happens only on certain pages, so I have attached an example of such a page to this report.
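
To check which pages trigger the crash without losing the parent process, the loop above can be run in a child interpreter and its exit code inspected. This is only a sketch: the names `run_in_child`, `describe` and `CHILD_CODE` and the iteration count are made up for illustration, and it assumes the crash terminates the child with a non-zero exit code (on Windows an access violation would normally show up as 0xc0000005).

```python
import subprocess
import sys

# Hypothetical child script: the same parsing loop, with the file name and
# the number of iterations taken from the command line.
CHILD_CODE = """
import sys
from bs4 import BeautifulSoup

with open(sys.argv[1], 'r') as f:
    content = f.read()
for i in range(int(sys.argv[2])):
    soup = BeautifulSoup(content, 'lxml')
"""


def run_in_child(code, *args):
    """Run `code` in a fresh interpreter and return its exit code."""
    cmd = [sys.executable, "-c", code] + [str(a) for a in args]
    return subprocess.call(cmd)


def describe(returncode):
    """Turn an exit code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    # Mask to 32 bits so Windows status codes print as in the crash dialog.
    return "failed with exit code 0x%08x" % (returncode & 0xFFFFFFFF)


if __name__ == "__main__":
    rc = run_in_child(CHILD_CODE, "page.html", 10000)
    print(describe(rc))
```

Because the crash is random, a single clean run proves nothing; the child would have to be started several times per page before a page is considered safe.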

==================================

There's no additional output on stdout / stderr, so the only information I have at the moment is the standard error report from the corresponding Windows crash dialog (note that the Fault Module Name is "lxml.etree.pyd"):

Problem signature:
  Problem Event Name: APPCRASH
  Application Name: emls_aggregator_helper.exe
  Application Version: 0.0.0.0
  Application Timestamp: 514e2c2e
  Fault Module Name: lxml.etree.pyd
  Fault Module Version: 0.0.0.0
  Fault Module Timestamp: 553ba758
  Exception Code: c0000005
  Exception Offset: 000ed4aa
  OS Version: 6.2.9200.2.0.0.768.100
  Locale ID: 1033
  Additional Information 1: 5861
  Additional Information 2: 5861822e1919d7c014bbb064c64908b2
  Additional Information 3: dac6
  Additional Information 4: dac6c2650fa14dd558bd9f448e23afd1

==================================

Moreover, I noticed that the following code, which calls lxml directly instead of going through BeautifulSoup, doesn't crash at all:

from lxml import etree

with open('page.html', 'r') as f:
    content = f.read()
    for i in xrange(1000000000):
        print(i)
        tree = etree.HTML(content)

So I assume the error is somewhere in how the BeautifulSoup library uses lxml, but I think that even incorrect usage of lxml should not crash the interpreter.

==================================

lxml versions -- 3.4.4 and 3.5.0
BeautifulSoup version -- 4.4.1 (the latest one at the time of writing)
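
Since the faulting module is lxml's compiled extension, the version of libxml2 it uses may matter as well. A minimal sketch for collecting all the versions in one go is below; the function name `collect_versions` is made up for illustration, and packages that are not installed are simply reported as None.

```python
import importlib
import sys


def collect_versions():
    """Return a dict of version strings; missing packages map to None."""
    info = {"python": sys.version.split()[0], "platform": sys.platform}
    try:
        etree = importlib.import_module("lxml.etree")
        info["lxml"] = ".".join(str(n) for n in etree.LXML_VERSION)
        # libxml2 version in use at runtime vs. the one lxml was built against
        info["libxml2"] = ".".join(str(n) for n in etree.LIBXML_VERSION)
        info["libxml2_compiled"] = ".".join(
            str(n) for n in etree.LIBXML_COMPILED_VERSION)
    except ImportError:
        info["lxml"] = None
    try:
        info["bs4"] = importlib.import_module("bs4").__version__
    except ImportError:
        info["bs4"] = None
    return info


if __name__ == "__main__":
    for name, version in sorted(collect_versions().items()):
        print("%s: %s" % (name, version))
```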