def test():
    i = 0
    while True:
        i += 1
        page_html = open(filename, "r").read()
        lxml.html.document_fromstring(page_html)
        print(i)

test()
"""
Ubuntu 13.10, but with a self-built libxml2 2.9.1.
So, on your side, you get lockups with all those files using the code you presented above? Could you run it under valgrind control?
I did this:
valgrind --tool=memcheck --leak-check=no --num-callers=30 \
    --suppressions=LXML_SRC_DIR/valgrind-python.supp \
    python html_parse_test.py SOME_FILE.html
where html_parse_test.py is this:
"""
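import lxml.html
import sys

# Take the HTML file to parse from the command line, if given.
if len(sys.argv) > 1:
    filename = sys.argv[1]
else:
    filename = "problem.html"

def test():
    # Parse the same file over and over, printing the iteration count.
    i = 0
    while True:
        i += 1
        page_html = open(filename, "r").read()
        lxml.html.document_fromstring(page_html)
        print(i)

test()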
"""
You'll get lots of output in the beginning which you can ignore. It only gets interesting right before the first "1" is printed, and then from that point onwards. If there's no stack trace that involves libxml2 before you kill it, valgrind is happy with it.
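
If picking the relevant traces out of the output by eye gets tedious, something like the following might help. It is only a rough sketch, assuming the valgrind output was saved to a file (for example with valgrind's `--log-file=` option) and that interesting reports mention "libxml2" somewhere in their stack trace. The function name `libxml2_reports` and the `needles` parameter are made up for illustration. With a libxml2 build that carries debug symbols, frames may show source file names such as `HTMLparser.c` rather than the library path, so the search strings may need extending.

```python
import re

# Valgrind prefixes each of its own lines with "==PID==", e.g. "==12345== ...".
_PREFIX = re.compile(r"^==\d+==\s?")


def libxml2_reports(log_text, needles=("libxml2",)):
    """Return the valgrind report blocks that mention any of the needles.

    Hypothetical helper: a "block" here is a run of valgrind lines
    delimited by lines that are empty after the "==PID==" prefix.
    """
    blocks, current = [], []
    for line in log_text.splitlines():
        if not _PREFIX.match(line):
            continue  # program output, such as the printed counter
        body = _PREFIX.sub("", line, count=1).rstrip()
        if body:
            current.append(body)
        elif current:
            # An empty valgrind line closes the current block.
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    # Plain substring matching: a heuristic, not a real trace parser.
    return [b for b in blocks if any(n in b for n in needles)]
```

Calling it on the contents of the saved log returns a list of strings, one per matching report. An empty list would suggest that nothing involving libxml2 was reported.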