Freeze in parser.feed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | Undecided | Unassigned |
Bug Description
I have been debugging this issue for the last 4 days without success. It occurs consistently in my production environment, every 1-3 days. When frozen, the program still uses CPU, strace reports no system call activity from the program, and it still responds to signals such as USR2, which I used to get a backtrace during a freeze.
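The report does not say how the USR2 backtrace was produced. One standard-library way to get a Python-level stack dump on demand, assuming a Unix system and Python 3.3+, is `faulthandler.register()`; the helper name `dump_tracebacks_on_sigusr2` is hypothetical. Because faulthandler writes from a C-level signal handler, it can still dump while the interpreter is stuck inside a C call such as libxml2 code, which an ordinary `signal.signal()` handler cannot do.

```python
import faulthandler
import signal
import sys


def dump_tracebacks_on_sigusr2(out=None):
    """Hypothetical helper: make `kill -USR2 <pid>` dump every thread's stack.

    Unix only (SIGUSR2 and faulthandler.register() do not exist on Windows).
    Only Python frames are shown; C frames inside libxml2 still need gdb.
    """
    if out is None:
        out = sys.stderr
    # Replaces the default SIGUSR2 action (terminate) with a traceback dump.
    faulthandler.register(signal.SIGUSR2, file=out, all_threads=True)
```

Calling it once at start-up and then running `kill -USR2 <pid>` from a shell prints the current Python stack to stderr without stopping the process.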
I am currently trying to gather more data about the circumstances of the freeze, but since that is taking a long time, I am posting the problematic piece of code first, in case the problem lies in the way I'm using lxml rather than in lxml itself!
The docs specifically warn to call parser.close() before using parsed elements, since otherwise the behavior is undefined. In my case, the code only closes the parser when it finds the element I'm interested in; otherwise I never use the target element (stored in "res"), so it shouldn't matter whether I close the parser, correct?
### simplified sample (not intended to be run as it could take days)
link = 'https:/
headers_override = {}
params = {}
timeout = 15
try:
    get_req = await asyncio.
    res = None
    if get_req.status == 200:
        parser = HTMLPullParser(
        events = parser.
        async for data in get_req.
            # freezes on parser.feed... working on getting what "data" is in this case
            parser.
            for event, ele in events:
                if ele.get('id') == 'maxotel_rooms':
                    parser.close()
                    res = ele
                    break
except asyncio.
    pass
except (aiohttp.
    pass
else:
    await get_req.release()
# if res is not None ... use it
###
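The pattern in the sample (feed chunks, scan pull events, stop at the first element with a given id) can be sketched without the network parts. This is a minimal sketch assuming lxml's `HTMLPullParser` with `events=('end',)` and the response body available as an iterable of byte chunks; the asyncio/aiohttp code is left out and the function name `find_element_by_id` is hypothetical. It is not a diagnosis of the freeze.

```python
from lxml import etree


def find_element_by_id(chunks, wanted_id):
    """Hypothetical helper: return the first element whose id == wanted_id.

    Feeds the chunks incrementally and stops feeding as soon as a match
    is seen; returns None if the id never appears.
    """
    # 'end' events fire once an element and its children have been parsed.
    parser = etree.HTMLPullParser(events=('end',))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.get('id') == wanted_id:
                parser.close()
                # return leaves both loops; a bare break would only leave
                # the inner for-loop and the outer loop would keep feeding.
                return elem
    try:
        parser.close()
    except etree.XMLSyntaxError:
        # e.g. close() without any prior feed() raises "no element found"
        return None
    # close() may flush events for input libxml2 was still buffering.
    for _event, elem in parser.read_events():
        if elem.get('id') == wanted_id:
            return elem
    return None
```

The design choice here is that `close()` is called whether or not the element turns up, and feeding stops as soon as it does.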
I am attempting to run my program with a debug build of Python in order to see the exact data being parsed when the freeze occurs. A low-level gdb backtrace I have seen does at least confirm that parser.feed is where the program is stuck. Getting a debug build of Python working with a debug build of lxml has been a lot of trial and error... once I get it working, I hope to get a good core dump and see exactly what data is being fed when the parser freezes!
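A lower-effort complement to a debug build, offered as an assumption rather than something from the report, is to record every chunk before it reaches `feed()`, so the input survives a hang. The class name `RecordingFeeder` and the dump path are hypothetical; it only relies on the wrapped parser having `feed()`, `close()` and, for pull parsers, `read_events()`.

```python
class RecordingFeeder:
    """Hypothetical wrapper: append each chunk to a file, then feed it.

    After a hang, the tail of the file should contain the chunk that was
    being fed at the time.
    """

    def __init__(self, parser, path):
        self._parser = parser
        self._log = open(path, 'ab')  # append mode keeps earlier runs

    def feed(self, data):
        raw = data.encode('utf-8') if isinstance(data, str) else data
        self._log.write(raw)
        # flush() hands the bytes to the OS before parsing starts, so the
        # file contains them even if feed() never returns and the process
        # is later killed.
        self._log.flush()
        self._parser.feed(data)

    def read_events(self):
        return self._parser.read_events()

    def close(self):
        self._log.close()
        return self._parser.close()
```

Wrapping the existing parser instance, e.g. `parser = RecordingFeeder(parser, 'feed-dump.bin')`, leaves the rest of the loop unchanged.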
Python : sys.version_
lxml.etree : (3, 7, 3, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
I may have accidentally run the problematic code on a different setup; it may have been this:
lxml.etree : (3, 6, 4, 0)
and Python 3.5.2
Once I confirm whether it was in fact these versions that caused the freeze, and not the newer ones, I'll update this report.
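To confirm which interpreter and lxml build a given process actually uses, the version block above can be regenerated from inside that environment, for example by logging it at start-up. A minimal sketch using lxml's version constants; the function name `version_report` is hypothetical.

```python
import sys

from lxml import etree


def version_report():
    """Hypothetical helper: versions of Python, lxml, libxml2 and libxslt."""
    rows = [
        ('Python', sys.version_info),
        ('lxml.etree', etree.LXML_VERSION),
        ('libxml used', etree.LIBXML_VERSION),
        ('libxml compiled', etree.LIBXML_COMPILED_VERSION),
        ('libxslt used', etree.LIBXSLT_VERSION),
        ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION),
    ]
    return '\n'.join('%-20s: %s' % (name, value) for name, value in rows)


print(version_report())
```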