Python crashes when setting elem.text during etree.iterparse

Bug #1743420 reported by danny0838
This bug affects 1 person
Affects: lxml
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I am running an XML parser that sets elem.text during etree.iterparse, and on certain files it crashes Python every time (see the attached file for an illustration).

The crash can be reproduced on at least 2 computers running Windows 7 SP1.

Removing the assignment to elem.text (lines 17-18 of script.py in the attachment, i.e. the "if elem.text is not None:" block in the 'start' branch) seems to stop the crash.
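Since the crash goes away when elem.text is not assigned while iterparse is running, one possible workaround, assuming the whole file fits in memory, is to finish parsing before editing anything. The following sketch applies the same edits as script.py (drop one leading newline from the text and tail of tags like _name_) but uses etree.parse instead of iterparse. The function name strip_marked_newlines is hypothetical.

```python
import io
import re
from lxml import etree

def strip_marked_newlines(source):
    """Parse the whole document first, then edit text/tail of _marked_ tags."""
    tree = etree.parse(source)  # parsing is complete before any edit below
    for elem in tree.iter():
        # Skip comments and processing instructions, whose .tag is not a str.
        if not isinstance(elem.tag, str) or not re.search(r'^_.*_$', elem.tag):
            continue
        if elem.text is not None:
            elem.text = re.sub(r'^\n', '', elem.text)
        if elem.tail is not None:
            elem.tail = re.sub(r'^\n', '', elem.tail)
    return tree

if __name__ == "__main__":
    demo = io.BytesIO(b'<root><_a_>\nhi</_a_>\nafter</root>')
    print(etree.tostring(strip_marked_newlines(demo).getroot()))
```

This trades the streaming behaviour of iterparse for safety: every edit happens on a finished tree, so nothing the parser still needs is touched.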

--
script.py (same as the attached file)
--
#!/usr/bin/env python3
import os
import platform
import traceback
import re
import lxml.etree as etree

def main():
    fsrc = 'data.xml'

    for event, elem in etree.iterparse(fsrc, events=('start', 'end')):
        print(event, elem.tag, elem.attrib, elem.text, elem.tail)
        if event == 'start':
            tag_name = elem.tag
            if re.search(r'^_.*_$', tag_name):
                tag_name = tag_name[1:-1]
                if elem.text is not None:
                    elem.text = re.sub(r'^\n', r'', elem.text)

        elif event == 'end':
            tag_name = elem.tag
            if re.search(r'^_.*_$', tag_name):
                tag_name = tag_name[1:-1]
                if elem.tail is not None:
                    elem.tail = re.sub(r'^\n', r'', elem.tail)

            elem.clear()

if __name__ == "__main__":
    if platform.system() == 'Windows' and 'PROMPT' not in os.environ:
        try:
            main()
        except Exception:
            traceback.print_exc()
        os.system('pause')
    else:
        main()

--
Python : sys.version_info(major=3, minor=6, micro=4, releaselevel='final', serial=0)
lxml.etree : (4, 1, 1, 0)
libxml used : (2, 9, 5)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 30)
libxslt compiled : (1, 1, 30)
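Version details in this form can be collected with a short script along the lines of the snippet the lxml FAQ asks bug reporters to run. The sketch below is an assumption-labelled variant: the function name version_report is hypothetical, and the column padding may differ slightly from the lines above.

```python
import sys
from lxml import etree

def version_report():
    """Return one 'name: value' line per component, as in the report above."""
    rows = [
        ("Python", sys.version_info),
        ("lxml.etree", etree.LXML_VERSION),
        ("libxml used", etree.LIBXML_VERSION),
        ("libxml compiled", etree.LIBXML_COMPILED_VERSION),
        ("libxslt used", etree.LIBXSLT_VERSION),
        ("libxslt compiled", etree.LIBXSLT_COMPILED_VERSION),
    ]
    return ["%-20s: %s" % (name, value) for name, value in rows]

if __name__ == "__main__":
    print("\n".join(version_report()))
```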

Tags: iterparse
danny0838 (danny0838) wrote :
description: updated
danny0838 (danny0838)
description: updated
description: updated
description: updated
summary:
- Python crashes when setting elem.text or elem.tail during etree.iterparse
+ Python crashes when setting elem.text during etree.iterparse
scoder (scoder) wrote :

I agree that it shouldn't crash, but this is difficult to prevent and your usage example is explicitly forbidden in the docs.

http://lxml.de/parsing.html#modifying-the-tree

Changed in lxml:
status: New → Won't Fix
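The linked documentation section says, roughly, that during a 'start' event an element's text and children are not yet reliably available, and that only the element and its descendants may be modified, on 'end'. Both assignments in script.py fall outside this: elem.text is set on 'start', and elem.tail is set on the element's own 'end', before the parser has read the tail that follows the closing tag. A minimal sketch that keeps the same edits within those limits moves both to 'end' events: an element's own text is fixed at its 'end', and a child's tail is fixed at its parent's 'end', when it is part of a finished subtree. The names MARKED, is_marked, strip_newline and iterparse_and_strip are hypothetical, and for simplicity the sketch does not clear elements to save memory.

```python
import io
import re
from lxml import etree

# Same tag test as script.py: tags that start and end with "_".
MARKED = re.compile(r'^_.*_$')

def strip_newline(s):
    """Drop one leading newline, leaving None untouched."""
    return None if s is None else re.sub(r'^\n', '', s)

def is_marked(elem):
    # Comments and processing instructions have a non-str .tag.
    return isinstance(elem.tag, str) and MARKED.search(elem.tag) is not None

def iterparse_and_strip(source):
    """Stream elements, editing them only on 'end' events."""
    for event, elem in etree.iterparse(source, events=('end',)):
        # By the 'end' event the element's own text and its whole subtree,
        # including each child's .tail, have been parsed.
        if is_marked(elem):
            elem.text = strip_newline(elem.text)
        for child in elem:
            if is_marked(child):
                child.tail = strip_newline(child.tail)
        yield elem

if __name__ == "__main__":
    demo = io.BytesIO(b'<root><_a_>\nhi</_a_>\nafter<!--c--><b>x</b></root>')
    root = None
    for elem in iterparse_and_strip(demo):
        root = elem  # the root element is the last one to end
    print(etree.tostring(root))
```

The root element's own tail is never edited here, since it has no parent 'end' event; in a well-formed document it is normally empty anyway.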