Activity log for bug #2055758

Date Who What changed Old value New value Message
2024-03-02 07:57:27 kwunlyou bug added bug
2024-03-02 07:58:19 kwunlyou description I have a XML and need to remove a few types of tags. I used the iter method to check each element. I noticed if there is a nested element its child is also deleted. The following tag will not be deleted. For example as follows, the tag delete2 is a child of delete1 and both will be removed. The tag section can be deleted for some reason. Is this a bug? Or did I miss something? Thanks. # %% from lxml import etree # unexpected output when nested elements are deleted xml_str = """ <spdoc> <commentary> <body> <delete1> <delete2> </delete2> </delete1> <section name="delete"> </section> </body> </commentary> </spdoc> """ # works # xml_str = """ # <spdoc> # <commentary> # <body> # <delete1> # </delete1> Versions: Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0) lxml.etree : (5, 1, 0, 0) libxml used : (2, 12, 3) libxml compiled : (2, 12, 3) libxslt used : (1, 1, 39) libxslt compiled : (1, 1, 39) # <delete2> # </delete2> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> # """ root = etree.fromstring(xml_str) for element in root.iter(): is_remove = False if element.tag == "delete1": is_remove = True if element.tag == "delete2": is_remove = True if element.tag == "section" and element.attrib.get("name") == "delete": is_remove = True print(f"{element} {is_remove}") if is_remove: element.getparent().remove(element) print(etree.tostring(root, encoding="utf-8").decode("utf-8")) # the unexpected output is : # <Element spdoc at 0x103489680> False # <Element commentary at 0x103489640> False # <Element body at 0x1034b5540> False # <Element delete1 at 0x1037eb700> True # <Element delete2 at 0x1037eba40> True # <spdoc> # <commentary> # <body> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> I have a XML and need to remove a few types of tags. I used the iter method to check each element. I noticed if there is a nested element its child is also deleted. 
The following tag will not be deleted. For example as follows, the tag delete2 is a child of delete1 and both will be removed. The tag section can't be deleted for some reason. Is this a bug? Or did I miss something? Thanks. # %% from lxml import etree # unexpected output when nested elements are deleted xml_str = """ <spdoc>   <commentary>     <body>       <delete1>         <delete2>         </delete2>       </delete1>       <section name="delete">       </section>     </body>   </commentary> </spdoc> """ # works # xml_str = """ # <spdoc> # <commentary> # <body> # <delete1> # </delete1> Versions: Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0) lxml.etree : (5, 1, 0, 0) libxml used : (2, 12, 3) libxml compiled : (2, 12, 3) libxslt used : (1, 1, 39) libxslt compiled : (1, 1, 39) # <delete2> # </delete2> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> # """ root = etree.fromstring(xml_str) for element in root.iter():     is_remove = False     if element.tag == "delete1":         is_remove = True     if element.tag == "delete2":         is_remove = True     if element.tag == "section" and element.attrib.get("name") == "delete":         is_remove = True     print(f"{element} {is_remove}")     if is_remove:         element.getparent().remove(element) print(etree.tostring(root, encoding="utf-8").decode("utf-8")) # the unexpected output is : # <Element spdoc at 0x103489680> False # <Element commentary at 0x103489640> False # <Element body at 0x1034b5540> False # <Element delete1 at 0x1037eb700> True # <Element delete2 at 0x1037eba40> True # <spdoc> # <commentary> # <body> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc>
2024-03-02 07:59:11 kwunlyou description I have a XML and need to remove a few types of tags. I used the iter method to check each element. I noticed if there is a nested element its child is also deleted. The following tag will not be deleted. For example as follows, the tag delete2 is a child of delete1 and both will be removed. The tag section can't be deleted for some reason. Is this a bug? Or did I miss something? Thanks. # %% from lxml import etree # unexpected output when nested elements are deleted xml_str = """ <spdoc>   <commentary>     <body>       <delete1>         <delete2>         </delete2>       </delete1>       <section name="delete">       </section>     </body>   </commentary> </spdoc> """ # works # xml_str = """ # <spdoc> # <commentary> # <body> # <delete1> # </delete1> Versions: Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0) lxml.etree : (5, 1, 0, 0) libxml used : (2, 12, 3) libxml compiled : (2, 12, 3) libxslt used : (1, 1, 39) libxslt compiled : (1, 1, 39) # <delete2> # </delete2> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> # """ root = etree.fromstring(xml_str) for element in root.iter():     is_remove = False     if element.tag == "delete1":         is_remove = True     if element.tag == "delete2":         is_remove = True     if element.tag == "section" and element.attrib.get("name") == "delete":         is_remove = True     print(f"{element} {is_remove}")     if is_remove:         element.getparent().remove(element) print(etree.tostring(root, encoding="utf-8").decode("utf-8")) # the unexpected output is : # <Element spdoc at 0x103489680> False # <Element commentary at 0x103489640> False # <Element body at 0x1034b5540> False # <Element delete1 at 0x1037eb700> True # <Element delete2 at 0x1037eba40> True # <spdoc> # <commentary> # <body> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> I have a XML and need to remove a few types of tags. 
I used the iter method to check each element. I noticed if there is a nested element its child is also deleted. The following tag will not be deleted. For example as follows, the tag delete2 is a child of delete1 and both will be removed. The tag section can't be deleted for some reason. Is this a bug? Or did I miss something? Thanks. # %% from lxml import etree # unexpected output when nested elements are deleted xml_str = """ <spdoc>   <commentary>     <body>       <delete1>         <delete2>         </delete2>       </delete1>       <section name="delete">       </section>     </body>   </commentary> </spdoc> """ # works # xml_str = """ # <spdoc> # <commentary> # <body> # <delete1> # </delete1> # <delete2> # </delete2> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> # """ root = etree.fromstring(xml_str) for element in root.iter():     is_remove = False     if element.tag == "delete1":         is_remove = True     if element.tag == "delete2":         is_remove = True     if element.tag == "section" and element.attrib.get("name") == "delete":         is_remove = True     print(f"{element} {is_remove}")     if is_remove:         element.getparent().remove(element) print(etree.tostring(root, encoding="utf-8").decode("utf-8")) # the unexpected output is : # <Element spdoc at 0x103489680> False # <Element commentary at 0x103489640> False # <Element body at 0x1034b5540> False # <Element delete1 at 0x1037eb700> True # <Element delete2 at 0x1037eba40> True # <spdoc> # <commentary> # <body> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> Versions: Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0) lxml.etree : (5, 1, 0, 0) libxml used : (2, 12, 3) libxml compiled : (2, 12, 3) libxslt used : (1, 1, 39) libxslt compiled : (1, 1, 39)
2024-03-02 07:59:29 kwunlyou description I have a XML and need to remove a few types of tags. I used the iter method to check each element. I noticed if there is a nested element its child is also deleted. The following tag will not be deleted. For example as follows, the tag delete2 is a child of delete1 and both will be removed. The tag section can't be deleted for some reason. Is this a bug? Or did I miss something? Thanks. # %% from lxml import etree # unexpected output when nested elements are deleted xml_str = """ <spdoc>   <commentary>     <body>       <delete1>         <delete2>         </delete2>       </delete1>       <section name="delete">       </section>     </body>   </commentary> </spdoc> """ # works # xml_str = """ # <spdoc> # <commentary> # <body> # <delete1> # </delete1> # <delete2> # </delete2> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> # """ root = etree.fromstring(xml_str) for element in root.iter():     is_remove = False     if element.tag == "delete1":         is_remove = True     if element.tag == "delete2":         is_remove = True     if element.tag == "section" and element.attrib.get("name") == "delete":         is_remove = True     print(f"{element} {is_remove}")     if is_remove:         element.getparent().remove(element) print(etree.tostring(root, encoding="utf-8").decode("utf-8")) # the unexpected output is : # <Element spdoc at 0x103489680> False # <Element commentary at 0x103489640> False # <Element body at 0x1034b5540> False # <Element delete1 at 0x1037eb700> True # <Element delete2 at 0x1037eba40> True # <spdoc> # <commentary> # <body> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> Versions: Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0) lxml.etree : (5, 1, 0, 0) libxml used : (2, 12, 3) libxml compiled : (2, 12, 3) libxslt used : (1, 1, 39) libxslt compiled : (1, 1, 39) I have a XML and need to remove a few types of tags. 
I used the iter method to check each element. I noticed if there is a nested element its child is also deleted. The following tag will not be deleted. For example as follows, the tag delete2 is a child of delete1 and both will be removed. The tag section can't be deleted for some reason. Is this a bug? Or did I miss something? Thanks. # %% from lxml import etree # unexpected output when nested elements are deleted xml_str = """ <spdoc>   <commentary>     <body>       <delete1>         <delete2>         </delete2>       </delete1>       <section name="delete">       </section>     </body>   </commentary> </spdoc> """ root = etree.fromstring(xml_str) for element in root.iter():     is_remove = False     if element.tag == "delete1":         is_remove = True     if element.tag == "delete2":         is_remove = True     if element.tag == "section" and element.attrib.get("name") == "delete":         is_remove = True     print(f"{element} {is_remove}")     if is_remove:         element.getparent().remove(element) print(etree.tostring(root, encoding="utf-8").decode("utf-8")) # the unexpected output is : # <Element spdoc at 0x103489680> False # <Element commentary at 0x103489640> False # <Element body at 0x1034b5540> False # <Element delete1 at 0x1037eb700> True # <Element delete2 at 0x1037eba40> True # <spdoc> # <commentary> # <body> # <section name="delete"> # </section> # </body> # </commentary> # </spdoc> Versions: Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0) lxml.etree : (5, 1, 0, 0) libxml used : (2, 12, 3) libxml compiled : (2, 12, 3) libxslt used : (1, 1, 39) libxslt compiled : (1, 1, 39)
2024-03-02 08:01:15 kwunlyou description

Old value:

I have a XML and need to remove a few types of tags. I used the iter method to check each element. I noticed if there is a nested element its child is also deleted. The following tag will not be deleted.

For example as follows, the tag delete2 is a child of delete1 and both will be removed. The tag section can't be deleted for some reason.

Is this a bug? Or did I miss something? Thanks.

# %%
from lxml import etree

# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
  <commentary>
    <body>
      <delete1>
        <delete2>
        </delete2>
      </delete1>
      <section name="delete">
      </section>
    </body>
  </commentary>
</spdoc>
"""

root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)

print(etree.tostring(root, encoding="utf-8").decode("utf-8"))

# the unexpected output is :
# <Element spdoc at 0x103489680> False
# <Element commentary at 0x103489640> False
# <Element body at 0x1034b5540> False
# <Element delete1 at 0x1037eb700> True
# <Element delete2 at 0x1037eba40> True
# <spdoc>
# <commentary>
# <body>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>

Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)

New value:

I have a XML and need to remove a few types of tags. I used the iter method to check each element. I noticed if there is a nested element its child is also deleted. The following tag will not be deleted.

For example as follows, the tag delete2 is a child of delete1 and both will be removed. The tag section can't be deleted for some reason.

Is this a bug? Or did I miss something? Thanks.

# %%
from lxml import etree

# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
  <commentary>
    <body>
      <delete1>
        <delete2>
        </delete2>
      </delete1>
      <section name="delete">
      </section>
    </body>
  </commentary>
</spdoc>
"""

root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)

print(etree.tostring(root, encoding="utf-8").decode("utf-8"))

# the unexpected output is :
<Element spdoc at 0x103489680> False
<Element commentary at 0x103489640> False
<Element body at 0x1034b5540> False
<Element delete1 at 0x1037eb700> True
<Element delete2 at 0x1037eba40> True
<spdoc>
<commentary>
<body>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>

Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)
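
The output in the report is consistent with the tree being modified while root.iter() is still walking it: once delete1 is detached, the iterator appears to continue inside the detached subtree (visiting delete2) and never reaches section. A minimal sketch of a common workaround, assuming the goal is simply to drop all three kinds of element, is to collect the matches first and remove them afterwards. The helper names should_remove and remove_matching are made up for this illustration and are not part of lxml.

```python
from lxml import etree

XML = """
<spdoc>
  <commentary>
    <body>
      <delete1>
        <delete2>
        </delete2>
      </delete1>
      <section name="delete">
      </section>
    </body>
  </commentary>
</spdoc>
"""


def should_remove(element):
    """Same three conditions as the loop in the report (hypothetical helper)."""
    if element.tag in ("delete1", "delete2"):
        return True
    return element.tag == "section" and element.get("name") == "delete"


def remove_matching(root):
    """Remove every matching element and return the tags that were removed."""
    # Materialise the matches first, so the tree is not modified
    # while iter() is still walking it.
    matches = [el for el in root.iter() if should_remove(el)]
    for el in matches:
        parent = el.getparent()
        if parent is not None:  # the root element has no parent
            parent.remove(el)
    return [el.tag for el in matches]


root = etree.fromstring(XML)
removed = remove_matching(root)
print(removed)
print(etree.tostring(root, encoding="unicode"))
```

Because the list is built before anything is removed, section is visited and collected even though delete1 precedes it. delete2 is removed from the already detached delete1, which is harmless.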