2024-03-02 07:57:27 |
kwunlyou |
bug |
|
|
added bug |
2024-03-02 07:58:19 |
kwunlyou |
description |
I have an XML document and need to remove a few types of tags. I used the iter() method to check each element. I noticed that if a removed element contains a nested element, the child is deleted along with it, but the tag that follows is then not deleted. In the example below, the tag delete2 is a child of delete1, and both are removed. The section tag can be deleted for some reason. Is this a bug, or did I miss something? Thanks.
# %%
from lxml import etree
# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
<commentary>
<body>
<delete1>
<delete2>
</delete2>
</delete1>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
"""
# works
# xml_str = """
# <spdoc>
# <commentary>
# <body>
# <delete1>
# </delete1>
Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)
# <delete2>
# </delete2>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>
# """
root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)
print(etree.tostring(root, encoding="utf-8").decode("utf-8"))
# the unexpected output is:
# <Element spdoc at 0x103489680> False
# <Element commentary at 0x103489640> False
# <Element body at 0x1034b5540> False
# <Element delete1 at 0x1037eb700> True
# <Element delete2 at 0x1037eba40> True
# <spdoc>
# <commentary>
# <body>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc> |
I have an XML document and need to remove a few types of tags. I used the iter() method to check each element. I noticed that if a removed element contains a nested element, the child is deleted along with it, but the tag that follows is then not deleted. In the example below, the tag delete2 is a child of delete1, and both are removed. The section tag, however, is not deleted for some reason. Is this a bug, or did I miss something? Thanks.
# %%
from lxml import etree
# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
<commentary>
<body>
<delete1>
<delete2>
</delete2>
</delete1>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
"""
# works
# xml_str = """
# <spdoc>
# <commentary>
# <body>
# <delete1>
# </delete1>
Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)
# <delete2>
# </delete2>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>
# """
root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)
print(etree.tostring(root, encoding="utf-8").decode("utf-8"))
# the unexpected output is:
# <Element spdoc at 0x103489680> False
# <Element commentary at 0x103489640> False
# <Element body at 0x1034b5540> False
# <Element delete1 at 0x1037eb700> True
# <Element delete2 at 0x1037eba40> True
# <spdoc>
# <commentary>
# <body>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc> |
|
2024-03-02 07:59:11 |
kwunlyou |
description |
I have an XML document and need to remove a few types of tags. I used the iter() method to check each element. I noticed that if a removed element contains a nested element, the child is deleted along with it, but the tag that follows is then not deleted. In the example below, the tag delete2 is a child of delete1, and both are removed. The section tag, however, is not deleted for some reason. Is this a bug, or did I miss something? Thanks.
# %%
from lxml import etree
# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
<commentary>
<body>
<delete1>
<delete2>
</delete2>
</delete1>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
"""
# works
# xml_str = """
# <spdoc>
# <commentary>
# <body>
# <delete1>
# </delete1>
Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)
# <delete2>
# </delete2>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>
# """
root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)
print(etree.tostring(root, encoding="utf-8").decode("utf-8"))
# the unexpected output is:
# <Element spdoc at 0x103489680> False
# <Element commentary at 0x103489640> False
# <Element body at 0x1034b5540> False
# <Element delete1 at 0x1037eb700> True
# <Element delete2 at 0x1037eba40> True
# <spdoc>
# <commentary>
# <body>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc> |
I have an XML document and need to remove a few types of tags. I used the iter() method to check each element. I noticed that if a removed element contains a nested element, the child is deleted along with it, but the tag that follows is then not deleted. In the example below, the tag delete2 is a child of delete1, and both are removed. The section tag, however, is not deleted for some reason. Is this a bug, or did I miss something? Thanks.
# %%
from lxml import etree
# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
<commentary>
<body>
<delete1>
<delete2>
</delete2>
</delete1>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
"""
# works
# xml_str = """
# <spdoc>
# <commentary>
# <body>
# <delete1>
# </delete1>
# <delete2>
# </delete2>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>
# """
root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)
print(etree.tostring(root, encoding="utf-8").decode("utf-8"))
# the unexpected output is:
# <Element spdoc at 0x103489680> False
# <Element commentary at 0x103489640> False
# <Element body at 0x1034b5540> False
# <Element delete1 at 0x1037eb700> True
# <Element delete2 at 0x1037eba40> True
# <spdoc>
# <commentary>
# <body>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>
Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39) |
|
2024-03-02 07:59:29 |
kwunlyou |
description |
I have an XML document and need to remove a few types of tags. I used the iter() method to check each element. I noticed that if a removed element contains a nested element, the child is deleted along with it, but the tag that follows is then not deleted. In the example below, the tag delete2 is a child of delete1, and both are removed. The section tag, however, is not deleted for some reason. Is this a bug, or did I miss something? Thanks.
# %%
from lxml import etree
# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
<commentary>
<body>
<delete1>
<delete2>
</delete2>
</delete1>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
"""
# works
# xml_str = """
# <spdoc>
# <commentary>
# <body>
# <delete1>
# </delete1>
# <delete2>
# </delete2>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>
# """
root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)
print(etree.tostring(root, encoding="utf-8").decode("utf-8"))
# the unexpected output is:
# <Element spdoc at 0x103489680> False
# <Element commentary at 0x103489640> False
# <Element body at 0x1034b5540> False
# <Element delete1 at 0x1037eb700> True
# <Element delete2 at 0x1037eba40> True
# <spdoc>
# <commentary>
# <body>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>
Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39) |
I have an XML document and need to remove a few types of tags. I used the iter() method to check each element. I noticed that if a removed element contains a nested element, the child is deleted along with it, but the tag that follows is then not deleted. In the example below, the tag delete2 is a child of delete1, and both are removed. The section tag, however, is not deleted for some reason. Is this a bug, or did I miss something? Thanks.
# %%
from lxml import etree
# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
<commentary>
<body>
<delete1>
<delete2>
</delete2>
</delete1>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
"""
root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)
print(etree.tostring(root, encoding="utf-8").decode("utf-8"))
# the unexpected output is:
# <Element spdoc at 0x103489680> False
# <Element commentary at 0x103489640> False
# <Element body at 0x1034b5540> False
# <Element delete1 at 0x1037eb700> True
# <Element delete2 at 0x1037eba40> True
# <spdoc>
# <commentary>
# <body>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>
Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39) |
|
2024-03-02 08:01:15 |
kwunlyou |
description |
I have an XML document and need to remove a few types of tags. I used the iter() method to check each element. I noticed that if a removed element contains a nested element, the child is deleted along with it, but the tag that follows is then not deleted. In the example below, the tag delete2 is a child of delete1, and both are removed. The section tag, however, is not deleted for some reason. Is this a bug, or did I miss something? Thanks.
# %%
from lxml import etree
# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
<commentary>
<body>
<delete1>
<delete2>
</delete2>
</delete1>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
"""
root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)
print(etree.tostring(root, encoding="utf-8").decode("utf-8"))
# the unexpected output is:
# <Element spdoc at 0x103489680> False
# <Element commentary at 0x103489640> False
# <Element body at 0x1034b5540> False
# <Element delete1 at 0x1037eb700> True
# <Element delete2 at 0x1037eba40> True
# <spdoc>
# <commentary>
# <body>
# <section name="delete">
# </section>
# </body>
# </commentary>
# </spdoc>
Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39) |
I have an XML document and need to remove a few types of tags. I used the iter() method to check each element. I noticed that if a removed element contains a nested element, the child is deleted along with it, but the tag that follows is then not deleted. In the example below, the tag delete2 is a child of delete1, and both are removed. The section tag, however, is not deleted for some reason. Is this a bug, or did I miss something? Thanks.
# %%
from lxml import etree
# unexpected output when nested elements are deleted
xml_str = """
<spdoc>
<commentary>
<body>
<delete1>
<delete2>
</delete2>
</delete1>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
"""
root = etree.fromstring(xml_str)
for element in root.iter():
    is_remove = False
    if element.tag == "delete1":
        is_remove = True
    if element.tag == "delete2":
        is_remove = True
    if element.tag == "section" and element.attrib.get("name") == "delete":
        is_remove = True
    print(f"{element} {is_remove}")
    if is_remove:
        element.getparent().remove(element)
print(etree.tostring(root, encoding="utf-8").decode("utf-8"))
# the unexpected output is:
<Element spdoc at 0x103489680> False
<Element commentary at 0x103489640> False
<Element body at 0x1034b5540> False
<Element delete1 at 0x1037eb700> True
<Element delete2 at 0x1037eba40> True
<spdoc>
<commentary>
<body>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
Versions:
Python : sys.version_info(major=3, minor=10, micro=13, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39) |
|
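A possible explanation, based on the printed trace: root.iter() walks the tree lazily. Once delete1 is detached from body, the iterator carries on inside the detached subtree and visits delete2. It then stops without ever reaching section. A common workaround is to collect all matching elements into a list before removing any of them. The sketch below applies that workaround under this assumption and is not an official lxml recommendation. The names should_remove and remove_unwanted are hypothetical helpers, not part of lxml.

```python
from lxml import etree

XML = """
<spdoc>
<commentary>
<body>
<delete1>
<delete2>
</delete2>
</delete1>
<section name="delete">
</section>
</body>
</commentary>
</spdoc>
"""


def should_remove(element):
    # Hypothetical predicate mirroring the conditions in the issue's loop.
    if element.tag in ("delete1", "delete2"):
        return True
    return element.tag == "section" and element.get("name") == "delete"


def remove_unwanted(root):
    # Materialize the matches first, so removing elements cannot
    # disturb the traversal that found them.
    matches = [el for el in root.iter() if should_remove(el)]
    for el in matches:
        parent = el.getparent()
        if parent is not None:  # the root element itself has no parent
            parent.remove(el)
    return root


root = remove_unwanted(etree.fromstring(XML))
print(etree.tostring(root, encoding="unicode"))
```

Selecting the same elements with root.xpath("//delete1 | //delete2 | //section[@name='delete']") should work equally well. This is because xpath() also returns a complete list before anything is removed.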