Incorrect value in attribute when using xmlfile.element

Bug #2060160 reported by Semyon Pupkov
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
lxml      Fix Released   Undecided    scoder

Bug Description

I found that etree.Element and xmlfile.element serialize attributes differently when the attribute value is not plain ASCII (for example, Cyrillic text):

    import io
    from lxml import etree

    res_file = io.BytesIO()

    with etree.xmlfile(res_file, encoding="utf-8") as xf:
        with xf.element("Документ", attrib={"Тест": "Атрибут"}):
            el = etree.Element("Книга", attrib={"Тест": "Атрибут"})
            xf.write(el)

    print(res_file.getvalue().decode())

Result:

    <Документ Тест="&#x410;&#x442;&#x440;&#x438;&#x431;&#x443;&#x442;"><Книга Тест="Атрибут"/></Документ>

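It looks like the attribute value on the xmlfile.element is written as hexadecimal character references (&#x...;), while the same value on the etree.Element is written as plain text.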
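A quick way to see what those references stand for is to decode them. This minimal sketch uses Python's standard-library `html.unescape` (not lxml) on the value copied from the output above; each `&#xNNN;` is the hexadecimal code point of one Cyrillic letter.

```python
import html

# Attribute value copied from the xmlfile.element output above
escaped = "&#x410;&#x442;&#x440;&#x438;&#x431;&#x443;&#x442;"

# html.unescape resolves numeric character references such as &#x410;
decoded = html.unescape(escaped)

print(decoded)  # Атрибут
# Show the code point behind each character
print(" ".join(f"U+{ord(ch):04X}" for ch in decoded))
# U+0410 U+0442 U+0440 U+0438 U+0431 U+0443 U+0442
```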

Python : sys.version_info(major=3, minor=11, micro=6, releaselevel='final', serial=0)

lxml.etree : (5, 2, 1, 0)
libxml used : (2, 12, 6)
libxml compiled : (2, 12, 6)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)

scoder (scoder) wrote:

It's not "incorrect"; it just uses a different form of encoding/escaping (hexadecimal character references). The text value is correctly propagated.
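To illustrate that the value survives either way, the sketch below parses the reported output with Python's standard-library `xml.etree.ElementTree` (chosen here only for illustration; any conforming XML parser should behave the same) and compares the attribute on both elements.

```python
import xml.etree.ElementTree as ET

# Output from the report: the outer attribute uses character references,
# the inner one uses plain UTF-8 text
output = (
    '<Документ Тест="&#x410;&#x442;&#x440;&#x438;&#x431;&#x443;&#x442;">'
    '<Книга Тест="Атрибут"/></Документ>'
).encode("utf-8")

root = ET.fromstring(output)
outer_value = root.get("Тест")     # from xmlfile.element
inner_value = root[0].get("Тест")  # from etree.Element via xf.write()

# Both spellings decode to the same string
print(outer_value, inner_value, outer_value == inner_value)  # Атрибут Атрибут True
```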

But the escaping is unnecessary for UTF-8 output and needlessly inflates the data. It should simply rely on the normal encoding mechanism.
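The size cost is easy to quantify. This plain-Python sketch rebuilds the escaped form as it appears in the report (the generator expression is ours, not lxml's serializer) and compares it with the UTF-8 bytes: each reference such as `&#x410;` takes 7 bytes, while the same Cyrillic letter takes 2 bytes in UTF-8.

```python
value = "Атрибут"

# Hexadecimal character references, as xmlfile.element wrote them in this report
as_char_refs = "".join(f"&#x{ord(ch):X};" for ch in value).encode("ascii")

# Plain UTF-8, as the etree.Element attribute was written
as_utf8 = value.encode("utf-8")

print(as_char_refs.decode("ascii"))
print(len(as_char_refs), len(as_utf8))  # 49 14
```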

Fixed in https://github.com/lxml/lxml/commit/b8cd0e253f985090478c71c8d30c629a136a944e

Thanks for the report.

Changed in lxml:
assignee: nobody → scoder (scoder)
milestone: none → 5.3
status: New → Fix Committed
scoder (scoder)
Changed in lxml:
status: Fix Committed → Fix Released
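Since the bug is marked Fix Released with milestone 5.3, one way to tell whether an installed lxml should include the change is to compare `lxml.etree.LXML_VERSION` (the tuple printed as `lxml.etree : (5, 2, 1, 0)` above) against (5, 3). This is a hedged sketch: the helper `has_xmlfile_attr_fix` is hypothetical, and it assumes the fix first shipped in the release named by the milestone.

```python
def has_xmlfile_attr_fix(lxml_version):
    """Return True if the given lxml version tuple is 5.3 or later.

    Hypothetical helper; assumes the fix shipped with the 5.3 milestone.
    """
    return tuple(lxml_version[:2]) >= (5, 3)


try:
    from lxml import etree
except ImportError:  # lxml not installed; skip the live check
    etree = None

if etree is not None:
    print(etree.LXML_VERSION, has_xmlfile_attr_fix(etree.LXML_VERSION))

# The version from this report predates the fix
print(has_xmlfile_attr_fix((5, 2, 1, 0)))  # False
```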