Validation error with non-ascii characters not printed in Python 2.7

Bug #1779866 reported by Oskar on 2018-07-03
This bug affects 1 person
Affects: lxml
Importance: Undecided
Assigned to: scoder

Bug Description

When an element is validated against an enumeration whose values contain non-ASCII characters and the validation fails, the resulting exception does not show the error message. This only happens in Python 2.

Test script (also added as attachment):

# -*- coding: utf-8 -*-
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

from lxml import etree

if __name__ == '__main__':
    xml = StringIO('<status>stängt</status>')
    f = StringIO('''\
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="status">
        <xsd:simpleType>
            <xsd:restriction base="xsd:string">
                <xsd:enumeration value="Öppet"/>
                <xsd:enumeration value="Stängt"/>
            </xsd:restriction>
        </xsd:simpleType>
    </xsd:element>
    </xsd:schema>
    ''')

    xmlschema_doc = etree.parse(f)
    xmlschema = etree.XMLSchema(xmlschema_doc)
    doc = etree.parse(xml)
    xmlschema.assertValid(doc)

Tested in Python 2.7.14:
Traceback (most recent call last):
  File "lxml_ascii_bug.py", line 28, in <module>
    xmlschema.assertValid(doc)
  File "src/lxml/etree.pyx", line 3532, in lxml.etree._Validator.assertValid
lxml.etree.DocumentInvalid: <exception str() failed>

Tested in Python 3.6.0:
Traceback (most recent call last):
  File "lxml_ascii_bug.py", line 28, in <module>
    xmlschema.assertValid(doc)
  File "src/lxml/etree.pyx", line 3532, in lxml.etree._Validator.assertValid
lxml.etree.DocumentInvalid: Element 'status': [facet 'enumeration'] The value 'stängt' is not an element of the set {'Öppet', 'Stängt'}., line 1

Environment:

Python : sys.version_info(major=2, minor=7, micro=14, releaselevel='final', serial=0)
lxml.etree : (4, 2, 3, 0)
libxml used : (2, 9, 8)
libxml compiled : (2, 9, 8)
libxslt used : (1, 1, 32)
libxslt compiled : (1, 1, 32)

Oskar (oskar-persson) wrote :
description: updated
scoder (scoder) wrote :

This is due to a bug in Cython's f-string formatting code that used PyObject_Str() instead of PyObject_Unicode() for normal string formatting. It's been fixed there. The next lxml release should have it.

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Low
status: New → Fix Committed
scoder (scoder) wrote :

I take that back. That fix was already included in lxml 4.2.2.
It's Python 2.x that prints this broken error message.
Solution: don't let Python 2 print it.
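
One way to follow that advice, as a minimal sketch: catch the exception and read its message from `args` instead of letting the interpreter call str() on it, then encode the text explicitly. `exception_text` is a hypothetical helper, not part of lxml, and a plain ValueError stands in for lxml.etree.DocumentInvalid here, on the assumption that the message is the first positional argument.

```python
def exception_text(exc):
    """Return an exception's message as text without calling str() on it."""
    # Assumption: the message is the first positional argument, which holds
    # for exceptions constructed from a single message string.
    if not exc.args:
        return u"%s" % (type(exc).__name__,)
    message = exc.args[0]
    if isinstance(message, bytes):
        # Byte-string message (the Python 2 default): decode it explicitly.
        return message.decode("utf-8", "replace")
    return u"%s" % (message,)


try:
    # Stand-in for the failing xmlschema.assertValid(doc) call; \xe4 is "ä".
    raise ValueError(u"The value 'st\xe4ngt' is not an element of the set")
except ValueError as exc:
    text = exception_text(exc)
    # Explicit UTF-8 bytes can be written to a log file or byte stream
    # without relying on the interpreter's default (ASCII) encoding.
    encoded = text.encode("utf-8")
```

With the test script above, the same helper would be called from an `except etree.DocumentInvalid as exc:` handler around `xmlschema.assertValid(doc)`, so the uncaught exception never reaches Python 2's traceback printer.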

Changed in lxml:
importance: Low → Undecided
status: Fix Committed → Invalid