Validation error with non-ascii characters not printed in Python 2.7

Bug #1779866 reported by Oskar
This bug affects 1 person
Affects: lxml
Status: Invalid
Importance: Undecided
Assigned to: scoder
Milestone: (none)

Bug Description

When validation of an element fails against an enumeration whose values contain non-ASCII characters, the exception does not show the error message; the traceback prints "<exception str() failed>" instead. This only happens in Python 2.

Test script (also added as attachment):

# -*- coding: utf-8 -*-
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

from lxml import etree

if __name__ == '__main__':
    xml = StringIO('<status>stängt</status>')
    f = StringIO('''\
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="status">
        <xsd:simpleType>
            <xsd:restriction base="xsd:string">
                <xsd:enumeration value="Öppet"/>
                <xsd:enumeration value="Stängt"/>
            </xsd:restriction>
        </xsd:simpleType>
    </xsd:element>
    </xsd:schema>
    ''')

    xmlschema_doc = etree.parse(f)
    xmlschema = etree.XMLSchema(xmlschema_doc)
    doc = etree.parse(xml)
    xmlschema.assertValid(doc)

Tested in Python 2.7.14:
Traceback (most recent call last):
  File "lxml_ascii_bug.py", line 28, in <module>
    xmlschema.assertValid(doc)
  File "src/lxml/etree.pyx", line 3532, in lxml.etree._Validator.assertValid
lxml.etree.DocumentInvalid: <exception str() failed>

Tested in Python 3.6.0:
Traceback (most recent call last):
  File "lxml_ascii_bug.py", line 28, in <module>
    xmlschema.assertValid(doc)
  File "src/lxml/etree.pyx", line 3532, in lxml.etree._Validator.assertValid
lxml.etree.DocumentInvalid: Element 'status': [facet 'enumeration'] The value 'stängt' is not an element of the set {'Öppet', 'Stängt'}., line 1

Environment:

Python : sys.version_info(major=2, minor=7, micro=14, releaselevel='final', serial=0)
lxml.etree : (4, 2, 3, 0)
libxml used : (2, 9, 8)
libxml compiled : (2, 9, 8)
libxslt used : (1, 1, 32)
libxslt compiled : (1, 1, 32)

Oskar (oskar-persson) wrote :
description: updated
scoder (scoder) wrote :

This is due to a bug in Cython's f-string formatting code that used PyObject_Str() instead of PyObject_Unicode() for normal string formatting. It's been fixed there. The next lxml release should have it.

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Low
status: New → Fix Committed
scoder (scoder) wrote :

I take that back. That fix was already included in lxml 4.2.2.
It's Python 2.x that prints this broken error message.
Solution: don't let Python 2 print it.

Changed in lxml:
importance: Low → Undecided
status: Fix Committed → Invalid
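
One way to read "don't let Python 2 print it" is to catch the exception and convert its message to text yourself, instead of leaving it to the interpreter's traceback printer, which calls str() on the exception. The following is only a minimal sketch under that assumption: exception_text is a hypothetical helper, not part of lxml, and FakeInvalid is a stand-in for lxml.etree.DocumentInvalid so that the snippet runs without lxml installed.

```python
# -*- coding: utf-8 -*-
import sys


def exception_text(exc):
    """Return the exception message as text (unicode on Python 2, str on Python 3).

    Hypothetical helper for illustration; it is not provided by lxml.
    """
    if sys.version_info[0] < 3:
        # Python 2: str(exc) tries to encode a unicode message as ASCII and
        # raises UnicodeEncodeError; unicode(exc) returns the message as-is.
        return unicode(exc)  # noqa: F821 (this name only exists on Python 2)
    return str(exc)


class FakeInvalid(Exception):
    """Stand-in for lxml.etree.DocumentInvalid (assumption for this sketch)."""


try:
    raise FakeInvalid(
        u"Element 'status': [facet 'enumeration'] The value 'stängt' "
        u"is not an element of the set {'Öppet', 'Stängt'}."
    )
except FakeInvalid as exc:
    # The message is now ordinary text that can be logged or encoded explicitly.
    message = exception_text(exc)
```

With lxml, the same pattern would wrap xmlschema.assertValid(doc) in try / except etree.DocumentInvalid as exc and pass exc to the helper. On Python 2, writing the result to a terminal or file still requires an output encoding that can represent the characters.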