unportable assumption about default character set

Bug #1522052 reported by Thomas Klausner
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Low
Assigned to: scoder

Bug Description

lxml-3.5.0 on NetBSD 7.99.22 with python-3.4.3 in the default C locale fails two tests:

Doctest: xpathxslt.txt
======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ETreeIOTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/pkg/lib/python3.4/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/usr/pkg/lib/python3.4/unittest/case.py", line 577, in run
    testMethod()
  File "/disk/3/archive/obj/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/usr/pkg/lib/python3.4/tempfile.py", line 295, in mkdtemp
    _os.mkdir(file, 0o700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-18: ordinal not in range(128)

======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ElementTreeIOTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/pkg/lib/python3.4/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/usr/pkg/lib/python3.4/unittest/case.py", line 577, in run
    testMethod()
  File "/disk/3/archive/obj/textproc/py-lxml/work/lxml-3.5.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/usr/pkg/lib/python3.4/tempfile.py", line 295, in mkdtemp
    _os.mkdir(file, 0o700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-18: ordinal not in range(128)

----------------------------------------------------------------------
Ran 1735 tests in 24.823s

If I understand the test correctly, it tries to create a temporary directory whose name contains Russian characters. Python has to encode that name for the filesystem, and since the default C locale on NetBSD is ASCII, the encoding fails and the test errors out. In my understanding the test works on Linux because the locale in effect there is usually UTF-8, so the conversion works.
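The failure mode described above can be shown without touching the filesystem. This is only a minimal sketch: `dirname_ru` is a hypothetical stand-in for the test's `dirnameRU` (its real value is not shown in this report), and `can_encode` is an illustrative helper, not part of lxml.

```python
# Hypothetical stand-in for the test's dirnameRU (a Cyrillic word, written with
# escapes so the example itself stays ASCII-only).
dirname_ru = "\u041a\u0430\u0442\u0430\u043b\u043e\u0433"


def can_encode(name, encoding):
    """Return True if `name` can be encoded the way Python encodes file names."""
    try:
        # Python 3 on POSIX encodes file names with the "surrogateescape" error
        # handler, which does not help for ordinary Cyrillic characters.
        name.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(dirname_ru, "ascii"))  # False: same error class as in the traceback
print(can_encode(dirname_ru, "utf-8"))  # True
```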

I don't know what the test really wants to test. Please change this test to be more portable.

Thomas Klausner (tk-giga) wrote :

Having debugged a similar problem for cookiecutter, I'm pretty sure that this problem will also appear on Linux when you unset LANG* and LC_*, since the Linux C locale is ASCII too.
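One way to check this on a given machine is to start a child interpreter with the locale variables removed and ask it for its filesystem encoding. This is a sketch, not something taken from the lxml test suite; the function name is made up for illustration. Note that Python 3.7 and later may switch to UTF-8 on their own in the C locale (PEP 538 and PEP 540), so the sketch turns that off to show the older behaviour, and the printed value will differ between platforms.

```python
import os
import subprocess
import sys


def filesystem_encoding_without_locale():
    """Return the filesystem encoding of a child Python started without LANG*/LC_*."""
    env = {
        key: value
        for key, value in os.environ.items()
        if not key.startswith("LANG") and not key.startswith("LC_")
    }
    # Assumption: on Python 3.7+ these two switches disable C locale coercion
    # and UTF-8 mode, so the child behaves like the older interpreters above.
    env["PYTHONCOERCECLOCALE"] = "0"
    env["PYTHONUTF8"] = "0"
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


print(filesystem_encoding_without_locale())
```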

Thomas Klausner (tk-giga) wrote :

Still there in 3.6.1.

Thomas Klausner (tk-giga) wrote :

Same in 3.6.4.

Thomas Klausner (tk-giga) wrote :

This bug is still there in 3.8.0, with slightly different line numbers:

======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ETreeIOTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/pkg/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/pkg/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/scratch/textproc/py-lxml/work/lxml-3.8.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/usr/pkg/lib/python3.6/tempfile.py", line 368, in mkdtemp
    _os.mkdir(file, 0o700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-18: ordinal not in range(128)

======================================================================
ERROR: test_etree_parse_io_error (lxml.tests.test_io.ElementTreeIOTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/pkg/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/pkg/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/scratch/textproc/py-lxml/work/lxml-3.8.0/src/lxml/tests/test_io.py", line 276, in test_etree_parse_io_error
    dn = tempfile.mkdtemp(prefix=dirnameRU)
  File "/usr/pkg/lib/python3.6/tempfile.py", line 368, in mkdtemp
    _os.mkdir(file, 0o700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-18: ordinal not in range(128)

scoder (scoder) wrote :

Thanks for the report and for insisting. I'll make the test optional in lxml 5.0.
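For illustration, making such a test optional could look roughly like the sketch below. This is not the actual change made in lxml; `DIRNAME_RU`, `fs_can_encode` and the test class are hypothetical names, and the test body only checks that the directory can be created.

```python
import os
import shutil
import tempfile
import unittest

# Hypothetical stand-in for the test's dirnameRU.
DIRNAME_RU = "\u041a\u0430\u0442\u0430\u043b\u043e\u0433"


def fs_can_encode(name):
    """Return True if the current filesystem encoding can represent `name`."""
    try:
        os.fsencode(name)
    except UnicodeEncodeError:
        return False
    return True


class NonAsciiTempDirTest(unittest.TestCase):
    @unittest.skipUnless(
        fs_can_encode(DIRNAME_RU),
        "filesystem encoding cannot represent the directory name",
    )
    def test_non_ascii_tempdir(self):
        # Only runs when the name is encodable, e.g. in a UTF-8 locale.
        dn = tempfile.mkdtemp(prefix=DIRNAME_RU)
        try:
            self.assertTrue(os.path.isdir(dn))
        finally:
            shutil.rmtree(dn)
```

With this guard the test is reported as skipped in an ASCII-only C locale instead of raising `UnicodeEncodeError`, and it still runs wherever the name can be encoded.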

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Low
milestone: none → 5.0
status: New → Fix Committed
scoder (scoder)
Changed in lxml:
milestone: 5.0 → 4.9.4
scoder (scoder)
Changed in lxml:
status: Fix Committed → Fix Released