unportable assumption about default character set
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
lxml | | Undecided | Unassigned | |
Bug Description
lxml-3.5.0 on NetBSD 7.99.22 with python-3.4.3 in the default C locale fails two tests:
Doctest: xpathxslt.txt
=======
ERROR: test_etree_
-------
Traceback (most recent call last):
File "/usr/pkg/
yield
File "/usr/pkg/
testMethod()
File "/disk/
dn = tempfile.
File "/usr/pkg/
_os.mkdir(file, 0o700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-18: ordinal not in range(128)
=======
ERROR: test_etree_
-------
Traceback (most recent call last):
File "/usr/pkg/
yield
File "/usr/pkg/
testMethod()
File "/disk/
dn = tempfile.
File "/usr/pkg/
_os.mkdir(file, 0o700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-18: ordinal not in range(128)
-------
Ran 1735 tests in 24.823s
If I understand the test correctly, it tries to convert a Russian string to an ASCII string (since the default C locale on NetBSD is ASCII). That conversion fails, which makes the test fail. In my understanding the test works on Linux because the default C locale is UTF-8 there, so the conversion works.
I don't know what the test really wants to test. Please change this test to be more portable.
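One possible way to make such a test portable, given here as a sketch and not as lxml's actual test code: check whether the filesystem encoding can represent the non-ASCII name before creating the directory, and skip the test otherwise. The helper name `fs_can_encode` is hypothetical.

```python
import sys


def fs_can_encode(name, encoding=None):
    """Return True if `name` can be encoded for use as a file name.

    `encoding` defaults to the interpreter's filesystem encoding; passing
    it explicitly makes it easy to simulate an ASCII-only C locale.
    (Hypothetical helper, not part of lxml.)
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Possible use inside a unittest test method (illustrative only):
#     if not fs_can_encode(dir_name):
#         self.skipTest("file system encoding cannot represent the name")
```

With an ASCII filesystem encoding, as in the traceback above, a Cyrillic name makes the helper return False, so the test could be skipped instead of failing in `mkdir`.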
Thomas Klausner (tk-giga) wrote : | #1 |
Thomas Klausner (tk-giga) wrote : | #2 |
Still there in 3.6.1.
Thomas Klausner (tk-giga) wrote : | #3 |
Same in 3.6.4.
Thomas Klausner (tk-giga) wrote : | #4 |
This bug is still there in 3.8.0, with slightly different line numbers:
=======
ERROR: test_etree_
-------
Traceback (most recent call last):
File "/usr/pkg/
yield
File "/usr/pkg/
testMethod()
File "/scratch/
dn = tempfile.
File "/usr/pkg/
_os.mkdir(file, 0o700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-18: ordinal not in range(128)
=======
ERROR: test_etree_
-------
Traceback (most recent call last):
File "/usr/pkg/
yield
File "/usr/pkg/
testMethod()
File "/scratch/
dn = tempfile.
File "/usr/pkg/
_os.mkdir(file, 0o700)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 5-18: ordinal not in range(128)
Having debugged a similar problem for cookiecutter, I'm pretty sure that this problem will also appear on Linux when you unset LANG* and LC_*, since the Linux C locale is ASCII too.
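A rough way to check this on any system is to start a child Python interpreter with all LANG* and LC_* variables removed and print the filesystem encoding it picks. This is only a sketch: the function name is made up for illustration, and the result depends on the platform and the Python version (newer Python releases may select UTF-8 even in the C locale).

```python
import os
import subprocess
import sys


def fs_encoding_without_locale_vars():
    """Report the filesystem encoding of a child interpreter started
    with every LANG* and LC_* environment variable removed.
    (Hypothetical helper for illustration.)
    """
    env = {
        key: value
        for key, value in os.environ.items()
        if not key.startswith(("LANG", "LC_"))
    }
    output = subprocess.check_output(
        [sys.executable, "-c",
         "import sys; print(sys.getfilesystemencoding())"],
        env=env,
    )
    return output.decode("ascii").strip()


if __name__ == "__main__":
    print(fs_encoding_without_locale_vars())
```

If this prints `ascii` (or an alias for it), creating a directory with a non-ASCII name from that interpreter should hit the same UnicodeEncodeError.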