UnicodeEncodeError for unrecognized files with unicode filenames in pbuilder

Bug #1025031 reported by marmuta
This bug affects 1 person
Affects: python-distutils-extra (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Martin Pitt

Bug Description

[Test case]
Install version 2.34-1
1. have an unrecognized file with UTF-8 characters in its filename using p3-d-e
   For example: $ touch "Ein Stück Lebensgeschichte und andere Erzählungen.txt"
2. run pdebuild
-> crash

One more, building the word prediction branch of Onboard in pbuilder fails:

make[1]: Entering directory `/tmp/buildd/onboard-0.97.0'
python3 setup.py clean -a
running clean
'build/lib.linux-x86_64-3.2' does not exist -- can't clean it
'build/bdist.linux-x86_64' does not exist -- can't clean it
'build/scripts-3.2' does not exist -- can't clean it
WARNING: the following files are not recognized by DistUtilsExtra.auto:
  HACKING
  Onboard/osk/osk_devices.c
  Onboard/osk/osk_devices.h
  Onboard/osk/osk_module.c
  Onboard/osk/osk_module.h
  Onboard/osk/osk_util.c
  Onboard/osk/osk_util.h
  Onboard/pypredict/README
  Onboard/pypredict/attic/Makefile
  Onboard/pypredict/attic/multilevel
  Onboard/pypredict/attic/ngram-test
  Onboard/pypredict/attic/randbench
  Onboard/pypredict/attic/test-client
  Onboard/pypredict/lm.cpython-32mu.so
  Onboard/pypredict/lm/lm.cpp
  Onboard/pypredict/lm/lm.h
  Onboard/pypredict/lm/lm_dynamic.cpp
  Onboard/pypredict/lm/lm_dynamic.h
  Onboard/pypredict/lm/lm_dynamic_cached.h
  Onboard/pypredict/lm/lm_dynamic_impl.h
  Onboard/pypredict/lm/lm_dynamic_kn.h
  Onboard/pypredict/lm/lm_merged.cpp
  Onboard/pypredict/lm/lm_merged.h
  Onboard/pypredict/lm/lm_python.cpp
  Onboard/pypredict/lm/pool_allocator.cpp
  Onboard/pypredict/tools/analyze
  Onboard/pypredict/tools/entropy
  Onboard/pypredict/tools/ksr
  Onboard/pypredict/tools/optimize
  Onboard/pypredict/tools/predict
  Onboard/pypredict/tools/split_corpus
  Onboard/pypredict/tools/train
  corpora/de/Deutsches Leben der Gegenwart.txt
Traceback (most recent call last):
  File "setup.py", line 221, in <module>
    cmdclass = {'test': TestCommand},
  File "/usr/lib/python3/dist-packages/DistUtilsExtra/auto.py", line 105, in setup
    print (' ' + f)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 19-20: ordinal not in range(128)

The file it fails to print is "corpora/de/Ein Stück Lebensgeschichte und andere Erzählungen.txt"
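
A minimal sketch of the failure and of one way to print such names safely. It assumes (the report does not state this) that the file name was decoded with the `surrogateescape` handler, as Python 3 does for file names under an ASCII locale, and that the listing indents each name by two spaces. Under those assumptions the failing span matches the "position 19-20" in the traceback. `safe_line` is a hypothetical helper, not the fix that went into auto.py.

```python
# Raw bytes of the file name as stored on disk (UTF-8).
raw = "corpora/de/Ein Stück Lebensgeschichte und andere Erzählungen.txt".encode("utf-8")

# Assumption: under an ASCII locale, Python 3 decodes file names as ASCII with
# surrogateescape, so each non-ASCII byte becomes one lone surrogate.
f = raw.decode("ascii", "surrogateescape")
line = "  " + f

try:
    line.encode("ascii")  # what print() has to do when stdout is ASCII
    error_span = None
except UnicodeEncodeError as exc:
    error_span = (exc.start, exc.end)  # end is exclusive


def safe_line(text, encoding="ascii"):
    """Hypothetical helper: escape whatever the target encoding cannot represent."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


print(error_span)
print(safe_line(line))
```

Escaping keeps the listing readable and cannot raise, at the cost of showing `\udcc3\udcbc` instead of `ü` when the output encoding is ASCII.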

ProblemType: Bug
DistroRelease: Ubuntu 12.10
Package: python3-distutils-extra 2.34-1 [modified: usr/lib/python3/dist-packages/DistUtilsExtra/auto.py]
ProcVersionSignature: Ubuntu 3.5.0-4.4-generic 3.5.0-rc6
Uname: Linux 3.5.0-4-generic x86_64
ApportVersion: 2.3-0ubuntu4
Architecture: amd64
Date: Sun Jul 15 22:01:43 2012
PackageArchitecture: all
ProcEnviron:
 TERM=xterm
 PATH=(custom, username)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: python-distutils-extra
UpgradeStatus: No upgrade log present (probably fresh install)

marmuta (marmuta) wrote :

Btw., is there a way to turn the listing of unrecognized files off? Also, shouldn't files belonging to C extensions count as recognized?

Martin Pitt (pitti)
Changed in python-distutils-extra (Ubuntu):
status: New → In Progress
assignee: nobody → Martin Pitt (pitti)
importance: Undecided → Medium
Martin Pitt (pitti) wrote :

Please note that trying to handle non-ASCII file names under a C locale (as you apparently try) is rather doomed in the first place. But I'll try to make it more robust for this case. I wrote a test case for the test suite which reproduces this. It works fine in Python 3, but is still acting up with Python 2.

Martin Pitt (pitti) wrote :

Fixed in bzr.

Changed in python-distutils-extra (Ubuntu):
status: In Progress → Fix Committed
marmuta (marmuta) wrote :

Thank you, that was quick. With p-d-e trunk the build runs through now, for both Python 2 and 3.
You're right, but non-ASCII file names aren't actually handled during the build yet, they just happen to be around. If that changes we probably have to explicitly set the locale during (p)builds. From what I found, though, that seems cumbersome to do, so I'd rather avoid it until it's really necessary.
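
For illustration only, a sketch of a narrower alternative to setting the locale: forcing just Python's stdio encoding through the `PYTHONIOENCODING` environment variable. This is not what the thread or the fix uses, and it affects only the Python interpreter, not other tools in the build. `CHILD` and `run_child` are hypothetical names.

```python
import os
import subprocess
import sys

# Child program: prints one non-ASCII file name, as the unrecognized-files listing does.
CHILD = ("print('  corpora/de/Ein St\\u00fcck Lebensgeschichte "
         "und andere Erz\\u00e4hlungen.txt')")


def run_child(io_encoding):
    """Run CHILD in a fresh interpreter with its stdio forced to io_encoding."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    return subprocess.run([sys.executable, "-c", CHILD], env=env,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)


ascii_run = run_child("ascii")   # mimics stdout under an ASCII locale
utf8_run = run_child("utf-8")    # same program, UTF-8 output forced

print(ascii_run.returncode, utf8_run.returncode)
```

With ASCII output the child exits with a `UnicodeEncodeError`; with UTF-8 forced it prints the name and exits cleanly.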

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-distutils-extra - 2.35-1

---------------
python-distutils-extra (2.35-1) unstable; urgency=low

  * auto.py: Fix printing of unrecognized non-ASCII file names under ASCII
    locales. (LP: #1025031)
  * auto.py: Fix detection of extensionless Python scripts with non-ASCII
    characters in the first few lines. (LP: #1025022)
 -- Martin Pitt <email address hidden> Fri, 03 Aug 2012 13:56:14 +0200

Changed in python-distutils-extra (Ubuntu):
status: Fix Committed → Fix Released