autolanding fails with unicode error

Bug #1657969 reported by Michi Henning on 2017-01-20
This bug affects 1 person
Affects: Bazaar (Importance: Undecided; Assigned to: Unassigned)
Affects: jenkins-launchpad-plugin (Importance: Undecided; Assigned to: Unassigned)

Bug Description

I'm seeing a failure during autolanding after a top-approved MR has built and passed all its tests. The same project autolanded just fine until recently. The monitor slaves are up to date as far as their packages are concerned. I have no idea how to track this down right now. Any suggestions?

04:57:56 Started by upstream project "lp-thumbnailer-autoland" build number 99
04:57:56 originally caused by:
04:57:56 Started by remote host 10.43.64.7
04:57:56 Building remotely on jenkins-slave-monitor-3 (launchpad monitor) in workspace /var/lib/jenkins/slaves/jenkins-slave-monitor-3/workspace/lp-generic-land-mp
04:57:56 [lp-generic-land-mp] $ /bin/bash /tmp/hudson8331335214888849995.sh
04:57:56 + autoland --test-result=PASSED --build-job-url=https://jenkins.canonical.com/unity-api-1/job/lp-thumbnailer-autoland/99/ --merge-proposal=https://code.launchpad.net/~michihenning/thumbnailer/cmake-fixes/+merge/314793 --revision=368
04:57:57 DEBUG: fetching branch: lp:~michihenning/thumbnailer/cmake-fixes/
04:57:57 DEBUG: mp_link: https://code.launchpad.net/~michihenning/thumbnailer/cmake-fixes/+merge/314793.
04:57:57 DEBUG: mp.web_link: https://code.launchpad.net/~michihenning/thumbnailer/cmake-fixes/+merge/314793
04:58:02 DEBUG: Using temp dir at /tmp/tmpsY24mq

04:58:24 Traceback (most recent call last):
04:58:24 File "/usr/bin/autoland", line 9, in <module>
04:58:24 load_entry_point('jenkins-launchpad-plugin==0.1', 'console_scripts', 'autoland')()
04:58:24 File "/usr/lib/python2.7/dist-packages/jlp/commands/autoland.py", line 262, in autoland
04:58:24 ret = merge_and_commit(mp, args)
04:58:24 File "/usr/lib/python2.7/dist-packages/jlp/commands/autoland.py", line 83, in merge_and_commit
04:58:24 target = Branch.create(mp.target_branch, config=False, create_tree=True)
04:58:24 File "/usr/lib/python2.7/dist-packages/tarmac/branch.py", line 64, in create
04:58:24 clazz.create_tree()
04:58:24 File "/usr/lib/python2.7/dist-packages/tarmac/branch.py", line 98, in create_tree
04:58:24 self.tree = self.bzr_branch.create_checkout(self.temp_tree_dir)
04:58:24 File "/usr/lib/python2.7/dist-packages/bzrlib/branch.py", line 1469, in create_checkout
04:58:24 hardlink=hardlink)
04:58:24 File "/usr/lib/python2.7/dist-packages/bzrlib/bzrdir.py", line 907, in create_workingtree
04:58:24 accelerator_tree=accelerator_tree, hardlink=hardlink)
04:58:24 File "/usr/lib/python2.7/dist-packages/bzrlib/workingtree_4.py", line 1565, in initialize
04:58:24 delta_from_tree=delta_from_tree)
04:58:24 File "/usr/lib/python2.7/dist-packages/bzrlib/transform.py", line 2543, in build_tree
04:58:24 delta_from_tree)
04:58:24 File "/usr/lib/python2.7/dist-packages/bzrlib/transform.py", line 2661, in _build_tree
04:58:24 tt.finalize()
04:58:24 File "/usr/lib/python2.7/dist-packages/bzrlib/transform.py", line 1214, in finalize
04:58:24 delete_any(path)
04:58:24 File "/usr/lib/python2.7/dist-packages/bzrlib/osutils.py", line 1156, in delete_any
04:58:24 _delete_file_or_dir(path)
04:58:24 File "/usr/lib/python2.7/dist-packages/bzrlib/osutils.py", line 1175, in _delete_file_or_dir
04:58:24 if isdir(path): # Takes care of symlinks
04:58:24 File "/usr/lib/python2.7/dist-packages/bzrlib/osutils.py", line 621, in isdir
04:58:24 return stat.S_ISDIR(os.lstat(f)[stat.ST_MODE])
04:58:24 UnicodeEncodeError: 'ascii' codec can't encode characters in position 48-49: ordinal not in range(128)
04:58:24 Build step 'Execute shell' marked build as failure
04:58:24 Finished: FAILURE

Michi Henning (michihenning) wrote :

It looks like bzrlib doesn't handle file names containing non-ASCII characters.

The problem is triggered by a test file in our bzr repo that contains Chinese characters in the file name.

Setting LANG=en_US.UTF-8 can be used to work around the issue (but I believe it should work even if LANG is not set).
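
For illustration, the failure at the bottom of the traceback can be reproduced without bzr. This is only a minimal sketch, not bzrlib's code: the path below is hypothetical (the real file name is not given in this report), and `encode_path` is a helper made up for the example. It shows that encoding a path with Chinese characters to ASCII fails, while UTF-8 succeeds, which is consistent with the LANG workaround.

```python
# Hypothetical path: the actual file name in the repository is not given here.
path = u"/tmp/checkout/tests/\u4e2d\u6587.txt"


def encode_path(text, encoding):
    """Return text encoded as bytes, or None if the codec cannot represent it."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Same exception type as the one raised from os.lstat() in the traceback.
        return None


ascii_result = encode_path(path, "ascii")
utf8_result = encode_path(path, "utf-8")

print("ascii:", ascii_result)
print("utf-8:", utf8_result)
```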

Changed in jenkins-launchpad-plugin:
status: New → Invalid
Vincent Ladeuil (vila) wrote :

Hmm, by setting LANG to a UTF-8 locale you're telling bzr how to interpret the pathname, which otherwise cannot be decoded, so I can't see what bzr can do there.

The error could be clearer though.

Michi Henning (michihenning) wrote :

Are there any filesystems that allow non-ASCII in path names and do not use UTF-8? I'd be surprised if there are.

Vincent Ladeuil (vila) wrote :

> Are there any filesystems that allow non-ASCII in path names and do not use UTF-8? I'd be surprised if there are.

Well, that's the point: by choosing a non-UTF-8 locale you're declaring that you don't want to use UTF-8 to decode your pathnames...

Michi Henning (michihenning) wrote :

Yes, OK, I take your point. What this says, though, is that bzrlib won't work unless it is called with a UTF-8 locale. Shouldn't the methods in bzrlib that manipulate the filesystem specify the UTF-8 locale as a matter of course?

Looking at this: https://docs.python.org/3/howto/unicode.html#unicode-filenames

it says: "On Unix systems, there will only be a filesystem encoding if you’ve set the LANG or LC_CTYPE environment variables; if you haven’t, the default encoding is UTF-8."

So, if LANG isn't set, should bzrlib automatically assume UTF-8?
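
To check what a given interpreter actually reports on a slave, something like the following could be run there. It is a small sketch using only standard-library calls; `describe_encodings` is a name made up for this example, and the values it returns depend on the Python version and the environment.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the locale variables and the encodings Python derives from them."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LC_CTYPE": os.environ.get("LC_CTYPE"),
        # Encoding Python uses to convert unicode path names for OS calls.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding suggested by the current locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


for name, value in sorted(describe_encodings().items()):
    print("%s = %r" % (name, value))
```

Running it once with LANG unset and once with LANG=en_US.UTF-8 shows whether the interpreter in question falls back to ASCII or to UTF-8.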

Vincent Ladeuil (vila) wrote :

In Python 3, yes, finally!

I don't have the reference handy but the bzr devs argued about having that default in python2 and... couldn't get it.

I think that's pretty much the status right now, handling that "assume-utf8 default" (i.e. changing the actual behaviour) is not trivial especially given the bzr promise that unicode filenames are supported across different OSes (including non-Unix ones).

The actual behaviour is that the user has to specify which file system encoding he wants, knowing that python2 default may not be correct in all cases.

And as you found out, this is under user control via the locale env vars.

On Tue, Jan 31, 2017 at 07:16:11AM -0000, Michi Henning wrote:
> Are there any filesystems that allow non-ASCII in path names and do not
> use UTF-8? I'd be surprised if there are.

Yes. On Unix, file names are simply byte strings, and it's up to higher
layers to interpret those. From this point in history that looks pretty
unfortunate, but it's nevertheless true. There are absolutely
real-world directory trees, some even in revision control, that contain
non-UTF-8-compatible file names.

I think LANG=C.UTF-8 (rather than the unnecessarily-specific
en_US.UTF-8) would be a good choice for j-l-p to set as its default when
invoking bzr.
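
A rough sketch of that suggestion, under some assumptions: bzr is launched as a child process, the C.UTF-8 locale is available on the slave, and `utf8_env` is a hypothetical helper, not something that exists in j-l-p. Since LC_ALL and LC_CTYPE take precedence over LANG, the sketch drops them so that the LANG default takes effect; whether overriding them is desirable is a design choice.

```python
import os

# These variables take precedence over LANG for the character encoding.
LOCALE_OVERRIDES = ("LC_ALL", "LC_CTYPE")


def utf8_env(base_env=None, lang="C.UTF-8"):
    """Return a copy of the environment with a UTF-8 locale as the default."""
    env = dict(os.environ if base_env is None else base_env)
    for name in LOCALE_OVERRIDES:
        env.pop(name, None)
    env["LANG"] = lang
    return env


# Possible use when starting bzr as a subprocess (not executed here):
#   subprocess.call(["bzr", "version"], env=utf8_env())
```

The traceback above shows bzrlib being imported into the autoland process itself, so in that case the variable would have to be set before the interpreter starts, for example in the Jenkins job's shell step.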

Michał Sawicz (saviq) on 2017-04-05
Changed in jenkins-launchpad-plugin:
status: Invalid → Triaged
Changed in bzr:
status: New → Invalid