UnicodeEncodeError when creating user with non-ascii chars

Bug #1751051 reported by Andreas Hasenack on 2018-02-22
This bug affects 1 person
Affects                  Status         Importance   Assigned to            Milestone
cloud-init               Fix Released   Medium       Unassigned
cloud-init (Ubuntu)      Fix Released   Undecided    Unassigned
livecd-rootfs (Ubuntu)   Fix Released   Undecided    Michael Hudson-Doyle

Bug Description

I was testing subiquity, and at the user creation prompt I typed in "André D'Silva" for the user's name, and just "andre" for the login.

The installer finished fine, but after the first boot I couldn't log in. Booting into rescue mode showed me that the user had not been created.

Checking the cloud-init logs, I found the UnicodeEncodeError:
2018-02-22 12:44:01,386 - __init__.py[DEBUG]: Adding user andre
2018-02-22 12:44:01,387 - util.py[WARNING]: Failed to create user andre
2018-02-22 12:44:01,387 - util.py[DEBUG]: Failed to create user andre
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 463, in add_user
    util.subp(adduser_cmd, logstring=log_adduser_cmd)
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1871, in subp
    env=env, shell=shell)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1275, in _execute_child
    restore_signals, start_new_session, preexec_fn)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 4: ordinal not in range(128)

user-data contains this:
#cloud-config
hostname: sbqt
users:
- gecos: "Andr\xE9 D'Silva"
  groups: [adm, cdrom, dip, lpadmin, plugdev, sambashare, debian-tor, libvirtd, lxd,
    sudo]
  lock-passwd: false
  name: andre
  passwd: $6$UaxxahbQam4Ko1g7$WB5tNuCR84DvWwI7ovxDiofIdLP47pG2USPel2iIQV/qzzT3pAb1VtlbelCR2iCNRxCoJgsVafcNtqdfz1/IL1
  shell: /bin/bash
  ssh_import_id: ['lp:ahasenack']

cloud-init is 17.2-34-g644048e3-0ubuntu1 from bionic/main.
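
As a small illustration of why the traceback reports "position 4": the character that fails is the "é" in the gecos field, not anything in the login name. The sketch below only reproduces the ASCII encoding step in isolation; the helper name `first_non_ascii` is made up for this example and is not part of cloud-init.

```python
def first_non_ascii(text):
    """Return (index, character) of the first character ASCII cannot encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the same index that appears as "position N" in the error message.
        return exc.start, text[exc.start]
    return None


gecos = "Andr\u00e9 D'Silva"  # the gecos value from the user-data
login = "andre"               # the login name, plain ASCII

# ascii() keeps the output printable even when stdout itself is ASCII-only.
print(ascii(first_non_ascii(gecos)))  # (4, '\xe9')
print(first_non_ascii(login))         # None
```

The login name encodes cleanly, which matches the log: the failure happens only once the full name is passed along as an argument.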

Scott Moser (smoser) wrote :

I think the issue is:
a.) there is no default locale set in the subiquity installed system.
b.) python3 subprocess is doing an 'encode' for each argument in the
command list.
python2 default encoding *is* supposed to be based on the environment [1],
but python3 default encoding is not; python3 is supposed to be utf-8 [2].
In the trace above we are down in C code where it is clearly doing 'ascii'
encoding.

 [1] https://docs.python.org/2/library/sys.html?highlight=getdefaultencoding#sys.getdefaultencoding
 [2] https://docs.python.org/3/library/stdtypes.html?highlight=decode#str.encode
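
A hedged sketch of the distinction that seems to matter here: Python 3 has a fixed default string encoding, but on POSIX the arguments handed to a child process are, as far as I know, encoded with the filesystem encoding (the one `os.fsencode()` uses), and that one follows the locale. The function names below are illustrative only.

```python
import sys


def report_encodings():
    """Return the two encodings that are easy to confuse in this bug."""
    return {
        # Used by str.encode() / bytes.decode() when no encoding is given;
        # always "utf-8" on Python 3.
        "default": sys.getdefaultencoding(),
        # Used by os.fsencode(); derived from the locale when the
        # interpreter starts.
        "filesystem": sys.getfilesystemencoding(),
    }


def encodable(text, encoding):
    """Return True if text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(report_encodings())
    print(encodable("Andr\u00e9", "ascii"))  # False
    print(encodable("Andr\u00e9", "utf-8"))  # True
```

Running this with and without `LANG` set on the affected Python 3.6 should show the "filesystem" value change while "default" stays the same. Newer interpreters may coerce the C locale to UTF-8, so the result there can differ.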

You can see the problem generally below. I only use 'json' as a convenient
way to pass in utf-8 characters. You can see that either an unset LANG
or LANG=C causes the issue.

I guess I never thought that subprocess would be converting an argument
list of strings to bytes. That does make some sense.

So I think there are actually two changes needed:
a.) subiquity (via either curtin or cloud-init) should be setting a utf-8
default locale (Ubuntu images generally do that). I'm not sure why the image
being installed didn't have one set.

b.) cloud-init's subp should probably just do the conversion to bytes
of whatever it gets as an argument list for the command, and always assume
that strings are to be encoded as utf-8.
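
Suggestion b.) can be sketched roughly as follows. This is a minimal illustration of the idea, not cloud-init's actual `subp` code: the names `encode_args` and `run_utf8` are hypothetical, and it assumes a POSIX system where `subprocess` accepts a list of bytes arguments.

```python
import subprocess


def encode_args(args, encoding="utf-8"):
    """Return args with every str encoded to bytes; bytes pass through unchanged."""
    return [a if isinstance(a, bytes) else a.encode(encoding) for a in args]


def run_utf8(args):
    """Run a command with explicitly UTF-8 encoded arguments; return stdout as bytes."""
    # Because the arguments are already bytes, subprocess does not need to
    # encode them itself, so the locale-derived encoding is not consulted.
    return subprocess.check_output(encode_args(args))


if __name__ == "__main__":
    print(encode_args(["echo", "Andr\u00e9 DSilva"]))
    print(run_utf8(["echo", "Andr\u00e9 DSilva"]))
```

The design choice is to decide the encoding in one place, before the command reaches `subprocess`, instead of relying on whatever locale the process happened to start with.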

$ cat go.py
#!/usr/bin/python3
import json, subprocess, sys
cmd = json.loads(sys.argv[1])
print("cmd=%s" % [x.encode("utf-8") for x in cmd])
subprocess.check_call(cmd)

# my default lang is en_US.utf-8
$ ./go.py '["echo", "Andr\u00e9 DSilva"]'
cmd=[b'echo', b'Andr\xc3\xa9 DSilva']
André DSilva

$ LANG=en_US.utf-8 ./go.py '["echo", "Andr\u00e9 DSilva"]'
cmd=[b'echo', b'Andr\xc3\xa9 DSilva']
André DSilva

$ env -u LANG ./go.py '["echo", "Andr\u00e9 DSilva"]'
cmd=[b'echo', b'Andr\xc3\xa9 DSilva']
Traceback (most recent call last):
  File "./go.py", line 5, in <module>
    subprocess.check_call(cmd)
  File "/usr/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1275, in _execute_child
    restore_signals, start_new_session, preexec_fn)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 4: ordinal not in range(128)

$ LANG=C ./go.py '["echo", "Andr\u00e9 DSilva"]'
cmd=[b'echo', b'Andr\xc3\xa9 DSilva']
Traceback (most recent call last):
  File "./go.py", line 5, in <module>
    subprocess.check_call(cmd)
  File "/usr/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1275, in _execute_child
    restore_signals, start_new_session, preexec_fn)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 4: ordinal not in range(128)

Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Changed in cloud-init (Ubuntu):
status: New → Confirmed

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 18.1-5-g40e77380-0ubuntu1

---------------
cloud-init (18.1-5-g40e77380-0ubuntu1) bionic; urgency=medium

  * New upstream snapshot.
    - GCE: fix reading of user-data that is not base64 encoded. (LP: #1752711)
    - doc: fix chef install from apt packages example in RTD.
    - Implement puppet 4 support [Romanos Skiadas] (LP: #1446804)
    - subp: Fix subp usage with non-ascii characters when no system locale.
      (LP: #1751051)
    - salt: configure grains in grains file rather than in minion config.
      [Daniel Wallace]

 -- Chad Smith <email address hidden> Thu, 01 Mar 2018 15:47:04 -0700

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released

Steve Langasek (vorlon) wrote :

We need to make sure that the default locale when booting a subiquity image is C.UTF-8, not C. This probably needs fixing in livecd-rootfs and I don't think there are any code changes for subiquity.
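
A quick way to check which case an image falls into is to look at the locale variables the process inherits. The sketch below is a heuristic under stated assumptions: it applies the usual POSIX precedence (`LC_ALL`, then `LC_CTYPE`, then `LANG`) and only inspects the codeset suffix of the locale name. The helper names are made up for this example.

```python
import os


def effective_ctype(env=None):
    """Return the locale name that governs character encoding for this environment."""
    env = os.environ if env is None else env
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    # Empty values are treated the same as unset ones.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


def is_utf8_locale(name):
    """Heuristic: True if the locale name carries a UTF-8 codeset suffix."""
    # The codeset sits after "." and before an optional "@modifier".
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    name = effective_ctype()
    print(name, is_utf8_locale(name))
```

With no variables set this reports `C`, which is the situation described above; with `LANG=C.UTF-8` it reports a UTF-8 locale.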

affects: subiquity → livecd-rootfs
affects: livecd-rootfs → livecd-rootfs (Ubuntu)
tags: added: id-5a9fa29eda1dc1b22307ed30
Scott Moser (smoser) on 2018-03-08
Changed in cloud-init:
status: Confirmed → Fix Committed
Changed in livecd-rootfs (Ubuntu):
status: New → In Progress
assignee: nobody → Michael Hudson-Doyle (mwhudson)

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 2.515

---------------
livecd-rootfs (2.515) bionic; urgency=medium

  * Set the default locale to C.UTF-8 in all server and cloud images.
   (LP: #1751051, #1759003)

 -- Michael Hudson-Doyle <email address hidden> Tue, 27 Mar 2018 09:59:02 +1300

Changed in livecd-rootfs (Ubuntu):
status: In Progress → Fix Released

This bug is believed to be fixed in cloud-init in 18.2. If this is still a problem for you, please add a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released