Fails to create user when uid is in use

Bug #1322549 reported by James Westby
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mojo: Continuous Delivery for Juju
Confirmed
Medium
Unassigned

Bug Description

2014-05-23 10:59:35 [INFO] Running script build
2014-05-23 10:59:36 [INFO] Running command in container 'click-appstore-web.precise': sudo -E -u root getent passwd jw2328
2014-05-23 10:59:37 [INFO] Running command in container 'click-appstore-web.precise': sudo -E -u root useradd --home /home/jw2328 --create-home --uid 1000 jw2328
useradd: UID 1000 is not unique
Traceback (most recent call last):
  File "/usr/bin/mojo", line 9, in <module>
    load_entry_point('mojo==0.1.6', 'console_scripts', 'mojo')()
  File "/usr/lib/python2.7/dist-packages/mojo/cli.py", line 408, in main
    args.func(args)
  File "/usr/lib/python2.7/dist-packages/mojo/cli.py", line 189, in run_from_manifest
    manifest.run(project, workspace, args.series, args.stage)
  File "/usr/lib/python2.7/dist-packages/mojo/__init__.py", line 146, in run
    phase.run(project, workspace, series, stage)
  File "/usr/lib/python2.7/dist-packages/mojo/phase.py", line 240, in run
    lxc=True, network=False)
  File "/usr/lib/python2.7/dist-packages/mojo/phase.py", line 221, in run
    lxc.run(script, env=env, network=network)
  File "/usr/lib/python2.7/dist-packages/mojo/contain.py", line 32, in wrapped
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/mojo/contain.py", line 318, in run
    self.utils.create_user(user)
  File "/usr/lib/python2.7/dist-packages/mojo/contain.py", line 436, in create_user
    raise e
subprocess.CalledProcessError: Command '['sudo', 'lxc-start', '-n', 'click-appstore-web.precise', '--share-net', '1', '--', 'sudo', '-E', '-u', 'root', 'useradd', '--home', '/home/jw2328', '--create-home', '--uid', '1000', 'jw2328']' returned non-zero exit status 1

So it found the user didn't exist, and tried to add it, failing because the uid was in use.

        # if the uid exists update the name to match the caller
        try:
            command = "useradd --home {} --create-home --uid {} {}" \
                "".format(home, uid, username)
            self.container.run(command, user="root")
        except subprocess.CalledProcessError as e:
            if e.returncode == 4:
                command = "cp /etc/sudoers.d/ubuntu-all " \
                    "/etc/sudoers.d/{}-all".format(username)
                ...
            else:
                raise e

is the code, so it does appear to try to handle this case.

However the useradd exited with returncode 1.

Thanks,

James


Revision history for this message
Ricardo Kirkner (ricardokirkner) wrote :

UID 1000 is assigned to the first user that is created in the system. On a standard desktop setup that is usually your user; on the base image for the lxc containers, the default user is 'ubuntu', which causes the duplicate uid.
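Per the log in the description, create_user only runs getent on the username before calling useradd, so a uid collision like this one goes undetected. A uid check along these lines (a hypothetical helper, not actual Mojo code) would catch it up front:

```python
import subprocess

def uid_in_use(uid):
    # getent exits 0 and prints the passwd entry when the uid is
    # already taken, and 2 when it is free.
    rc = subprocess.call(["getent", "passwd", str(uid)],
                         stdout=subprocess.DEVNULL)
    return rc == 0
```

Inside the container, uid_in_use(1000) would be true because of the default 'ubuntu' user.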

Changed in canonical-mojo:
status: New → Confirmed
Tom Haddon (mthaddon)
information type: Proprietary → Public
affects: canonical-mojo → mojo
Tom Haddon (mthaddon)
Changed in mojo:
importance: Undecided → Medium
Tom Haddon (mthaddon) wrote :

The fix here is trivial "if e.returncode in [1, 4]:", but we need to understand why it's returning a different error code and if these are the only two possibilities for return codes that we should be catching and handling in this way.
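That trivial change, sketched against the handler quoted in the description (the runner argument is a stand-in for Mojo's container.run, made injectable here purely so the sketch is self-contained):

```python
import subprocess

def default_runner(command):
    # Stand-in for LXCContainer.run(command, user="root"); runs the
    # command locally just so this sketch can execute on its own.
    subprocess.check_call(command, shell=True)

def create_user(username, uid, home, runner=default_runner):
    command = "useradd --home {} --create-home --uid {} {}".format(
        home, uid, username)
    try:
        runner(command)
        return "created"
    except subprocess.CalledProcessError as e:
        # 4 is useradd's own "UID is not unique" status; 1 is what
        # lxc-start reports when the inner command fails.
        if e.returncode in (1, 4):
            # uid already exists; fall through to the rename path
            return "uid-exists"
        raise
```

Note that catching 1 is broad: as discussed below, lxc-start returns 1 for any inner failure, which is exactly why the return codes need to be understood before hardcoding this.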

Paul Collins (pjdc)
Changed in mojo:
assignee: nobody → Paul Collins (pjdc)
Paul Collins (pjdc) wrote :

After some experimentation on a machine running 14.04 LTS with LXC 1.1.5 (trusty shipped 1.0.x, but this machine has 1.1.5 to support xenial containers) I believe the problem is that if the command passed to lxc-start exits non-zero, lxc-start decides this means that the container failed to initialize and exits 1, e.g.:

pjdc@wekufe:~$ sudo lxc-start -n mojo-is-mojo-how-to.trusty -F --share-net 1 -- sh -c 'exit 0' ; echo $?
0
pjdc@wekufe:~$ sudo lxc-start -n mojo-is-mojo-how-to.trusty -F --share-net 1 -- sh -c 'exit 1' ; echo $?
lxc-start: lxc_start.c: main: 344 The container failed to start.
lxc-start: lxc_start.c: main: 348 Additional information can be obtained by setting the --logfile and --logpriority options.
1
pjdc@wekufe:~$ sudo lxc-start -n mojo-is-mojo-how-to.trusty -F --share-net 1 -- sh -c 'exit 4' ; echo $?
lxc-start: lxc_start.c: main: 344 The container failed to start.
lxc-start: lxc_start.c: main: 348 Additional information can be obtained by setting the --logfile and --logpriority options.
4
pjdc@wekufe:~$ _

Mojo's LXCContainer.run() defaults stderr to None, which is why we never see the lxc-start error messages.
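A small mitigation for that (a sketch only; the real run() plumbing is more involved) is to merge stderr into stdout so the diagnostics travel with the exception:

```python
import subprocess

def run_capturing(argv):
    # With stderr merged into stdout, a failing command raises
    # CalledProcessError whose .output includes the lxc-start error
    # messages, instead of discarding them (stderr=None).
    return subprocess.check_output(argv, stderr=subprocess.STDOUT)
```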

When running on vivid and later, LXCContainer starts the container and then uses lxc-attach to invoke commands, which properly preserves the exit status, and this also seems to work on our 14.04+1.1.5 box, e.g.:

pjdc@wekufe:~$ sudo lxc-attach -n mojo-is-mojo-how-to.trusty -- sh -c 'exit 4' ; echo $?
4
pjdc@wekufe:~$ _

Possible fixes: 1) unconditionally use lxc-start + lxc-attach; 2) parse the output from the command instead of relying on the exit status.

At first blush, I'd say #1 is the better option, and has the added benefit of making Mojo behave more consistently across Ubuntu releases, but maybe there are good reasons to prefer lxc-start on releases before vivid.
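Option 1 could look roughly like this (the command prefix is an assumption about how Mojo would invoke lxc-attach, and is injectable here so the exit-status behaviour can be exercised without a container):

```python
import subprocess

def run_in_container(container, argv,
                     prefix=("sudo", "lxc-attach", "-n")):
    # lxc-attach propagates the inner command's own exit status, so
    # a non-zero code here is the command's, not lxc's.
    cmd = list(prefix) + [container, "--"] + list(argv)
    rc = subprocess.call(cmd)
    if rc != 0:
        raise subprocess.CalledProcessError(rc, cmd)
```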

Paul Collins (pjdc) wrote :

Digging back in the Mojo history, there used to be a comment in mojo/contain.py that said:

- # The precise kernel does not have a feature needed to use lxc-attach
- # unless using the HWE kernel. lxc-execute requires that lxc be
- # installed inside the lxc container. Therefore we need to bootstrap
- # the lxc

which was revised in r101 to say

+ # lxc < 1.0 and the precise kernel do not have a feature needed to use
+ # lxc-start, lxc-execute requires that lxc be installed inside the lxc
+ # container.

I'm not sure what to make of that revision, since we clearly use lxc-start for everything earlier than vivid. But I doubt anyone is still using Mojo on precise, and precise only has a couple of months to live anyway.

Paul Collins (pjdc)
Changed in mojo:
status: Confirmed → In Progress
Paul Collins (pjdc)
Changed in mojo:
status: In Progress → Fix Committed
Paul Collins (pjdc)
Changed in mojo:
status: Fix Committed → Fix Released
Paul Collins (pjdc) wrote :

I've reverted these fixes as they cause problems on trusty. We'll revisit when our CI environments are on xenial.

Changed in mojo:
status: Fix Released → Confirmed
Paul Collins (pjdc)
Changed in mojo:
assignee: Paul Collins (pjdc) → nobody