pre-seeding lxd on Core appliances breaks console-conf user creation

Bug #1881588 reported by Oliver Grawert
This bug affects 1 person
Affects    Status        Importance  Assigned to           Milestone
snapd      Invalid       Undecided   Ian Johnson
subiquity  Fix Released  Undecided   Michael Hudson-Doyle
Core16     Won't Fix     Undecided   Unassigned
Core18     Fix Released  Undecided   Unassigned
Core20     Fix Released  Undecided   Unassigned

Bug Description

when seeding appliance images with lxd, user creation becomes impossible.

console-conf skips the user creation, and system-user assertions do not work either because there is already a user existing in the image.

the tty screen shows instructions to log in with "lxd@<IP ADDRESS>" ...

since the lxd user is a special case hack in Ubuntu Core images, "snap create-user ..." should probably learn to ignore its existence ...

Oliver Grawert (ogra)
description: updated
Revision history for this message
Ian Johnson (anonymouse67) wrote :

Reproduced with uc20 + console-conf from the core20 snap. Adding a subiquity bug since there are some bugs with subiquity here. Namely, `snap create-user` actually works fine in this state: I added `dangerous systemd.debug-shell=1` to the kernel command line for the uc20 VM, got a root shell, manually ran `snap create-user`, and it worked fine.

I think the main bug here is that console-conf is incorrectly detecting that a user was created previously. However, I see some additional bugs in this situation, which may or may not be worth investigating or fixing since they only really arise because console-conf incorrectly thinks there is a user:

* after getting to the "This device is registered to ..." page, the email address is blank, i.e. it verbatim says "This device is registered to ."
* hitting enter on the "This device is registered" page brings you back to the start of console-conf where it says "Press enter to configure"

Also, I double checked: when the lxd user is added via seeding like this, `snap managed` returns false before I manually create my own user, and returns true after I create a user normally through `snap create-user`. So perhaps that's the check that console-conf should do to decide whether to prompt for creating a user account on the system.

Changed in snapd:
status: New → Invalid
Oliver Grawert (ogra)
summary: - pre-seeding lxd on Core appliances breaks snap create-user
+ pre-seeding lxd on Core appliances breaks snap console-conf user
+ creation
summary: - pre-seeding lxd on Core appliances breaks snap console-conf user
- creation
+ pre-seeding lxd on Core appliances breaks console-conf user creation
Oliver Grawert (ogra)
affects: subiquity → subiquity (Ubuntu)
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Currently console-conf is not using snapd APIs to determine whether the machine is managed, or which user IDs are managing the system.

Can snapd team please double check which commands / APIs should be used?

I.e. what shall we use to query if the machine is managed (and thus show/skip console-conf UX), and also what shall we use to query userids to show how to login (i.e. ssh xnox@ip)?

Are https://github.com/snapcore/snapd/wiki/REST-API#get-v2users and https://github.com/snapcore/snapd/wiki/REST-API#get-v2system-info the right things?

I think console-conf also still has shell scripts, are there `snap` command ways to figure that out?

Changed in snapd:
status: Invalid → New
Changed in subiquity:
status: New → Incomplete
Changed in subiquity (Ubuntu):
status: New → Invalid
Revision history for this message
Oliver Grawert (ogra) wrote :

i don't think there is a snap command to get the managed state (there is "snap known system-user" but that seems to only work if the user was actually created with an assertion)

if you actually want shell instead of a simple python http query:

root@pi4:~# cat is-managed.sh
#! /bin/sh

query_snapd() {
  RET="$(/bin/echo -e 'GET /v2/'"$1"' HTTP/1.0\r\n\r\n' | \
           nc -U /var/run/snapd.socket -q0 2>&1 | \
           grep -oP '(^.*"'"$2"'":)[^,]*' | \
           grep -o '[^:]*$')"
  echo "$RET" | sed 's/\]//g;s/\}//g;s/\"//g'
}

echo "System is managed: $(query_snapd system-info managed)"
echo "By user: $(query_snapd users email)"
root@pi4:~# ./is-managed.sh
System is managed: true
By user: <email address hidden>
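
For comparison, the "simple python http query" alternative mentioned above might look roughly like the sketch below. It uses only the Python standard library and the same socket path as the shell script. The names `UnixHTTPConnection`, `snapd_get` and `managed_summary` are made up for illustration, and the response layout (a top-level `result` key holding `managed`, or a list of user objects with `email`) is an assumption based on what this thread describes, not a verified schema.

```python
import http.client
import json
import socket

SNAPD_SOCKET = "/var/run/snapd.socket"  # same path as in is-managed.sh


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket (hypothetical helper)."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        # Replace the usual TCP connect with a connect to the Unix socket.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def snapd_get(endpoint, socket_path=SNAPD_SOCKET):
    """GET /v2/<endpoint> from snapd and return the decoded JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/v2/" + endpoint)
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


def managed_summary(system_info, users):
    """Build the two lines that is-managed.sh prints, from parsed responses."""
    # Assumed layout: {"result": {"managed": bool}} and {"result": [user, ...]}.
    managed = bool(system_info.get("result", {}).get("managed", False))
    emails = [u.get("email", "") for u in users.get("result") or []]
    return [
        "System is managed: " + ("true" if managed else "false"),
        "By user: " + ", ".join(e for e in emails if e),
    ]
```

On a device where snapd is listening on that socket, `managed_summary(snapd_get("system-info"), snapd_get("users"))` would give the same two lines as the shell script; the parsing function is kept separate from the socket access so it can be exercised without snapd.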

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Waiting on more advice on how console-conf should (or should not) query managed state, available users, and whether those users can ssh in.

Changed in subiquity (Ubuntu Xenial):
status: New → Incomplete
Changed in subiquity (Ubuntu Bionic):
status: New → Incomplete
Revision history for this message
Ian Johnson (anonymouse67) wrote :

Assigning the snapd task to me, I need to chat with Samuele about how console-conf can use the snapd REST API to get the username of a created user (if there is one).

Changed in snapd:
assignee: nobody → Ian Johnson (anonymouse67)
Changed in snapd:
status: New → Triaged
Revision history for this message
Ian Johnson (anonymouse67) wrote :

I have discussed this and, upon closer inspection, we actually do have an API for console-conf to get the username of a managed device, but the logic that console-conf should follow is a bit more convoluted because eventually we may have devices that are managed (and thus should not allow configuring through console-conf), but do not have any users. So console-conf should not assume that a managed device has users, or that a device that has users is managed.

The flow that console-conf should follow is this:

if $(snap managed) is true:
    if GET /v2/users result is not empty:
        for user in GET /v2/users result (lowest ID first):
            if /home/$user/ exists:
                display "ssh $user@IP..." forever
            # else fall back to saying there are no managed users
    display "device managed without user @ IP"
else:
    display console-conf setup screen

The first API request for determining if a device is managed can be done with an HTTP GET request to the snapd /v2/system-info endpoint after a device is seeded, looking at the result.managed key. This is the equivalent of running on the command line `snap managed`.

The second API request, for determining the username of managing users, can be done with an HTTP GET request to the snapd /v2/users endpoint, looking at the result key, which is a list of user objects. console-conf should iterate over all of these users, starting with the user with the lowest ID. (The ID here is not a UID; it is an internal tracking mechanism for snapd, essentially first come, first served for created users.) For each user, check whether the user has a home directory under /home; if so, we presume that the user could log in via SSH. The username key is the name of the user that should be displayed in "ssh <user>@IP".
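
As a rough illustration of that flow, here is a minimal Python sketch that works on already-parsed responses from /v2/system-info and /v2/users. The function name `pick_screen`, its return values, and the `home_root` and `isdir` parameters are hypothetical; the `result`, `managed`, `id` and `username` keys are the ones named above, and the rest of the JSON layout is assumed.

```python
import os


def pick_screen(system_info, users_response, home_root="/home", isdir=os.path.isdir):
    """Decide which screen to show.

    Returns ("setup", None) for an unmanaged device, ("ssh", username) when a
    managing user with a home directory exists, and ("managed-no-user", None)
    for a managed device without such a user.
    """
    # Step 1: a device that is not managed gets the normal setup screen.
    if not system_info.get("result", {}).get("managed", False):
        return ("setup", None)

    # Step 2: walk the users in order of snapd's internal ID (not the UID).
    users = users_response.get("result") or []
    for user in sorted(users, key=lambda u: u.get("id", 0)):
        name = user.get("username")
        # A home directory is taken as a hint that the user can log in via SSH.
        if name and isdir(os.path.join(home_root, name)):
            return ("ssh", name)

    # Managed, but no user we could suggest for "ssh <user>@IP".
    return ("managed-no-user", None)
```

Passing `isdir` as a parameter is only there so the decision logic can be tried out without touching the real filesystem.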

@xnox does that all make sense?

Changed in snapd:
status: Triaged → Invalid
Changed in subiquity:
status: Incomplete → New
Changed in subiquity (Ubuntu Xenial):
status: Incomplete → Confirmed
Changed in subiquity (Ubuntu Bionic):
status: Incomplete → Confirmed
Changed in subiquity:
status: New → Confirmed
Revision history for this message
Stéphane Graber (stgraber) wrote :

This getting fixed in core18 (at least) is a blocker for 3 Ubuntu Core appliances we're currently preparing so it'd be great if we could get a fix rolled out soon!

tags: added: id-5ef638e8ed708765e909cfe8
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

IIRC one of the reasons console-conf-wrapper is implemented in shell is because some of our target platforms were too slow / memory constrained to run python as part of the boot process? Is that still true? Because this is starting to get a little complicated for shell now...

Revision history for this message
Oliver Grawert (ogra) wrote :

sadly this isn't true anymore (not that the slow boards got faster, but rather we started to ignore that fact). all images nowadays run cloud-init by default, so python is kind of mandatory and gets executed anyway ... with the focus on raspberry pi, the actual embedded hardware is rather ignored.

Changed in subiquity:
status: Confirmed → In Progress
assignee: nobody → Michael Hudson-Doyle (mwhudson)
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I think https://github.com/CanonicalLtd/subiquity/pull/794 implements the logic described in comment #6 but I've entirely forgotten how to test this code!

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

There's something I don't understand about this bug though: it must be that having the lxd snap seeded causes "snap managed" to be "true". That seems wrong?

Revision history for this message
Oliver Grawert (ogra) wrote :

i was thinking the same when i looked at your patch (and noticed the move-around of the "snap managed" code)...

are we sure that version of the code is actually in core18 ?

Revision history for this message
Oliver Grawert (ogra) wrote :

answering my own question:

ogra@localhost:~$ grep managed /usr/share/subiquity/console-conf-wrapper
if [ "$(snap managed)" = "true" ]; then
ogra@localhost:~$ grep PRETTY /etc/os-release
PRETTY_NAME="Ubuntu Core 18"

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Yeah, so I think there are two bugs here:

1) having the lxd snap around should not make `snap managed` be true (this is a bug in snapd I think, although xnox said this might have been fixed?)

2) console-conf should never suggest that the user logs in as lxd@

My branch fixes 2). If 1) unexpectedly turns out not to be a bug, we can cope with that in console-conf too, but I'd rather hear from someone on the snapd team first about that.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Let's re-open the snapd side of this then.

Changed in snapd:
status: Invalid → New
Revision history for this message
Stéphane Graber (stgraber) wrote :

cat /var/lib/snapd/state.json | jq .data.auth.users

^ this would be interesting to see as my understanding is that this being "null" means managed is false whereas if it shows something, then managed is true
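
The same check could be sketched in Python as below. This only mirrors the jq expression; the helper names `load_state` and `has_auth_users` are hypothetical, treating an empty list the same as "null" is an assumption, and whether a non-null value really corresponds to `snap managed` being true is the understanding stated above rather than something this sketch confirms.

```python
import json

STATE_PATH = "/var/lib/snapd/state.json"  # path from the jq command above


def load_state(path=STATE_PATH):
    """Read and parse snapd's state file (typically needs root)."""
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp)


def has_auth_users(state):
    """True if .data.auth.users in the parsed state is present and non-empty."""
    auth = (state.get("data") or {}).get("auth") or {}
    return bool(auth.get("users"))
```

On a device this would be called as `has_auth_users(load_state())`.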

Changed in snapd:
status: New → Invalid
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So I was confused here: I thought that console-conf wasn't running, but it is running and just not letting you configure an owner. My branch should still help with this.

Revision history for this message
Stéphane Graber (stgraber) wrote :

Yeah, the new logic should work fine.

An alternative for the older logic would be to skip uid < 1000 so we ignore system users.

Applying something like this worked here:

--- /usr/share/subiquity/console_conf/controllers/identity.py 2018-08-07 15:07:53.000000000 +0000
+++ identity.py 2020-07-03 03:12:47.405925882 +0000
@@ -58,9 +58,13 @@ def get_device_owner():
     except FileNotFoundError:
         return None
     with extrausers_fp:
-        passwd_line = extrausers_fp.readline()
-        if passwd_line and len(passwd_line) > 0:
-            passwd = passwd_line.split(':')
+        for line in extrausers_fp:
+            line = line.strip()
+            if not line:
+                continue
+            passwd = line.split(':')
+            if int(passwd[2]) < 1000:
+                continue
             result = {
                 'realname': passwd[4].split(',')[0],
                 'username': passwd[0],

Revision history for this message
Stéphane Graber (stgraber) wrote :

https://paste.ubuntu.com/p/Y9gY9x9w3y/ for more correct indent :)
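
Pulled out of the patch context, the uid filter can be sketched as a standalone function over passwd-format lines. The name `find_device_owner` and the `min_uid` parameter are hypothetical, only the two dictionary keys visible in the diff are reproduced, and the check for malformed lines is an extra safeguard that is not part of the patch.

```python
def find_device_owner(passwd_lines, min_uid=1000):
    """Return the first non-system user found in passwd-format lines, or None."""
    for line in passwd_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(":")
        if len(fields) < 5 or not fields[2].isdigit():
            continue  # malformed entry (safeguard added here, not in the patch)
        if int(fields[2]) < min_uid:
            continue  # system account such as lxd
        return {
            "realname": fields[4].split(",")[0],
            "username": fields[0],
        }
    return None
```

For example, given an lxd entry with a system uid followed by a regular user entry, the function skips lxd and returns the regular user; the exact uid that `useradd --system` assigns to lxd is not stated in this report.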

Revision history for this message
Oliver Grawert (ogra) wrote :

while ignoring UIDs below 1000 might help in the short term, the "managed" state has a lot more meanings in brand stores, and we will likely encounter more breakage introduced by it when customers pre-seed lxd on branded device images, so this side of things still needs to be examined. the lxd user should never turn the device to managed state.

i can't see any code in the lxd snap that would create it, where exactly does the user creation even happen ?

Revision history for this message
Oliver Grawert (ogra) wrote :

ok, answering myself again:

https://github.com/lxc/lxd-pkg-snap/blob/latest-edge/snapcraft/commands/daemon.start#L192

which simply calls:

chroot /var/lib/snapd/hostfs/ \
  useradd --system -M -N --home /var/snap/lxd/common/lxd \
  --shell /bin/false --extrausers lxd || true

i don't see how the system can end up managed here ...

Revision history for this message
Ian Johnson (anonymouse67) wrote :

To be clear from my first comment, when seeding the lxd snap, `snap managed` is false, at least on UC20 and UC18. I did not have a chance to test UC16, but it should not be managed on UC16 either... If it is, that is indeed a snapd bug, but absent evidence of that I don't think that there is a snapd bug here.

Revision history for this message
Ian Johnson (anonymouse67) wrote :

I verified that mwhudson's PR is sufficient for UC20 at least, provided it is changed slightly to account for the actual JSON structure. I built a core20 snap with the new python dependencies that console-conf has, updated the things that the PR changes in the core20 snap, and built an image with the lxd, core18, snapd, pc, core20, and pc-kernel snaps seeded, and I was able to use console-conf to create a user. I also verified that the lxd snap had created the lxd user entry in extrausers passwd before console-conf ran, so the lxd entry was created first, which verifies the fix.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Do we need this in the core snap too, or just core18 images?

I.e. are all appliance images core18+snapd based?

Changed in subiquity (Ubuntu Bionic):
status: Confirmed → In Progress
Changed in subiquity:
status: In Progress → Fix Released
no longer affects: subiquity (Ubuntu Bionic)
no longer affects: subiquity (Ubuntu Xenial)
no longer affects: subiquity (Ubuntu)
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Updated target series to hopefully add clarity.

Code merged into git.

Fix available on core20 snap edge channel.

Fix is due to be available on core18 snap edge channel soon.

Not preparing updates for core snap yet, please clarify if this is needed.

Revision history for this message
Stéphane Graber (stgraber) wrote :

I'd expect that having core18 and core20 fixed should be enough for this.

Revision history for this message
Oliver Grawert (ogra) wrote :

i'd also only be interested in core18 and onward

i'm also in the "don't build anything new with UC16" camp in general, but am not sure whether we break a promise if we leave it unfixed there ...

Revision history for this message
Ian Johnson (anonymouse67) wrote :

I tested the edge channel of the core20 snap with UC20 and this bug is fixed there.

Revision history for this message
Ian Johnson (anonymouse67) wrote :

IMHO, if the fix is not complicated to backport, we should apply the same fix to the core snap / UC16.
