Juju snap can no longer interact with LXD in devmode

Bug #1613845 reported by Nicholas Skaggs
This bug affects 4 people
Affects                 Status         Importance   Assigned to        Milestone
snap-confine            Fix Released   High         Zygmunt Krynicki
snap-confine (Ubuntu)   Fix Released   Undecided    Unassigned
  Xenial                Fix Released   Undecided    Unassigned

Bug Description

[Impact]

Snaps running in devmode cannot interact with LXD installed in the classic distribution. This happens because the chroot in which all snaps execute has no /var/lib/lxd directory (it is not part of the core snap).

Because that directory doesn't exist in the chroot, the classic distribution's /var/lib/lxd cannot be bind-mounted onto it. Without access to this directory there's no way to reach the lxd socket located inside.

This bug is fixed by adding a quirk system in which snap-confine mounts a tmpfs over /var/lib and populates that tmpfs with a forest of bind mounts to the contents of /var/lib in the core snap. This leaves us with a writable tmpfs rather than a read-only squashfs, so /var/lib/lxd can now be created and bind-mounted on demand.
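
The mount sequence described above can be sketched as a dry run. The function below only prints the commands such a quirk might issue and mounts nothing. The function name, its three arguments, and any paths passed to it are illustrative assumptions, not the actual snap-confine implementation.

```shell
#!/bin/sh
# Dry-run sketch: print the mounts a /var/lib quirk might perform.
# Hypothetical helper; nothing is actually mounted.

plan_var_lib_quirk() {
    core_var_lib=$1  # read-only /var/lib content from the core snap
    target=$2        # /var/lib as seen by the snap
    host_lxd=$3      # /var/lib/lxd on the classic distribution

    # Step 1: cover the read-only directory with a writable tmpfs.
    echo "mount -t tmpfs tmpfs $target"

    # Step 2: bind-mount every original entry back into the tmpfs.
    for entry in "$core_var_lib"/*; do
        [ -e "$entry" ] || continue   # skip the unexpanded glob if empty
        name=${entry##*/}
        if [ -d "$entry" ]; then
            echo "mkdir $target/$name"
        else
            echo "touch $target/$name"
        fi
        echo "mount --bind $entry $target/$name"
    done

    # Step 3: the tmpfs is writable, so the lxd mount point can be created.
    if [ -d "$host_lxd" ]; then
        echo "mkdir $target/lxd"
        echo "mount --bind $host_lxd $target/lxd"
    fi
}
```

Calling it with three directories prints the planned commands in order: the tmpfs first, then one bind mount per existing entry, and finally the lxd bind mount if the host directory exists.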

For more information about the execution environment, please see this article http://www.zygoon.pl/2016/08/snap-execution-environment.html

[Test Case]

The test case can be found here:

https://github.com/snapcore/snap-confine/blob/master/spread-tests/regression/lp-1613845/task.yaml

The test case is run automatically for each pull request and for each final release. It can be reproduced manually by executing the shell commands listed in the prepare/execute/restore phases.
The commands there assume that snapd and snap-confine are installed.
No additional setup is necessary.

[Regression Potential]

 * Regression potential is small, but the code change is fairly invasive, so careful review and testing are recommended. The way this feature operates may interact with the namespace sharing feature that is introduced in 1.0.41.

As a known limitation (namespace sharing is not yet finalised and will be extended to support live mutation in subsequent releases), if the /var/lib/lxd directory does *not* exist on the classic distribution before a snap that wishes to use it is first started, the snap will not be able to see the directory until the machine is restarted. In subsequent releases, snapd and snap-confine will collaborate to modify existing namespaces in reaction to changes in the mount configuration profile. At that time we can also investigate whether quirks need to be adjusted in response to changes in the system.
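
Given this limitation, it may be worth checking for the directory before a snap that needs it is started for the first time. The following is a minimal sketch of such a check; the function name and messages are hypothetical, and the default path is the one named in this report.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the known limitation described above.

lxd_dir_ready() {
    dir=${1:-/var/lib/lxd}   # directory the snap expects to see
    if [ -d "$dir" ]; then
        echo "ok: $dir exists"
        return 0
    fi
    # If the snap was already started, the report says a restart is needed.
    echo "missing: $dir (create it before first starting the snap, or restart the machine afterwards)"
    return 1
}
```

Running `lxd_dir_ready` with no argument checks /var/lib/lxd on the classic distribution; a non-zero exit status means the snap would not see the directory.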

* The fix was tested on Ubuntu via spread.

[Other Info]

* This bug is a part of a major SRU that brings snap-confine in Ubuntu 16.04 in line with the current upstream release 1.0.41.

* snap-confine is technically an integral part of snapd, which has an SRU exception and is allowed to introduce new features and take advantage of an accelerated procedure. For more information see https://wiki.ubuntu.com/SnapdUpdates

== # Pre-SRU bug description follows # ==

The juju snap package can no longer use LXD as a substrate, presumably because of changes to bind mounts. To replicate, assuming you have LXD installed and configured:

snap install juju --beta --devmode
/snap/bin/juju bootstrap lxd lxd

This command should complete successfully, and it did work until recently. Now you get this instead:

ERROR invalid config: can't connect to the local LXD server: LXD socket not found; is LXD installed & running?

Please install LXD by running:
 $ sudo apt-get install lxd
and then configure it with:
 $ newgrp lxd
 $ lxd init
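
When triaging this error, it can help to confirm first that the socket exists on the host, which separates "LXD is not running" from "the snap cannot see /var/lib/lxd". The sketch below assumes the socket file is named unix.socket inside /var/lib/lxd; this report only says the socket is located inside that directory, so treat the file name and the function name as assumptions.

```shell
#!/bin/sh
# Hypothetical host-side check; run it outside the snap.

host_lxd_socket_present() {
    dir=${1:-/var/lib/lxd}
    sock="$dir/unix.socket"   # assumed socket file name
    if [ -S "$sock" ]; then
        echo "found: $sock"
        return 0
    fi
    echo "not found: $sock"
    return 1
}
```

If this succeeds on the host while juju inside the snap still reports the socket as missing, the problem is the snap's view of the filesystem rather than the LXD installation.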

Tags: conjure
description: updated
Adam Stokes (adam-stokes) wrote :

Broken for me as well.

tags: added: conjure
Nicholas Skaggs (nskaggs) wrote :

Explanation as to what caused this to break from zyga via IRC:

/var/lib is not bind mounted so you get what you'd get in an all-snap system (read only content of the core snap). Services should use /run or for sockets AFAIR (sadly we cannot support arbitrary locations)... the change is the use of chroot, in the past we bind mounted a few directories from the core snap to the root fs. Now we chroot to the core snap and bind mount some things from there

Zygmunt Krynicki (zyga) wrote :

This is caused by the reliance of the juju snap on the pre-chroot layout where more parts of the host distribution were bleeding through the filesystem. This is never about devmode or not (that defines confinement, not the filesystem layout).

I have an idea on how to fix this; I will post an update here with the pull request URL.

Changed in snap-confine:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Zygmunt Krynicki (zyga)
milestone: none → 1.0.40
Zygmunt Krynicki (zyga) wrote :

This is fixed by the following pull request: https://github.com/snapcore/snap-confine/pull/109

Changed in snap-confine:
status: Confirmed → In Progress
status: In Progress → Fix Committed
Zygmunt Krynicki (zyga)
Changed in snap-confine:
status: Fix Committed → Fix Released
Nicholas Skaggs (nskaggs) wrote :

Zygmunt, can we get this into xenial? I can confirm 1.0.40 in yakkety works again. Xenial still does not, even with the proposed package 1.0.38-0ubuntu0.16.04.8

Adam Israel (aisrael) wrote :

I'm running into the same issue as Nicholas, missing the fix for Xenial.

Zygmunt Krynicki (zyga)
description: updated
Changed in snap-confine (Ubuntu):
status: New → Fix Released
Changed in snap-confine (Ubuntu Xenial):
status: New → In Progress
Adam Israel (aisrael) wrote :

snap-confine version 1.0.38-0ubuntu0.16.04.10 is in Xenial now, and seems to fix the issue for me. Can someone confirm the fix has been released?

Nicholas Skaggs (nskaggs) wrote :

1.0.41 was backported which has the fix. This should be released.

Changed in snap-confine (Ubuntu Xenial):
status: In Progress → Fix Released
Leo Arias (elopio) wrote :

@balloons, this hasn't been released to xenial yet. On xenial the latest snap-confine is 1.0.38-0ubuntu0.16.04.10

Leo Arias (elopio) wrote :

I verified that the snap-confine version in xenial-proposed fixes this bug:

I tested this in a clean xenial kvm, with snap-confine 1.0.38:

elopio@ubuntu-xenial:/$ snapd-hacker-toolbelt.busybox cat /var/lib/lxd/canary
cat: can't open '/var/lib/lxd/canary': No such file or directory
elopio@ubuntu-xenial:/$ /snap/bin/juju bootstrap lxd lxd
ERROR creating LXD client: can't connect to the local LXD server: LXD socket not found; is LXD installed & running?

Please install LXD by running:
 $ sudo apt-get install lxd
and then configure it with:
 $ newgrp lxd
 $ lxd init

I enabled proposed and upgraded to 1.0.43-0ubuntu1~16.04.1
elopio@ubuntu-xenial:/$ sudo apt install snap-confine
[...]
elopio@ubuntu-xenial:/$ snapd-hacker-toolbelt.busybox cat /var/lib/lxd/canary
test
elopio@ubuntu-xenial:/$ /snap/bin/juju bootstrap lxd lxd
Creating Juju controller "lxd" on lxd/localhost
Looking for packaged Juju agent version 2.0-rc3 for amd64
Launching controller instance(s) on lxd/localhost...
 - copying image for ubuntu-xenial from https://cloud-images.ubuntu.com/releases
[...]

Zygmunt Krynicki (zyga) wrote :

We are considering discarding the quirk system behind this fix in 2.37 as we believe the bug no longer applies. The relevant snapd pull request is https://github.com/snapcore/snapd/pull/6123
