`snap run lxd.activate` in a disco lxd container fails after reboot

Bug #1824851 reported by Dan Watkins
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
AppArmor
New
Undecided
Unassigned
snapd
New
Undecided
Zygmunt Krynicki

Bug Description

In cloud-init integration testing, we are seeing failures to start the snap.lxd.activate.service properly. Working through the layers, it boils down to the systemd unit running `snap run lxd.activate` and I can reproduce the issue when I run the same command in a root shell:

```
# snap run lxd.activate
cannot change profile for the next exec call: No such file or directory
snap-update-ns failed with code 1: No such file or directory
```

A minimal reproducer (from comment #2):

```
lxc launch ubuntu-daily:d reproducer
lxc exec reproducer -- cloud-init status --wait
lxc exec reproducer -- snap run lxd.activate # this should succeed
lxc exec reproducer -- reboot
lxc exec reproducer -- cloud-init status --wait
lxc exec reproducer -- snap run lxd.activate # this should fail
```

Revision history for this message
Dan Watkins (oddbloke) wrote :

http://paste.ubuntu.com/p/P8sdxBSF4P/ is the output of `strace snap run lxd.activate`.

description: updated
Zygmunt Krynicki (zyga)
affects: snappy → snapd
Changed in snapd:
assignee: nobody → Zygmunt Krynicki (zyga)
Revision history for this message
Dan Watkins (oddbloke) wrote :

A minimal reproducer appears to be:

```
lxc launch ubuntu-daily:d reproducer
lxc exec reproducer -- cloud-init status --wait
lxc exec reproducer -- snap run lxd.activate # this should succeed
lxc exec reproducer -- reboot
lxc exec reproducer -- cloud-init status --wait
lxc exec reproducer -- snap run lxd.activate # this should fail
```

summary: - snap-update-ns failure on `snap run lxd.activate`
+ `snap run lxd.activate` in a disco lxd container fails after reboot
description: updated
Revision history for this message
Dan Watkins (oddbloke) wrote :

I've updated the description with the reproducer from #2; the original reproducer using the cloud-init integration tests is:

```
lxd init --auto
sudo apt update
sudo apt install tox
git clone https://git.launchpad.net/cloud-init
cd cloud-init
tox -e citest -- run --verbose --data-dir results --os-name disco --test modules/lxd_dir.yaml --preserve-instance
```

This will leave you with a stopped lxd container (named something like cloud-test-ubuntu-disco-modules-lxd-dir-l4iikcl4cb8eu6k3ttega9o) which reproduces the issue.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Can you please retry the strace with -f?

From what I can see so far when we are about to execute snap-update-ns we fail to change to the apparmor profile "snap-update-ns.lxd" (though I don't see the exact data, presumably due to missing -f).

Is the container used by cloud-init tests a privileged container?

Changed in snapd:
status: New → Incomplete
Revision history for this message
Dan Watkins (oddbloke) wrote :

(I've just tested and couldn't reproduce this in a ubuntu-daily:cosmic container.)

Revision history for this message
Dan Watkins (oddbloke) wrote :
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Note that perhaps this is a ordering issue?

When snap-confine executes, it assumes that the apparmor profile for snap-update-ns.$SNAP_NAME is loaded into the kernel. Those profiles are stored in /var/lib/snapd/apparmor/profiles and are written and loaded by snapd. The profiles are also loaded on very early boot by apparmor.service

If you still have the broken container, can you see if /var/lib/snapd/apparmor/profiles/snap-update-ns.lxd exists?

Revision history for this message
Dan Watkins (oddbloke) wrote :

I don't know off-hand whether the container is privileged or not, but I do know that this is a regression from cosmic to disco in whichever is used (because we test cloud-init HEAD with identical test code on cosmic and disco).

(I'll ask for clarification from other team members.)

Revision history for this message
Dan Watkins (oddbloke) wrote :

/var/lib/snapd/apparmor/profiles/snap-update-ns.lxd does exist. Note that this issue persists beyond the call in the snap.lxd.activate.service unit; whenever I run the command manually I can reproduce the failure.

Changed in snapd:
status: Incomplete → New
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Thank you for providing updates strace output. As suspected:

[pid 270] openat(AT_FDCWD, "/proc/270/attr/exec", O_WRONLY) = 8
[pid 270] write(8, "exec snap-update-ns.lxd", 23) = -1 ENOENT (No such file or directory)

A helper child process attempts to switch to snap-update-ns.lxd apparmor profile but fails to do so. Can you also provide the output of "sudo aa-status"?

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Interesting, so snapd created the apparmor profile but it is, apparently, not loaded into the kernel. Perhaps apparmor inside a disco container misbehaves and assumes (somehow) that apparmor is disabled and no action should be taken?

Could you provide logs for the apparmor.service from inside the container?

Changed in snapd:
status: New → In Progress
Revision history for this message
Joshua Powers (powersj) wrote :

No, these are not privileged

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Please update the bug report with the output from aa-status and from the logs of the apparmor.service

Changed in snapd:
status: In Progress → Incomplete
Revision history for this message
Dan Watkins (oddbloke) wrote :

```
$ lxc exec reproducer -- journalctl -u apparmor.service
-- Logs begin at Mon 2019-04-15 17:03:46 UTC, end at Mon 2019-04-15 17:19:29 UTC. --
Apr 15 17:03:49 reproducer systemd[1]: Starting Load AppArmor profiles...
Apr 15 17:03:49 reproducer apparmor.systemd[105]: Not starting AppArmor in container
Apr 15 17:03:49 reproducer systemd[1]: Started Load AppArmor profiles.
Apr 15 17:04:27 reproducer systemd[1]: Starting Load AppArmor profiles...
Apr 15 17:04:27 reproducer apparmor.systemd[116]: Not starting AppArmor in container
Apr 15 17:04:27 reproducer systemd[1]: Started Load AppArmor profiles.
```

Note, though, that the first "Not starting AppArmor in container" was from the first boot where `snap run lxd.activate` _does_ work:

```
$ lxc launch ubuntu-daily:d reproducer2
Creating reproducer2
Starting reproducer2
$ lxc exec reproducer2 /bin/bash

root@reproducer2:~# journalctl -u apparmor.service
-- Logs begin at Mon 2019-04-15 17:38:54 UTC, end at Mon 2019-04-15 17:39:06 UTC. --
Apr 15 17:38:59 reproducer2 systemd[1]: Starting Load AppArmor profiles...
Apr 15 17:38:59 reproducer2 apparmor.systemd[104]: Not starting AppArmor in container
Apr 15 17:38:59 reproducer2 systemd[1]: Started Load AppArmor profiles.
root@reproducer2:~# snap run lxd.activate
=> Starting LXD activation
==> Loading snap configuration
==> Checking for socket activation support
==> Setting LXD socket ownership
==> LXD never started on this system, no need to start it now
```

Revision history for this message
Dan Watkins (oddbloke) wrote :

aa-status on first boot:

```
apparmor module is loaded.
15 profiles are loaded.
15 profiles are in enforce mode.
   /snap/core/6673/usr/lib/snapd/snap-confine
   /snap/core/6673/usr/lib/snapd/snap-confine//mount-namespace-capture-helper
   snap-update-ns.core
   snap-update-ns.lxd
   snap.core.hook.configure
   snap.lxd.activate
   snap.lxd.benchmark
   snap.lxd.buginfo
   snap.lxd.check-kernel
   snap.lxd.daemon
   snap.lxd.hook.configure
   snap.lxd.hook.install
   snap.lxd.lxc
   snap.lxd.lxd
   snap.lxd.migrate
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.
```

aa-status on second boot:

```
apparmor module is loaded.
0 profiles are loaded.
0 profiles are in enforce mode.
0 profiles are in complain mode.
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are unconfined but have a profile defined.
```

Changed in snapd:
status: Incomplete → New
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

It seems to me that the bug is in apparmor. Perhaps apparmor regressed in loading profiles from /var/lib/snapd/apparmor/profiles on boot?

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I was made aware of https://bugs.launchpad.net/apparmor/+bug/1824812 and it seems this is a duplicate.

Revision history for this message
Dan Watkins (oddbloke) wrote :

Yep, I agree; marked as duplicate.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

This looks to be like bug #1824812

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.