services start before apparmor profiles are loaded

Bug #1871148 reported by Daniel Llewellyn on 2020-04-06
20
This bug affects 4 people
Affects Status Importance Assigned to Milestone
AppArmor
Undecided
Unassigned
snapd
High
Zygmunt Krynicki
apparmor (Ubuntu)
Critical
Jamie Strandboge
Focal
Critical
Jamie Strandboge
snapd (Ubuntu)
High
Michael Vogt
Focal
High
Michael Vogt
zsys (Ubuntu)
Undecided
Jean-Baptiste Lallement
Focal
Undecided
Jean-Baptiste Lallement

Bug Description

Per discussion with Zyga in #snapd on Freenode, I have hit a race condition where services are being started by the system before apparmor has been started. I have a complete log of my system showing the effect somewhere within at https://paste.ubuntu.com/p/Jyx6gfFc3q/. Restarting apparmor using `sudo systemctl restart apparmor` is enough to bring installed snaps back to full functionality.

Previously, when running any snap I would receive the following in the terminal:

---
cannot change profile for the next exec call: No such file or directory
snap-update-ns failed with code 1: File exists
---

Updated to add for Jamie:

$ snap version
snap 2.44.2+20.04
snapd 2.44.2+20.04
series 16
ubuntu 20.04
kernel 5.4.0-21-generic

Zygmunt Krynicki (zyga) wrote :

This was also reported by Alan Pope.

I think we should consider adding After=apparmor.service to all service units.

Changed in snapd:
status: New → Confirmed
importance: Undecided → High
Jamie Strandboge (jdstrand) wrote :

Note that apparmor is supposed to start very early in the boot process. What distribution and release is this on?

Jamie Strandboge (jdstrand) wrote :

FYI, the apparmor unit has:

After=local-fs.target

It would be interesting to see 'sudo systemd-analyze plot > ./1871148.svg'

Daniel Llewellyn (diddledan) wrote :

Jamie, I've added the following info to the description:

$ snap version
snap 2.44.2+20.04
snapd 2.44.2+20.04
series 16
ubuntu 20.04
kernel 5.4.0-21-generic

description: updated

Yes, I had this on 20.04, not had a chance to report the bug yet. Had to re-run apparmor in order to unbreak all my snaps.

Jamie Strandboge (jdstrand) wrote :

AFAICS, nothing is explicitly depending on apparmor to have finished, though it will tend to finish due to when it runs. If it takes awhile or systemd starts running parallel jobs, the daemons may not start.

That doesn't seem to sufficiently describe this situation. From Daniel's paste:

Apr 06 16:32:56 defiant systemd[1]: Finished Load AppArmor profiles.
...
Apr 06 16:32:58 defiant audit[3342]: AVC apparmor="DENIED" operation="change_onexec" info="label not found" error=-2 profile="/usr/lib/snapd/snap-confine" name="snap.multipass.multipassd" pid=3342 comm="snap-confine"
Apr 06 16:32:58 defiant multipass.multipassd[3342]: cannot change profile for the next exec call: No such file or directory

This multipass denial is the first denial caused by this bug, but it happened *after* the apparmor profiles finished loading. It seems like at the time of the denial, /var/lib/snapd/apparmor/profiles might've been (partially?) empty and so the cache was cleared.

Zygmunt Krynicki (zyga) wrote :

Apr 06 16:32:56 defiant systemd[1]: Starting Load AppArmor profiles...

Apr 06 16:32:56 defiant systemd[1]: Finished Load AppArmor profiles.

^ this supposedly takes zero seconds, it doesn't load any snap profiles.

Then two seconds later:

Apr 06 16:34:58 defiant audit[11920]: AVC apparmor="DENIED" operation="change_onexec" info="label not found" error=-2 profile="/usr/lib/snapd/snap-confine" name="snap-update-ns.telegram-desktop" pid=11920 comm="snap-confine"

Scanning the log doesn't show that this profile is ever loaded.

Some other profiles are loaded, though not at the time that is inside the scope of apparmor.service

Apr 06 16:35:01 defiant kernel: audit: type=1400 audit(1586187301.952:43): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/core/8935/usr/lib/snapd/snap-confine" pid=12025 comm="apparmor_parser"

This is loaded over three minutes later. It could be from a manual intervention (the reporter was instructed to reload apparmor.service)

Jamie Strandboge (jdstrand) wrote :

At the time apparmor.service is running, /var/lib/snapd/apparmor/profiles does not exist as per:

https://paste.ubuntu.com/p/dMMdZRZBbN/

Both Alan and Daniel are using zfs-on-root and the /var/lib volume is not present at the time the service is run and is apparently not satisfied by `After=local-fs.target`.

Changed in snapd:
importance: High → Critical
Changed in zsys (Ubuntu):
status: New → Confirmed
importance: Undecided → Critical
Changed in zsys (Ubuntu Focal):
milestone: none → ubuntu-20.04
assignee: nobody → Jean-Baptiste Lallement (jibel)
Jamie Strandboge (jdstrand) wrote :

Reassigning the snapd task to apparmor in Ubuntu since it has a patch to rc.apparmor.functions to look for /var/lib/snapd/apparmor/profiles but does not add it to RequiresMountsFor.

affects: snapd → apparmor
Changed in apparmor:
assignee: nobody → Jamie Strandboge (jdstrand)
status: Confirmed → In Progress
importance: Critical → Undecided
assignee: Jamie Strandboge (jdstrand) → nobody
status: In Progress → Invalid
Changed in apparmor (Ubuntu Focal):
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Jamie Strandboge (jdstrand)
Changed in apparmor (Ubuntu Focal):
status: In Progress → Fix Committed
Jamie Strandboge (jdstrand) wrote :

I uploaded 2.13.3-7ubuntu4 to address this: https://launchpad.net/ubuntu/+source/apparmor/2.13.3-7ubuntu4

There might be other fixes for zsys, but this should address the issue in snapd. It is currently in unapproved, but a member of the release team will hopefully approve it soon.

Changed in apparmor (Ubuntu Focal):
milestone: none → ubuntu-20.04

There is nothing to do on zsys's side. mount points are generated by the zfs generator and mount order is set by systemd. apparmor must wait until all its requirements are met to start which is what Jamie's fix does.
Closing zsys task.

Changed in zsys (Ubuntu Focal):
importance: Critical → Undecided
status: Confirmed → Invalid
Zygmunt Krynicki (zyga) wrote :

@jibel, can you explain how the mount generator makes local-fs.target satisfied *before* /var/lib is mounted?

I think this is worth investigating.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apparmor - 2.13.3-7ubuntu4

---------------
apparmor (2.13.3-7ubuntu4) focal; urgency=medium

  * debian/apparmor.service: add /var/lib/snapd/apparmor/profiles to
    RequiresMountsFor since Ubuntu's rc.apparmor.functions looks for it
    (LP: #1871148)
  * libnss-systemd.patch: allow accessing the libnss-systemd VarLink sockets
    and DBus APIs. Patch partially based on work by Simon Deziel.
    (LP: #1796911, LP: #1869024)
  * upstream-mr-424-kerberos-dot-dirs.patch: abstractions/kerberosclient:
    allow reading /etc/krb5.conf.d/
  * upstream-mr-442-gnome-user-themes.patch: gnome abstraction: allow reading
    per-user themes from $XDG_DATA_HOME (Closes: #930031)
  * upstream-mr-443-ecryptfs-dirs.patch: abstractions/base: allow read access
    to top-level ecryptfs directories (LP: #1848919)
  * upstream-mr-445-uuidd-request.patch: abstractions/base: allow read access
    to /run/uuidd/request
  * upstream-mr-464-Mesa_i915_perf_interface.patch: let Mesa check if the
    kernel supports the i915 perf interface. Patch from Debian

 -- Jamie Strandboge <email address hidden> Mon, 06 Apr 2020 17:47:20 +0000

Changed in apparmor (Ubuntu Focal):
status: Fix Committed → Fix Released
Seth Arnold (seth-arnold) wrote :

Jamie, what happens on systems without snap installed? Will systemd start the apparmor service?

How much later does this push the already-too-late apparmor service?

Requiring a potentially new thing may push the apparmor unit further behind, allowing even more services to start before profiles have been loaded.

I have to think the better approach may have been to introduce something like apparmor@.service and configure an <email address hidden> that will load profiles before snapd is started -- at least if snap is not itself loading profiles before launching programs.

Thanks

Jamie Strandboge (jdstrand) wrote :
Download full text (4.1 KiB)

Seth, the service starts fine if snapd is not installed and the mountpoint is not present.

$ sudo systemctl status apparmor
● apparmor.service - Load AppArmor profiles
     Loaded: loaded (/lib/systemd/system/apparmor.service; enabled; vendor preset: enabled)
     Active: active (exited) since Wed 2020-04-08 07:05:57 CDT; 3min 9s ago
       Docs: man:apparmor(7)
             https://gitlab.com/apparmor/apparmor/wikis/home/
    Process: 309 ExecStart=/lib/apparmor/apparmor.systemd reload (code=exited, status=0/SUCCESS)
   Main PID: 309 (code=exited, status=0/SUCCESS)

Apr 08 07:05:58 sec-focal-amd64 apparmor.systemd[309]: Restarting AppArmor
Apr 08 07:05:58 sec-focal-amd64 apparmor.systemd[309]: Reloading AppArmor profiles
Apr 08 07:05:58 sec-focal-amd64 apparmor.systemd[320]: Skipping profile in /etc/apparmor.d/disable: usr.bin.firefox
Apr 08 07:05:58 sec-focal-amd64 apparmor.systemd[325]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.rsyslogd
Warning: journal has been rotated since unit was started, output may be incomplete.

The service is not delayed on boot:

$ sudo systemd-analyze critical-chain apparmor.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

apparmor.service +183ms
└─local-fs.target @526ms
  └─run-user-125-gvfs.mount @6.196s
    └─run-user-125.mount @5.831s
      └─local-fs-pre.target @526ms
        └─keyboard-setup.service @195ms +330ms
          └─systemd-journald.socket @173ms
            └─system.slice @170ms
              └─-.slice @170ms

Compare this with before snapd was removed:

$ sudo systemd-analyze critical-chain apparmor.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

apparmor.service +278ms
└─local-fs.target @1.073s
  └─run-user-125-gvfs.mount @10.509s
    └─run-user-125.mount @9.900s
      └─local-fs-pre.target @756ms
        └─keyboard-setup.service @369ms +386ms
          └─systemd-journald.socket @334ms
            └─system.slice @329ms
              └─-.slice @329ms

The systemd documentation (https://www.freedesktop.org/software/systemd/man/systemd.unit.html) say:

"RequiresMountsFor: Takes a space-separated list of absolute paths. Automatically adds dependencies of type Requires= and After= for all mount units required to access the specified path."

Furthermore, I read the systemd code and for each RequiresMountsFor entry, it will start with the entry, then see if there are mount entries for each path going up to '/', only adding the Requires and After if there is a .mount unit (that it may have synthesized from /etc/fstab) for this path. Eg:

snapd is not installed, systemd will:
 * check if /var/lib/snapd/apparmor/profiles .mount entry exists. no, so do nothing
 * check if /var/lib/snapd/apparmor .mount entry exists. no, so do nothing
 * check if /var/lib/snapd .mount entry exists. no, so do nothing
 * check if /var/lib .mount entry exists. no, so do nothing
 * check if /var .mount entry exists. no, so do nothing
 * check if / .mount entry exists. yes, so add this (will be satisf...

Read more...

Jamie Strandboge (jdstrand) wrote :
Download full text (3.5 KiB)

All that said, Daniel and Jean-Baptiste, I installed 20.04 in a vm and tried to reproduce this and could not. The apparmor change was about correctness of the unit so I performed the upload, but I also hoped that it would address the issue you are seeing.

I'm not certain it will. On one boot, prior to upgrading apparmor, I saw:

$ sudo systemd-analyze critical-chain apparmor.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

apparmor.service +11.135s
└─local-fs.target @4.376s
  └─zfs-mount.service @4.327s +48ms
    └─var-lib-dpkg.mount @4.188s +137ms
      └─var-lib.mount @3.883s +250ms
        └─zfs-import.target @3.829s
          └─zfs-import-cache.service @3.125s +704ms
            └─zfs-load-module.service @3.121s +2ms
              └─systemd-udev-settle.service @1.183s +1.937s
                └─systemd-udev-trigger.service @933ms +248ms
                  └─systemd-udevd-kernel.socket @886ms
                    └─system.slice @535ms
                      └─-.slice @535ms

Note that var-lib.mount is already listed. On reboot though (without updating apparmor), I see:

$ sudo systemd-analyze critical-chain apparmor.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

apparmor.service +101ms
└─local-fs.target @2.812s
  └─run-user-122.mount @5.172s
    └─swap.target @1.823s
      └─dev-disk-by\x2duuid-f5ea22a0\x2de078\x2d4d8e\x2d9412\x2d1fad2171a080.swap @1.799s +22ms
        └─dev-disk-by\x2duuid-f5ea22a0\x2de078\x2d4d8e\x2d9412\x2d1fad2171a080.device @1.798s

Oddly, no zfs entries are listed apparently because local-fs.target isn't pulling them in:

$ sudo systemd-analyze critical-chain local-fs.target
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

local-fs.target @2.812s
└─run-user-122.mount @5.172s
  └─swap.target @1.823s
    └─dev-disk-by\x2duuid-f5ea22a0\x2de078\x2d4d8e\x2d9412\x2d1fad2171a080.swap @1.799s +22ms
      └─dev-disk-by\x2duuid-f5ea22a0\x2de078\x2d4d8e\x2d9412\x2d1fad2171a080.device @1.798s

Looking at var-lib.mount, I see zfs is in there:

$ sudo systemd-analyze critical-chain var-lib.mount
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

var-lib.mount +179ms
└─zfs-import.target @2.248s
  └─zfs-import-cache.service @1.845s +402ms
    └─zfs-load-module.service @1.840s +2ms
      └─systemd-udev-settle.service @692ms +1.143s
        └─systemd-udev-trigger.service @524ms +167ms
          └─systemd-udevd-kernel.socket @494ms
            └─system.slice @357ms
              └─-.slice @357ms

So why after a reboot did the dependencies change and drop the /var/lib entry from local-fs.target?

I then upgraded apparmor to have the RequiresMountsFor /var/lib/snapd/apparmor/profiles, rebooted and saw no difference:

$ sudo systemd-analyze critical-chain apparmor.service
The time when unit became active or started is printed after t...

Read more...

Changed in zsys (Ubuntu Focal):
status: Invalid → New
Jamie Strandboge (jdstrand) wrote :

Marking the zsys task back to New based on my last comment.

Zygmunt Krynicki (zyga) wrote :

Seth wrote:

I have to think the better approach may have been to introduce something like apparmor@.service and configure an <email address hidden> that will load profiles before snapd is started -- at least if snap is not itself loading profiles before launching programs.

Note that snapd is *not* a dependency for service startup. In fact snap services do not start after snapd.service.

Jamie Strandboge (jdstrand) wrote :

Here is an 'sudo systemd-analyze plot > ./1871148-vm-no-varlib-mount.svg' on a focal VM that reports the following critical-chain:

$ sudo systemd-analyze critical-chain apparmor.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

apparmor.service +222ms
└─local-fs.target @2.562s
  └─run-user-122.mount @4.834s
    └─swap.target @1.687s
      └─dev-disk-by\x2duuid-f5ea22a0\x2de078\x2d4d8e\x2d9412\x2d1fad2171a080.swap @1.663s +24ms
        └─dev-disk-by\x2duuid-f5ea22a0\x2de078\x2d4d8e\x2d9412\x2d1fad2171a080.device @1.662s

Note that var.lib.mount is *not* listed in the critical chain. In the svg, we see:

zfs-load-module.service (3ms)
zfs-import-cache.service (268ms)
zfs-import.target
...
var-lib.mount (156ms)
...
zfs-volume-wait.service (235ms)
...
zfs-volumes.target
...
zfs-mount.service (66ms)
local-fs.target
apparmor.service (222ms)
...

Maybe everything is fine but critical-chain has a bug?

Jamie Strandboge (jdstrand) wrote :

Daniel responded on irc and said after several reboots with the new apparmor, everything was fine on every boot (though his critical-chain has var.lib.mount listed).

My attached systemd-analyze plot svg shows that apparmor.service is indeed starting after var.lib.mount on the VM where the critical-chain didn't show it or zfs. On irc Didier thought that critical-chain would only list the longest path to apparmor.service starting and may not show everything (the man page isn't clear on this point IMHO).

Based on all of this, I'm going to tentatively mark the zsys task back to Invalid. If people continue to see this bug, we can reopen as necessary (in which case it might be a systemd task for not generating the mount units/requires/after correctly/in a race-free manner or it might indicate zfs initialization is perhaps slow and apparmor.service is starting before var.lib.mount is generated (and therefore RequiresMountsFor is satisfied. Or it is something else ;)

Changed in zsys (Ubuntu Focal):
status: New → Invalid
Jamie Strandboge (jdstrand) wrote :

Daniel, this is a different cause but same result:

zfs-load-module.service (2ms)
zfs-import-cache.service (8ms)
zfs-import.target
...
var-lib.mount (69ms)
...
snap-multipass-1869.mount (1.358s)
...
apparmor.service (279ms)
...

In this case, apparmor correctly waited for var.lib.mount, but multipass started before apparmor.service completed.

Jamie Strandboge (jdstrand) wrote :

Adding a snapd bug task.

Zygmunt Krynicki (zyga) on 2020-04-09
Changed in snapd:
status: New → In Progress
assignee: nobody → Zygmunt Krynicki (zyga)
importance: Undecided → High
milestone: none → 2.44.3
Zygmunt Krynicki (zyga) wrote :

I've pushed a tentative fix for this to https://github.com/snapcore/snapd/pull/8467

Zygmunt Krynicki (zyga) on 2020-04-10
Changed in snapd:
status: In Progress → Fix Released
Jamie Strandboge (jdstrand) wrote :

Adding a snapd Ubuntu task, marking as In Progress and assigning to mvo since he is preparing a 20.04 upload.

Changed in snapd (Ubuntu Focal):
assignee: nobody → Michael Vogt (mvo)
status: New → In Progress
importance: Undecided → High
milestone: none → ubuntu-20.04
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers