device cgroup enforcement is partially ineffective for snap services

Bug #1838937 reported by Zygmunt Krynicki
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
snapd
Confirmed
Medium
Unassigned

Bug Description

During a debugging session last Friday we realised that the way snapd is using the device cgroup to control access to character and block devices is not working correctly for snap services. Snap services are assigned to a device cgroup by systemd, moving them from "snap.$SNAP_NAME.app_name" device cgroup to "system.slice/snap.$SNAP_NAME.app_name.service".

The layered approach, using both device cgroup and AppArmor constraints effect of this bug to applications using specific interfaces, such as joystick or serial port, where instead of being scoped to a specific device, the bug allows access to all devices of that class.

Work on fixing this is already in progress and consists of:

1) Reviewing all interfaces for mismatches between the device cgroup (via udev tagging) and apparmor for discrepancies. Those need to be addressed first so that eventual fix for the security bug does not break existing legitimate applications.

2) Writing proper regression test and improving existing tests that exercise this code path for non-deamon applications where systemd is not involved.

3) Changing snap-confine's behaviour when dealing with services, so that it cooperates with systemd. This results in cascading changes to snap-confine's udev interaction as well as the implementation of the snap-device-helper program.

4) Coordination with distributors using enforcing apparmor confinement for proper responsible release of the fixed version.

This bug will be updated with additional information as we progress.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

From IRC:
Day changed to 03 Aug 2019
00:23 < jdstrand> zyga and ondra (cc amurray, joe, niemeyer): I had some time to verify your assessment of the systemd/cgroup issue. I was thinking it shouldn't matter that systemd put the application in a cgroup cause snap-confine puts it in the right place and that is after systemd did its thing. I created a snap and sure enough, that is what happened and I could not reproduce. daemons were in the correct
00:23 < jdstrand> cgroup and it was all mediated
00:24 < jdstrand> zyga and ondras: I'm not saying that there isn't a bug, but it doesn't seem to be like what was described to me (it works afaict)
00:25 < jdstrand> zyga: before you do any fixing, we need to understand what is happening better, with a clear reproducer. once we have that, I'll look and we can meet
00:26 < jdstrand> zyga: fyi, I snap start/stop, systemctl start/stop/restart, udevadm trigger/control --reload-rules and all good. I only did it on bionic classic
00:45 < jdstrand> zyga:: ondras mentioned something about the client lingering. I don't know what this application is doing, but if it opened the file and then the interface as disconnected, the device cgroup would not revalidate. same if there was an fd pass then interface disconnect

Revision history for this message
Ondrej Kubik (ondrak) wrote :

Hi
I have debugged more issue with application we were facing, and it was caused by dev rules and device reports different subsystem than expected.
While service started and operated well, in certain situation is broke and device node was no more accessible. There is probably bug why it works in first place, it is not related to cgroups.

Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

@ondrak so if i'm reading this right, in the 'broken' scenario, there was no matching device listed by udev, what means that s-c did not set up the device cgroup at all?

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I wrote a reproducer for this bug. I will attach the patch shortly, just wrestling with unreliable network. The short story is that the error manifests itself by "snap connect" granting apparmor permissions without constraining them with device cgroup. The precondition is that the process is initially placed in a cgroup made by systemd (for a service) and then *not* moved to the cgroup made by snapd.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I discussed the issue with mborzecki and I think we have a very simple and natural fix. When nothing is tagged create a cgroup with the following constrain: "a *:* rwm". This ensures that as soon as connect is finished the process is already in the right cgroup that is now strictly limiting access.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I have a reliable test and am working on fixes for both master and backports, in case we need to patch distributions that shipped the older version and do not re-execute.

Revision history for this message
Ondrej Kubik (ondrak) wrote :

@Maciej there was nothing broken in snapd for udev rules generation. udev rules were generated correctly. Issue is mainly caused by kernel driver using different SUBSYSTEM.
Open question why service works and first time started, while it should not start even first time.
It seems like udev rules kick in only later
However all this is unrelated to this bug

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I've implemented a regression test. Attached.

summary: - device cgroup enforcement is ineffective for snap services
+ device cgroup enforcement is partially ineffective for snap services
Changed in snapd:
importance: Critical → Medium
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Demoted from private security to public per discussion with Jamie.

information type: Private Security → Public
Revision history for this message
Jamie Strandboge (jdstrand) wrote :

To summarize, when an interface is connected that uses the udev backend (and apparmor), then apparmor rules are put on the system and udev rules. If there are no devices on the system that match the snap via udev tagging, no cgroup is created by snap-confine (and the default one systemd sets up is used), which is the current, designed behavior.

This behavior, as Zygmunt mentioned, is not particularly deterministic and that should be fixed.

This issue Ondrej saw was in developing an update for an interface for a device whose driver was using a different SUBSYSTEM. Even with Zygmunt's fix, we will want to be careful in the intersection of apparmor rules and udev tagging and ensure there are no bugs in tagging otherwise we may end up with more open apparmor policy which assumes udev tagging is in effect.

Revision history for this message
Jamie Strandboge (jdstrand) wrote :

I forgot to mention, due to this bug, daemons that are running that do not have any interfaces connected that use the udev backend will not be moved into the snapd-created device cgroup after connecting interfaces that do use the udev backend (and have matching devices). Zygmunt's fix will address this fully.

Zygmunt Krynicki (zyga)
Changed in snapd:
assignee: nobody → Zygmunt Krynicki (zyga)
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I'm removing myself from this task temporarily. I will attempt to return to the issue after handling the tasks I'm currently working on. If anyone wants to implement the solution please reach out to me or mborzecki as we are working on the related area and there are some unmarked patches that work towards at the solution.

Changed in snapd:
status: In Progress → Confirmed
assignee: Zygmunt Krynicki (zyga) → nobody
milestone: 2.41 → none
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.