snapd

device cgroup enforcement is partially ineffective for snap services

Bug #1838937 reported by Zygmunt Krynicki on 2019-08-05

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	snapd	Confirmed	Medium	Unassigned

Bug Description

During a debugging session last Friday we realised that the way snapd is using the device cgroup to control access to character and block devices is not working correctly for snap services. Snap services are assigned to a device cgroup by systemd, moving them from "snap.$SNAP_NAME.app_name" device cgroup to "system.slice/snap.$SNAP_NAME.app_name.service".

The layered approach, using both device cgroup and AppArmor constraints effect of this bug to applications using specific interfaces, such as joystick or serial port, where instead of being scoped to a specific device, the bug allows access to all devices of that class.

Work on fixing this is already in progress and consists of:

1) Reviewing all interfaces for mismatches between the device cgroup (via udev tagging) and apparmor for discrepancies. Those need to be addressed first so that eventual fix for the security bug does not break existing legitimate applications.

2) Writing proper regression test and improving existing tests that exercise this code path for non-deamon applications where systemd is not involved.

3) Changing snap-confine's behaviour when dealing with services, so that it cooperates with systemd. This results in cascading changes to snap-confine's udev interaction as well as the implementation of the snap-device-helper program.

4) Coordination with distributors using enforcing apparmor confinement for proper responsible release of the fixed version.

This bug will be updated with additional information as we progress.

Revision history for this message

Jamie Strandboge (jdstrand) wrote on 2019-08-05:

From IRC:
Day changed to 03 Aug 2019
00:23 < jdstrand> zyga and ondra (cc amurray, joe, niemeyer): I had some time to verify your assessment of the systemd/cgroup issue. I was thinking it shouldn't matter that systemd put the application in a cgroup cause snap-confine puts it in the right place and that is after systemd did its thing. I created a snap and sure enough, that is what happened and I could not reproduce. daemons were in the correct
00:23 < jdstrand> cgroup and it was all mediated
00:24 < jdstrand> zyga and ondras: I'm not saying that there isn't a bug, but it doesn't seem to be like what was described to me (it works afaict)
00:25 < jdstrand> zyga: before you do any fixing, we need to understand what is happening better, with a clear reproducer. once we have that, I'll look and we can meet
00:26 < jdstrand> zyga: fyi, I snap start/stop, systemctl start/stop/restart, udevadm trigger/control --reload-rules and all good. I only did it on bionic classic
00:45 < jdstrand> zyga:: ondras mentioned something about the client lingering. I don't know what this application is doing, but if it opened the file and then the interface as disconnected, the device cgroup would not revalidate. same if there was an fd pass then interface disconnect

Revision history for this message

Ondrej Kubik (ondrak) wrote on 2019-08-06:

Hi
I have debugged more issue with application we were facing, and it was caused by dev rules and device reports different subsystem than expected.
While service started and operated well, in certain situation is broke and device node was no more accessible. There is probably bug why it works in first place, it is not related to cgroups.

Revision history for this message

Maciej Borzecki (maciek-borzecki) wrote on 2019-08-06:

@ondrak so if i'm reading this right, in the 'broken' scenario, there was no matching device listed by udev, what means that s-c did not set up the device cgroup at all?

Revision history for this message

Zygmunt Krynicki (zyga) wrote on 2019-08-06:

I wrote a reproducer for this bug. I will attach the patch shortly, just wrestling with unreliable network. The short story is that the error manifests itself by "snap connect" granting apparmor permissions without constraining them with device cgroup. The precondition is that the process is initially placed in a cgroup made by systemd (for a service) and then *not* moved to the cgroup made by snapd.

Revision history for this message

Zygmunt Krynicki (zyga) wrote on 2019-08-06:

I discussed the issue with mborzecki and I think we have a very simple and natural fix. When nothing is tagged create a cgroup with the following constrain: "a *:* rwm". This ensures that as soon as connect is finished the process is already in the right cgroup that is now strictly limiting access.

Revision history for this message

Zygmunt Krynicki (zyga) wrote on 2019-08-06:

I have a reliable test and am working on fixes for both master and backports, in case we need to patch distributions that shipped the older version and do not re-execute.

Revision history for this message

Ondrej Kubik (ondrak) wrote on 2019-08-06:

@Maciej there was nothing broken in snapd for udev rules generation. udev rules were generated correctly. Issue is mainly caused by kernel driver using different SUBSYSTEM.
Open question why service works and first time started, while it should not start even first time.
It seems like udev rules kick in only later
However all this is unrelated to this bug

Revision history for this message

Zygmunt Krynicki (zyga) wrote on 2019-08-06:

Regression test Edit (6.8 KiB, text/plain)

I've implemented a regression test. Attached.

summary:

- device cgroup enforcement is ineffective for snap services
+ device cgroup enforcement is partially ineffective for snap services

Jamie Strandboge (jdstrand) on 2019-08-06

Changed in snapd:
importance:	Critical → Medium

Revision history for this message

Zygmunt Krynicki (zyga) wrote on 2019-08-06:

Demoted from private security to public per discussion with Jamie.

information type:

Private Security → Public

Revision history for this message

Jamie Strandboge (jdstrand) wrote on 2019-08-06:

#10

To summarize, when an interface is connected that uses the udev backend (and apparmor), then apparmor rules are put on the system and udev rules. If there are no devices on the system that match the snap via udev tagging, no cgroup is created by snap-confine (and the default one systemd sets up is used), which is the current, designed behavior.

This behavior, as Zygmunt mentioned, is not particularly deterministic and that should be fixed.

This issue Ondrej saw was in developing an update for an interface for a device whose driver was using a different SUBSYSTEM. Even with Zygmunt's fix, we will want to be careful in the intersection of apparmor rules and udev tagging and ensure there are no bugs in tagging otherwise we may end up with more open apparmor policy which assumes udev tagging is in effect.

Revision history for this message

Jamie Strandboge (jdstrand) wrote on 2019-08-06:

#11

I forgot to mention, due to this bug, daemons that are running that do not have any interfaces connected that use the udev backend will not be moved into the snapd-created device cgroup after connecting interfaces that do use the udev backend (and have matching devices). Zygmunt's fix will address this fully.

Zygmunt Krynicki (zyga) on 2019-08-07

Changed in snapd:
assignee:	nobody → Zygmunt Krynicki (zyga)

Revision history for this message

Zygmunt Krynicki (zyga) wrote on 2019-09-25:

#12

I'm removing myself from this task temporarily. I will attempt to return to the issue after handling the tasks I'm currently working on. If anyone wants to implement the solution please reach out to me or mborzecki as we are working on the related area and there are some unmarked patches that work towards at the solution.

Changed in snapd:
status:	In Progress → Confirmed
assignee:	Zygmunt Krynicki (zyga) → nobody
milestone:	2.41 → none

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Patches

Regression test Edit

Add patch

Remote bug watches

Bug watches keep track of this bug in other bug trackers.