unable to allow an app to access all devices with a certain major number via a <majordev>:* device cgroup rule

Bug #1892895 reported by Dmitrii Shcherbakov on 2020-08-25
This bug affects 1 person
Affects: snapd
Importance: High
Assigned to: Unassigned

Bug Description

I found a race condition that can be avoided by using wildcard rules in device cgroups. However, I do not see a way to enable such rules from an interface.

There is a use-case for MicroStack where iSCSI targets are added to the host kernel as block devices via iscsid + the iscsi-tcp kernel module.

An immediate idea is to:

* add block-devices interface to nova-compute and libvirtd apps;
* as a result, get the major and minor numbers of the hot-plugged devices added to the device cgroups of Nova and libvirtd (/sys/fs/cgroup/devices/snap.microstack.{nova-compute, libvirtd}/devices.list).
  * This part of the interface makes sure of that: https://github.com/snapcore/snapd/blob/2.46/interfaces/builtin/block_devices.go#L97

As it turns out, this approach is racy: the device is used before its major and minor numbers have been added to the relevant device cgroup via /sys/fs/cgroup/devices/snap.microstack.{nova-compute, libvirtd}/devices.allow

snap-device-helper is responsible for that: https://github.com/snapcore/snapd/blob/2.46/cmd/snap-confine/snap-device-helper#L73

In essence, the block special file is created and used before snapd runs snap-device-helper, and confined applications are not synchronized with the helper in any way.
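To make the missing piece concrete, here is a minimal Python sketch of the rule line that has to reach devices.allow for a given device node before the confined app can open it. The function names `device_cgroup_rule` and `rule_for_path` are hypothetical and this is not the helper's actual code; it only shows how the type, major and minor fields are derived from the node's stat data.

```python
import os
import stat


def device_cgroup_rule(st_mode: int, st_rdev: int, access: str = "rwm") -> str:
    """Build a cgroup v1 devices.allow line such as 'b 8:64 rwm'.

    st_mode and st_rdev are the values os.stat() reports for a device node.
    """
    if stat.S_ISBLK(st_mode):
        dev_type = "b"  # block device
    elif stat.S_ISCHR(st_mode):
        dev_type = "c"  # character device
    else:
        raise ValueError("not a device node")
    return f"{dev_type} {os.major(st_rdev)}:{os.minor(st_rdev)} {access}"


def rule_for_path(path: str) -> str:
    """Convenience wrapper (hypothetical): stat a node and build its rule."""
    st = os.stat(path)
    return device_cgroup_rule(st.st_mode, st.st_rdev)
```

Until a line like this has been written for the new node, any open() from inside the cgroup is rejected, no matter how quickly the application reacts to the device appearing.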

In the failure mode I observe consistently, I get "Operation not permitted", which is the EPERM the kernel returns when it enforces access based on what is present in the device cgroup:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal/tree/security/device_cgroup.c?h=Ubuntu-5.4.0-44.48#n823

Specific to my use case, what I see is that Nova tells libvirt to use a block device, which fails with EPERM. Nova then tries to remove the volume it just tried to attach and runs `blockdev --flushbufs` in the process, which fails as well:

* try: virt_driver.attach_volume (Nova) -> virStorageFileReportBrokenChain (libvirt) -> Cannot access storage file '/dev/sde': Operation not permitted -> libvirt.libvirtError Cannot access storage file '/dev/sde': Operation not permitted
* except: "Driver failed to attach volume..." -> volume_api.attachment_delete -> ... -> flush_device_io -> blockdev --flushbufs /dev/sde -> blockdev: cannot open /dev/sde: Operation not permitted
https://opendev.org/openstack/nova/src/branch/stable/ussuri/nova/virt/block_device.py#L498-L510 (Nova code)
https://git.launchpad.net/ubuntu/+source/libvirt/tree/src/util/virstoragefile.c?h=applied/ubuntu/focal#n4877 ("Cannot access storage" in libvirt)

https://paste.ubuntu.com/p/RTgq8XkzY6/ (logs)

If I add a wildcard rule to allow devices with any minor number and a certain major number to be used, this race condition is avoided.

sudo bash -c 'echo "b 8:* rwm" > /sys/fs/cgroup/devices/snap.microstack.libvirtd/devices.allow'
sudo bash -c 'echo "b 8:* rwm" > /sys/fs/cgroup/devices/snap.microstack.nova-compute/devices.allow'
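
Why the wildcard helps can be shown with a small Python model of how cgroup v1 device rules are matched. This is a simplified sketch, not the kernel's implementation; `DeviceRule`, `parse_rule`, `rule_allows` and `is_allowed` are hypothetical names, and 8:64 is used only as an example of the numbers a newly attached disk such as /dev/sde typically gets.

```python
from typing import Iterable, NamedTuple


class DeviceRule(NamedTuple):
    dev_type: str  # 'a' (all), 'b' (block) or 'c' (char)
    major: str     # a number or '*'
    minor: str     # a number or '*'
    access: str    # some combination of 'r', 'w', 'm'


def parse_rule(line: str) -> DeviceRule:
    """Parse a line in devices.list format, e.g. 'b 8:* rwm'."""
    dev_type, numbers, access = line.split()
    major, minor = numbers.split(":")
    return DeviceRule(dev_type, major, minor, access)


def rule_allows(rule: DeviceRule, dev_type: str, major: int, minor: int,
                access: str) -> bool:
    """True if a single rule covers the requested access to the device."""
    if rule.dev_type not in ("a", dev_type):
        return False
    if rule.major not in ("*", str(major)):
        return False
    if rule.minor not in ("*", str(minor)):
        return False
    # Every requested access bit must be granted by the rule.
    return set(access) <= set(rule.access)


def is_allowed(rules: Iterable[str], dev_type: str, major: int, minor: int,
               access: str) -> bool:
    """True if any rule in the list covers the request."""
    return any(rule_allows(parse_rule(r), dev_type, major, minor, access)
               for r in rules)
```

With only per-device entries such as `b 8:0 rwm`, a freshly hot-plugged 8:64 is rejected until its own entry is written. With `b 8:* rwm`, the same request is covered before the device even exists, so there is nothing left to race against.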

---------------------------------------------------------------------

Another, simpler use case where this applies is working with loop devices.

If I have this in an interface:

const connectedPlugAppArmor = `
/dev/loop-control rw,
/dev/loop[0-9]* rw,
`

var microStackConnectedPlugUDev = []string{
 `SUBSYSTEM=="block", KERNEL=="loop[0-9]*"`,
 `SUBSYSTEM=="misc", KERNEL=="loop-control"`,
}

and then try to use `losetup -f` when no free loop devices are available:

fallocate -l "$loop_file_size" "$loop_file"
losetup -f "$loop_file"

I will get "Operation not permitted" during the losetup invocation since the device cgroup entry is not added fast enough.

This is a much simpler reproducer than the one with iSCSI.
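
Because the failure is purely a matter of timing, one stopgap an application could try is to retry the operation for a short while when it fails with EPERM. The following Python sketch is an illustration only, not something proposed in this report or implemented in snapd; `retry_on_eperm` and its parameters are hypothetical, and the attempt count and delay are arbitrary assumptions.

```python
import errno
import time


def retry_on_eperm(operation, attempts=5, delay=0.2, sleep=time.sleep):
    """Call operation(); retry while it fails with EPERM.

    Errors other than EPERM, and an EPERM on the last attempt, are re-raised
    unchanged. `sleep` is injectable so the helper can be tested without
    actually waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            if exc.errno != errno.EPERM or attempt == attempts - 1:
                raise
            # The device cgroup entry may not have been written yet.
            sleep(delay)
```

This only hides the race from one caller and cannot distinguish a late cgroup entry from a genuine denial, which is why a wildcard rule written ahead of time is the more robust option.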

---------------------------------------------------------------------
Update (09-09-2020):

Found one more use case, which is LV activation after a reboot:

* reboot -> LV Status NOT available;
* lvchange -a y <vgname-for-lvs> -> device-mapper: reload ioctl on (253:3) failed: Operation not permitted

description: updated
Changed in snapd:
assignee: nobody → Zygmunt Krynicki (zyga)
Zygmunt Krynicki (zyga) wrote :

I've analyzed the problem and I need to discuss my findings with the rest of the snapd team. I have several ideas on how to avoid this problem, in addition to the suggestion provided by the reporter.

Dmitrii Shcherbakov (dmitriis) wrote :

Thanks for looking into this!

For the future, please also consider that in cgroupv2 there are no interface files for controlling access rules:

https://elixir.bootlin.com/linux/latest/source/Documentation/admin-guide/cgroup-v2.rst#L2018
"Cgroup v2 device controller has no interface files and is implemented on top of cgroup BPF. To control access to device files, a user may create bpf programs of the BPF_CGROUP_DEVICE type and attach them to cgroups. On an attempt to access a device file, corresponding BPF programs will be executed, and depending on the return value the attempt will succeed or fail with -EPERM."

description: updated
Zygmunt Krynicki (zyga) wrote :

We are well aware of the cgroup v2 device model and plan to support it. There are patches in progress that need review and that build towards this.

Changed in snapd:
status: New → Confirmed
Changed in snapd:
importance: Undecided → Medium
importance: Medium → High
Changed in snapd:
assignee: Zygmunt Krynicki (zyga) → nobody