partitions can have different major numbers

Bug #1900842 reported by Dimitri John Ledkov
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
snapd
Fix Released
Critical
Ian Johnson

Bug Description

ll /dev/block/
total 0
drwxr-xr-x 2 root root 620 Nov 19 2019 ./
drwxr-xr-x 17 root root 4640 Apr 1 2020 ../
lrwxrwxrwx 1 root root 10 Apr 1 2020 179:0 -> ../mmcblk0
lrwxrwxrwx 1 root root 12 Apr 1 2020 179:1 -> ../mmcblk0p1
lrwxrwxrwx 1 root root 15 Apr 1 2020 179:16 -> ../mmcblk0boot1
lrwxrwxrwx 1 root root 12 Apr 1 2020 179:2 -> ../mmcblk0p2
lrwxrwxrwx 1 root root 12 Apr 1 2020 179:3 -> ../mmcblk0p3
lrwxrwxrwx 1 root root 12 Apr 1 2020 179:4 -> ../mmcblk0p4
lrwxrwxrwx 1 root root 12 Apr 1 2020 179:5 -> ../mmcblk0p5
lrwxrwxrwx 1 root root 12 Apr 1 2020 179:6 -> ../mmcblk0p6
lrwxrwxrwx 1 root root 12 Apr 1 2020 179:7 -> ../mmcblk0p7
lrwxrwxrwx 1 root root 15 Apr 1 2020 179:8 -> ../mmcblk0boot0
lrwxrwxrwx 1 root root 12 Apr 1 2020 259:0 -> ../mmcblk0p8
lrwxrwxrwx 1 root root 12 Apr 1 2020 259:1 -> ../mmcblk0p9
lrwxrwxrwx 1 root root 13 Apr 1 2020 259:10 -> ../mmcblk0p18
lrwxrwxrwx 1 root root 13 Apr 1 2020 259:2 -> ../mmcblk0p10
lrwxrwxrwx 1 root root 13 Apr 1 2020 259:3 -> ../mmcblk0p11
lrwxrwxrwx 1 root root 13 Apr 1 2020 259:4 -> ../mmcblk0p12
lrwxrwxrwx 1 root root 13 Apr 1 2020 259:5 -> ../mmcblk0p13
lrwxrwxrwx 1 root root 13 Apr 1 2020 259:6 -> ../mmcblk0p14
lrwxrwxrwx 1 root root 13 Apr 1 2020 259:7 -> ../mmcblk0p15
lrwxrwxrwx 1 root root 13 Apr 1 2020 259:8 -> ../mmcblk0p16
lrwxrwxrwx 1 root root 13 Apr 1 2020 259:9 -> ../mmcblk0p17

this seems to trip up the checks in snap-bootstrap that encrypted partition comes from the same block device as the boot device.

if/when kernel runs out of (or clahes) minor number, it can allocate a new major number and start using that for further partitions.

Thus inspecting something like $ cat /sys/block/dm-23/slaves/mmcblk0p1/../dev is a better way to find the parent device major:minor which should be the same across directly mounted things. and not.

Tags: core20
tags: added: core20
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

$ basename $(readlink /dev/block/$(cat /sys/block/$(basename $(readlink /dev/block/`grep /run/mnt/data /proc/self/mountinfo | tail -n1 | cut -d\ -f3`))/slaves/*/../dev))
sda

so this interates mountinfo; finds /run/mnt/data mountpoint; finds it major:minor; finds it's dm name; checks the slaves of said dm; to find underlying device; to find the parent of that device; to find major:minor of it; and map it back to it's name.

Doing this should reliably map "mountpoint -> parent device name" based on kernel information.

Changed in snapd:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Ian Johnson (anonymouse67)
Revision history for this message
Ian Johnson (anonymouse67) wrote :
Download full text (4.5 KiB)

Okay, so here's what snapd currently does:

1) we mount the partition that we think the kernel booted from somewhere (w/ uefi we know this exactly and can use the partition uuid that uefi var that shim writes to, w/o uefi var, we just guess and use `/dev/disk/by-label/ubuntu-boot`)

2) after that is mounted somewhere, we look back at what disk that was mounted from using mountinfo table.
2a) Specifically we parse /proc/self/mountinfo, and whatever entry is last in that table and has a mount destination matching /run/mnt/ubuntu-boot, we get the mountpoint source, and then query udev with `udevadm info --query property --name /dev/mmcblk0p16`.
2b) If we expect the device to be unencrypted, as is the case with ubuntu-boot, we grab ID_PART_ENTRY_DISK, and save that major/minor pair as the disk that this partition comes from.
2c) If we expect the device to be a mapper / decrypted device, as is the case with ubuntu-data in FDE, we instead read `/sys/dev/block/$MAJOR:$MINOR/dm/uuid` (where major/minor come from the mountinfo table entry we got in step 2a) and `/sys/dev/block/$MAJOR:$MINOR/dm/name`, then we do some manipulation of the uuid/name values we got to build up the partition uuid that matches the luks uuid pattern, which then we can use with /dev/disk/by-uuid to find the underlying, physical, encrypted partition for that decrypted mountpoint. We then query udev properties for that encrypted partition, and use ID_PART_ENTRY_DISK from that encrypted partition the same as we did in the unencrypted case in 2b.

3) With this all in hand, we know what disk the mountpoint we mounted came from, and now we want to find partitions with specific labels that come from the same disk as the one we just mounted, i.e. after mounting ubuntu-boot, then we want to mount ubuntu-seed. So to do this, we now build up a mapping of disk major/minor -> partitions by iterating through the devices with the same major number as the disk, but with increasing minor numbers starting with the disk minor itself +1, and assume that all adjacent major/minors are from the same disk (which is an INCORRECT assumption as we now know). We stop looking once we either don't find any more devices with the same major number, or we hit a device that is not a partition. To be specific, the pseudo code is something like this:

childPartitions = {}
devMinor := diskMinor
for true:
    devMinor++
    udevProps, err := $(udevadm info --query property --name /dev/block/$diskMajor:$devMinor)
    if err != nil
        break
    if udevProps["DEVTYPE"] != "partition"
        break
    if udevProps["ID_FS_LABEL_ENC"] == ""
        continue
    if udevProps["ID_PART_ENTRY_UUID"] == ""
        return err("broken")
    childPartitions[ udevProps["ID_FS_LABEL_ENC"] ] = udevProps["ID_PART_ENTRY_UUID"]

and then afterwards we check if the desired filesystem label of ubuntu-seed, etc. is inside the map childPartitions, and if it is we mount that partition uuid, otherwise we fail.

Hopefully that helps understand what snapd currently does.

I had a look at the udev database from the dragonboard in question that prompted this problem, and came up with this psuedo code replacement for step 3 (as step 2 ...

Read more...

Revision history for this message
Ian Johnson (anonymouse67) wrote :

While waiting for more feedback on the above detailed approach, I opened a draft PR implementing the above algorithm: https://github.com/snapcore/snapd/pull/9541

Revision history for this message
Ian Johnson (anonymouse67) wrote :

I ended up needing to slightly modify the proposed algorithm, I was thinking the "partition" file in sysfs was a boolean indicating if it was a partition, but actually it corresponds literally to the partition number for that device, so the proposed logic should just check for the existence of the partition file. the proposed algorithm should now read something like this (note the if condition is different inside the for loop - that is the only change)

udevProps, err := $(udevadm info --query property --name /dev/block/$diskMajor:$devMinor)
if err != nil
   return err("broken")

diskDevName = filepath.Base(udevProps["DEVNAME"])
diskDevPath = udevProps["DEVPATH"]

childPartitions = {}
paths := glob("/sys" + diskDevPath + "/" + diskDevPath + "*")
for path in paths:
    if fileExists(path + "/partition"):
        ueventFile = readFile(path + "/uevent")
        partitionMajor = ueventFile["MAJOR"]
        partitionMinor = ueventFile["MINOR"]
        udevProps := $(udevadm info --query property --name /dev/block/$partitionMajor:$partitionMinor)
        childPartitions[ udevProps["ID_FS_LABEL_ENC"] ] = udevProps["ID_PART_ENTRY_UUID"]

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

reading stuff is sysfs is good.

reading uevent file there is good.

using just the subdirs of the sysfs paths, should be done to iterate block devices.

there should be no need to call udevadm binary at all.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :
Revision history for this message
Ian Johnson (anonymouse67) wrote :

This was minimally fixed for uc20/snapd in 2.48, future improvements will come as mentioned in the other bug reports xnox referenced.

Changed in snapd:
milestone: none → 2.48
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.