Comment 4 for bug 1728742

Dmitrii Shcherbakov (dmitriis) wrote :

I think the kernel side of this needs consideration, but persistent /dev/bcache<i> names inherently cannot be provided: enumeration is not deterministic, and /dev/bcache<i> names cannot be changed afterwards (only symlinks can be created).

man 7 udev is clear about that:
"udev supplies the system software with device events, manages permissions of device nodes and may create additional symlinks in the /dev directory, or renames network interfaces. The kernel usually just assigns unpredictable device names based on the order of discovery."

The bcache code uses alloc_disk from genhd to get a struct gendisk, which means there are no custom uevent variables on coldplug: the device just gets whatever any block device gets (with the exception of partitions).

http://elixir.free-electrons.com/linux/v4.14/source/drivers/base/core.c#L865 (other settings set from bcache code and genhd.c added to uevent)
http://elixir.free-electrons.com/linux/v4.14/source/block/genhd.c#L1234 (DEVTYPE = disk)

http://elixir.free-electrons.com/linux/v4.14/source/block/genhd.c#L1357 (alloc_disk -> alloc_disk_node -> device_initialize)

http://elixir.free-electrons.com/linux/v4.14/source/drivers/md/bcache/super.c#L788
     !(d->disk = alloc_disk(BCACHE_MINORS))) { // <--- alloc_disk
 snprintf(d->disk->disk_name, DISK_NAME_LEN, "bcache%i",
   minor / BCACHE_MINORS);

As a result, we have this on coldplug:

cat /sys/class/block/bcache0/uevent
MAJOR=252
MINOR=0
DEVNAME=bcache0
DEVTYPE=disk
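
For illustration, the same file can be parsed with a few lines of Python to confirm that none of the coldplug keys carries a UUID. This is only a sketch: parse_uevent and read_uevent are names made up here, and the sample string is simply the output quoted above.

```python
def parse_uevent(text):
    """Parse the KEY=VALUE lines of a sysfs uevent file into a dict."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            env[key.strip()] = value.strip()
    return env


def read_uevent(kname, sysfs_root="/sys/class/block"):
    """Read and parse the uevent file of a block device such as 'bcache0'."""
    with open(f"{sysfs_root}/{kname}/uevent") as handle:
        return parse_uevent(handle.read())


# The coldplug output quoted above, used as sample input.
SAMPLE = "MAJOR=252\nMINOR=0\nDEVNAME=bcache0\nDEVTYPE=disk\n"

env = parse_uevent(SAMPLE)
# No key carries a UUID, so a udev rule has nothing stable to match on.
has_uuid = any("UUID" in key for key in env)
```

On a real system, read_uevent("bcache0") would read the file directly instead of using the sample string.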

Hotplug uevents with the necessary UUIDs are only emitted when the init script runs in the initramfs, and they are not handled because there is no udev listening for them at that point (they are just ignored), so no symlinks are ever set up.

Fixing the kernel side is not trivial, as it would imply changing the DEVTYPE (userspace breakage) and adding custom uevent variables to that file, which essentially means rewriting alloc_disk.

So, given that this will realistically not land in 4.15 (there is no patch) and that changing the uevent file is not an ideal approach anyway, I think we should handle this from userspace.

We can read the bcache superblock of a backing device during coldplug, since we know the device name will contain the word "bcache", and set up those symlinks at the small cost of running several processes.

We just need a mapping from superblock UUID -> dname.

/dev/disk/by-dname/<link> symlinks are respected by our userspace (the MAAS API returns by-dname names, and Juju storage uses those instead of /dev/<name>).

bcache-tools provide a way to see the "version" of a superblock (cache device or backing device) and are present in our cloud images.

The simplest way to find the backing device is by timestamp, taking the entry under slaves/ with the oldest change time:

sudo bcache-super-show /dev/`ls -c -1t /sys/block/bcache0/slaves/ | tail -n1` | grep dev.uuid | cut -f 3
fb7c070c-5f96-4add-8565-1398bd831b1f
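
A rough Python equivalent of the `ls -c -1t ... | tail -n1` part, in case we end up scripting this. The function name is made up for illustration, and it inherits the same assumption as the shell version: that the oldest entry under slaves/ is the backing device, which is not guaranteed.

```python
import os


def oldest_slave(slaves_dir):
    """Return the entry in slaves_dir with the oldest change time, or None."""
    names = sorted(os.listdir(slaves_dir))  # sorted so ties resolve by name
    if not names:
        return None
    # lstat, not stat: the sysfs entries are symlinks, and plain ls also
    # looks at the link itself rather than its target.
    return min(
        names,
        key=lambda name: os.lstat(os.path.join(slaves_dir, name)).st_ctime_ns,
    )
```

Calling oldest_slave("/sys/block/bcache0/slaves") would give the name to append to /dev/ before running bcache-super-show on it.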

A more robust way is to parse the superblock version (we could use a Python script, FWIW):

(for i in `ls -1 /sys/block/bcache0/slaves` ; do sudo bcache-super-show /dev/$i ; done) | grep -Pzo '(?s)(?<=sb.version\t\t1).*?dev.uuid.*?\n' | grep -a uuid | cut -f 3
fb7c070c-5f96-4add-8565-1398bd831b1f
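
A minimal sketch of what such a Python script could look like. It keys on the "[backing device]" annotation that bcache-super-show prints next to sb.version rather than on the version number itself. All function names are hypothetical, and super_show needs root and bcache-tools installed.

```python
import subprocess


def parse_super_show(text):
    """Turn bcache-super-show output into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # field name, then the rest of the line
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields


def is_backing(fields):
    """True if the superblock is annotated as a backing device."""
    return "backing device" in fields.get("sb.version", "")


def backing_uuid(outputs):
    """Return dev.uuid of the first backing-device superblock, else None."""
    for text in outputs:
        fields = parse_super_show(text)
        if is_backing(fields):
            return fields.get("dev.uuid")
    return None


def super_show(devnode):
    """Run bcache-super-show on a device node and return its stdout."""
    return subprocess.check_output(
        ["bcache-super-show", devnode], universal_newlines=True
    )
```

Feeding it the super_show output of every entry under /sys/block/bcache0/slaves should return the same UUID as the one-liner, without depending on the order of the slaves.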

We could also use the bcache label to hold the dname and use that instead (it just needs to be set on creation), but the UUID is far more reliable and we should use that.

This is based on the following:

ubuntu@maas-xenial6:~$ sudo bcache-super-show /dev/`ls -c -1t /sys/block/bcache0/slaves/ | tail -n1`
sb.magic ok
sb.first_sector 8 [match]
sb.csum A61100F0EECCAB33 [match]
sb.version 1 [backing device]

dev.label (empty)
dev.uuid fb7c070c-5f96-4add-8565-1398bd831b1f
dev.sectors_per_block 1
dev.sectors_per_bucket 1024
dev.data.first_sector 16
dev.data.cache_mode 1 [writeback]
dev.data.cache_state 1 [clean]

cset.uuid 26bb5252-2f4d-4a55-b4f9-8666cc1cd4d6
ubuntu@maas-xenial6:~$ sudo bcache-super-show /dev/sdb
sb.magic ok
sb.first_sector 8 [match]
sb.csum 50DFF194DEF15F95 [match]
sb.version 3 [cache device]

dev.label (empty)
dev.uuid 00851121-652b-4cc3-be28-d2b7b30689ce
dev.sectors_per_block 1
dev.sectors_per_bucket 1024
dev.cache.first_sector 1024
dev.cache.cache_sectors 134216704
dev.cache.total_sectors 134217728
dev.cache.ordered yes
dev.cache.discard no
dev.cache.pos 0
dev.cache.replacement 0 [lru]

cset.uuid 26bb5252-2f4d-4a55-b4f9-8666cc1cd4d6

This lets us solve the problem relatively cheaply by modifying only curtin, to create the UUID -> dname mappings, and 69-bcache.rules from bcache-tools, to respect those mappings on coldplug.

https://git.launchpad.net/~usd-import-team/ubuntu/+source/bcache-tools/tree/69-bcache.rules
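
A minimal sketch of what the lookup side could look like, assuming curtin wrote the mappings as plain "UUID dname" lines. The file format, the helper names and the example dname "bcache-data" are all assumptions for illustration, not something curtin or bcache-tools provide today.

```python
def load_dname_map(text):
    """Parse 'UUID dname' lines; '#' starts a comment (hypothetical format)."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) == 2:
            uuid, dname = parts
            mapping[uuid.lower()] = dname
    return mapping


def dname_for(uuid, mapping):
    """Return the dname for a backing-device UUID, or None if unmapped."""
    return mapping.get(uuid.lower())


# Made-up mapping content; only the UUID comes from the output above.
SAMPLE_MAP = """\
# backing-device UUID                  dname
fb7c070c-5f96-4add-8565-1398bd831b1f   bcache-data
"""

mapping = load_dname_map(SAMPLE_MAP)
```

A rule in 69-bcache.rules could then call a helper like this through udev's PROGRAM key and use its result in SYMLINK+=. The exact rule is left open here.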