Comment 26 for bug 1861941

Revision history for this message
Ryan Harper (raharper) wrote :

That doesn't explain why they show up sometimes, but not all of
the time.

There are 3 devices in play here.

* The backing device, let's say /dev/vda; this is where we want
  to store the data.
* The caching device, let's say /dev/vdb; this holds the cache.
* The bcache device; this only appears when a backing device is
  detected; via bcache-probe which reads the header on the disk
  and indicates it's a bcache device.

Now, there are a few sequences that we see:

1) make-bcache -b /dev/vda -c /deb/vdb will create a bcache0
   with the cache and backing dev at the same time and emit the
   CACHED_UUID change event which runs *after*
   60-persistent-storage, and adds the bcache/by-uuid links

2) After a reboot (or if you tear down a bcache0 by stopping them
   in sysfs) we have several scenarios where the 2 real block
   devices are found:

2a) order: backing, bcacheN, caching

    In this scenario, vda will run through
    60-persistent-storage.rules and blkid will discover
    ID_FS_TYPE=bcache, and generates only by-path, by-id links
    for the device, 69-bcache.rules will run bcache-register
    /dev/vda which informs the kernel that a backing device has
    been found.

    If the cache device has not yet been handled by udev (we
    don't know if a cache device exists, bcache can work
    without one) then the kernel will create a bcacheN
    device, but without the cache device being present, it will
    *NOT* emit a CHANGE event with CACHED_UUID value.

    Now when the cache device (vdb) is going through,
    60-persistent-storage.rules, blkid will discover the same
    settings, TYPE=bcache, generate by-path, by-id symlinks.
    Then 69-bcache.rules runs, triggers a bcache-register on vdb
    registering the cache device in the kernel and because
    the superblock on the backing device specifies the
    cache device UUID it used the kernel can bind the cache
    device into the correct bcacheN device.

    A mainline kernel (no sauce patch) does not emit a CHANGE
    event with CACHED_UUID value. Our sauce patch changes
    the kernel to emit the CHANGE event with CACHED_UUID
    whenever we bind a cache device to an existing bcacheN
    precisely so the bcache/by-* links get generated.

2b) order: cache, backing, bcacheN

   Here, when vdb is plugged in, the cache device is registered
   in the kernel, but without a backing device, no bcacheN device
   is created yet.

   Then the backing is probed, as in 2a; the bcache-register command
   will register it with the kernel, and create a bcache0, and
   since the cache device is present, mainline kernels (and ours)
   will emit the CHANGE event with CACHED_UUID

With this in-mind; it appears to me that in the focal case
we're seeing the bcache driver CHANGE event with CACHED_UUID
and then additional change events from the block layer around
the bcache0 device (but not emitted from the bcache driver code)
So I'd like to understand why ... I expect those events to occur
but not before the driver change event.

And maybe that's the race here

The happy path:

  1. udev processing CHANGE on backing device, it's bcache so we register it
  2. kernel bcache driver creates a bcache0 block device (udev ADD event)
  3. udev processing CHANGE on bcache0 device
  4. udev processing CHANGE on cache device, it's bcache so we register it
  5. kernel bcache driver emits CHANGE event with CACHED_UUID
  6. udev processing CHANGE on bcache0 device with CACHED_UUID

The bad path

  1. udev processing CHANGE on cache device, it's bcache so we register it
  2. udev processing CHANGE on backing device, it's bcache so we register it
  3. kernel bcache driver creates a bcache0 block device (udev ADD event)
  4. kernel bcache driver emits CHANGE event with CACHED_UUID
  5. udev processing CHANGE on bcache0 device with CACHED_UUID
  6. udev processing CHANGE on bcache0 device without CACHED_UUID

if the bcache change event occurs
before the block layer change events, then the database does
get clobbered ... but we've an incomplete trace; we should
see pairs of events, one from KERNEL, and then one from UDEV.
We should definitely get a complete capture for this bug.

w.r.t the fix; I've a few concerns:

1) bcache-keep-symlinks relies on /dev/bcache/by-* links to
exist, if they are removed, then the keep won't work.

2) The removal of the /dev/bcache/by-uuid/ happens during
processing of 60-persistent-storage.rules, which will remove the
link before bcache-keep-symlinks can run (it's in
        69-bcache.rules)

3) since we cannot rely on the existing links, instead we should
read and parse the bcache superblock devices, much like the one
I put together (and you reminded me I wrote)

https://github.com/g2p/bcache-tools/pull/29/files