That doesn't explain why they show up sometimes, but not all of
the time.
There are 3 devices in play here.
* The backing device, let's say /dev/vda; this is where we want
to store the data.
* The caching device, let's say /dev/vdb; this holds the cache.
* The bcache device; this only appears when a backing device is
detected; via bcache-probe which reads the header on the disk
and indicates it's a bcache device.
Now, there are a few sequences that we see:
1) make-bcache -b /dev/vda -c /deb/vdb will create a bcache0
with the cache and backing dev at the same time and emit the
CACHED_UUID change event which runs *after*
60-persistent-storage, and adds the bcache/by-uuid links
2) After a reboot (or if you tear down a bcache0 by stopping them
in sysfs) we have several scenarios where the 2 real block
devices are found:
2a) order: backing, bcacheN, caching
In this scenario, vda will run through
60-persistent-storage.rules and blkid will discover
ID_FS_TYPE=bcache, and generates only by-path, by-id links
for the device, 69-bcache.rules will run bcache-register
/dev/vda which informs the kernel that a backing device has
been found.
If the cache device has not yet been handled by udev (we
don't know if a cache device exists, bcache can work
without one) then the kernel will create a bcacheN
device, but without the cache device being present, it will
*NOT* emit a CHANGE event with CACHED_UUID value.
Now when the cache device (vdb) is going through,
60-persistent-storage.rules, blkid will discover the same
settings, TYPE=bcache, generate by-path, by-id symlinks.
Then 69-bcache.rules runs, triggers a bcache-register on vdb
registering the cache device in the kernel and because
the superblock on the backing device specifies the
cache device UUID it used the kernel can bind the cache
device into the correct bcacheN device.
A mainline kernel (no sauce patch) does not emit a CHANGE
event with CACHED_UUID value. Our sauce patch changes
the kernel to emit the CHANGE event with CACHED_UUID
whenever we bind a cache device to an existing bcacheN
precisely so the bcache/by-* links get generated.
2b) order: cache, backing, bcacheN
Here, when vdb is plugged in, the cache device is registered
in the kernel, but without a backing device, no bcacheN device
is created yet.
Then the backing is probed, as in 2a; the bcache-register command
will register it with the kernel, and create a bcache0, and
since the cache device is present, mainline kernels (and ours)
will emit the CHANGE event with CACHED_UUID
With this in-mind; it appears to me that in the focal case
we're seeing the bcache driver CHANGE event with CACHED_UUID
and then additional change events from the block layer around
the bcache0 device (but not emitted from the bcache driver code)
So I'd like to understand why ... I expect those events to occur
but not before the driver change event.
And maybe that's the race here
The happy path:
1. udev processing CHANGE on backing device, it's bcache so we register it
2. kernel bcache driver creates a bcache0 block device (udev ADD event)
3. udev processing CHANGE on bcache0 device
4. udev processing CHANGE on cache device, it's bcache so we register it
5. kernel bcache driver emits CHANGE event with CACHED_UUID
6. udev processing CHANGE on bcache0 device with CACHED_UUID
The bad path
1. udev processing CHANGE on cache device, it's bcache so we register it
2. udev processing CHANGE on backing device, it's bcache so we register it
3. kernel bcache driver creates a bcache0 block device (udev ADD event)
4. kernel bcache driver emits CHANGE event with CACHED_UUID
5. udev processing CHANGE on bcache0 device with CACHED_UUID
6. udev processing CHANGE on bcache0 device without CACHED_UUID
if the bcache change event occurs
before the block layer change events, then the database does
get clobbered ... but we've an incomplete trace; we should
see pairs of events, one from KERNEL, and then one from UDEV.
We should definitely get a complete capture for this bug.
w.r.t the fix; I've a few concerns:
1) bcache-keep-symlinks relies on /dev/bcache/by-* links to
exist, if they are removed, then the keep won't work.
2) The removal of the /dev/bcache/by-uuid/ happens during
processing of 60-persistent-storage.rules, which will remove the
link before bcache-keep-symlinks can run (it's in 69-bcache.rules)
3) since we cannot rely on the existing links, instead we should
read and parse the bcache superblock devices, much like the one
I put together (and you reminded me I wrote)
That doesn't explain why they show up sometimes, but not all of
the time.
There are 3 devices in play here.
* The backing device, let's say /dev/vda; this is where we want
to store the data.
* The caching device, let's say /dev/vdb; this holds the cache.
* The bcache device; this only appears when a backing device is
detected; via bcache-probe which reads the header on the disk
and indicates it's a bcache device.
Now, there are a few sequences that we see:
1) make-bcache -b /dev/vda -c /deb/vdb will create a bcache0 t-storage, and adds the bcache/by-uuid links
with the cache and backing dev at the same time and emit the
CACHED_UUID change event which runs *after*
60-persisten
2) After a reboot (or if you tear down a bcache0 by stopping them
in sysfs) we have several scenarios where the 2 real block
devices are found:
2a) order: backing, bcacheN, caching
In this scenario, vda will run through persistent- storage. rules and blkid will discover FS_TYPE= bcache, and generates only by-path, by-id links
60-
ID_
for the device, 69-bcache.rules will run bcache-register
/dev/vda which informs the kernel that a backing device has
been found.
If the cache device has not yet been handled by udev (we
don't know if a cache device exists, bcache can work
without one) then the kernel will create a bcacheN
device, but without the cache device being present, it will
*NOT* emit a CHANGE event with CACHED_UUID value.
Now when the cache device (vdb) is going through, persistent- storage. rules, blkid will discover the same
60-
settings, TYPE=bcache, generate by-path, by-id symlinks.
Then 69-bcache.rules runs, triggers a bcache-register on vdb
registering the cache device in the kernel and because
the superblock on the backing device specifies the
cache device UUID it used the kernel can bind the cache
device into the correct bcacheN device.
A mainline kernel (no sauce patch) does not emit a CHANGE
event with CACHED_UUID value. Our sauce patch changes
the kernel to emit the CHANGE event with CACHED_UUID
whenever we bind a cache device to an existing bcacheN
precisely so the bcache/by-* links get generated.
2b) order: cache, backing, bcacheN
Here, when vdb is plugged in, the cache device is registered
in the kernel, but without a backing device, no bcacheN device
is created yet.
Then the backing is probed, as in 2a; the bcache-register command
will register it with the kernel, and create a bcache0, and
since the cache device is present, mainline kernels (and ours)
will emit the CHANGE event with CACHED_UUID
With this in-mind; it appears to me that in the focal case
we're seeing the bcache driver CHANGE event with CACHED_UUID
and then additional change events from the block layer around
the bcache0 device (but not emitted from the bcache driver code)
So I'd like to understand why ... I expect those events to occur
but not before the driver change event.
And maybe that's the race here
The happy path:
1. udev processing CHANGE on backing device, it's bcache so we register it
2. kernel bcache driver creates a bcache0 block device (udev ADD event)
3. udev processing CHANGE on bcache0 device
4. udev processing CHANGE on cache device, it's bcache so we register it
5. kernel bcache driver emits CHANGE event with CACHED_UUID
6. udev processing CHANGE on bcache0 device with CACHED_UUID
The bad path
1. udev processing CHANGE on cache device, it's bcache so we register it
2. udev processing CHANGE on backing device, it's bcache so we register it
3. kernel bcache driver creates a bcache0 block device (udev ADD event)
4. kernel bcache driver emits CHANGE event with CACHED_UUID
5. udev processing CHANGE on bcache0 device with CACHED_UUID
6. udev processing CHANGE on bcache0 device without CACHED_UUID
if the bcache change event occurs
before the block layer change events, then the database does
get clobbered ... but we've an incomplete trace; we should
see pairs of events, one from KERNEL, and then one from UDEV.
We should definitely get a complete capture for this bug.
w.r.t the fix; I've a few concerns:
1) bcache- keep-symlinks relies on /dev/bcache/by-* links to
exist, if they are removed, then the keep won't work.
2) The removal of the /dev/bcache/ by-uuid/ happens during storage. rules, which will remove the keep-symlinks can run (it's in
69-bcache. rules)
processing of 60-persistent-
link before bcache-
3) since we cannot rely on the existing links, instead we should
read and parse the bcache superblock devices, much like the one
I put together (and you reminded me I wrote)
https:/ /github. com/g2p/ bcache- tools/pull/ 29/files