I/O Error Test 6 (for the Cosmic kernel)
================

commit: 'Revert "bcache: set CACHE_SET_IO_DISABLE in bch_cached_dev_error()"'

Problem: if one backing device hits I/O errors, the cache device is disabled; if that cache device is shared by other bcache devices, they are stopped too (even when their backing devices are not failing).

Original kernel: all bcache devices that share a cache device with the failing backing device are stopped.
Modified kernel: only the bcache device with the failing backing device is stopped.

Original kernel
---------------

root@guest-bcache:~# uname -rv
4.18.0-23-generic #24-Ubuntu SMP Wed Jun 12 18:17:39 UTC 2019

root@guest-bcache:~# lsblk -e 252
root@guest-bcache:~#

root@guest-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[   35.686002] bcache: register_bdev() registered backing device dm-0
[   35.695980] bcache: register_bdev() registered backing device dm-1
[   35.704662] bcache: run_cache_set() invalidating existing data
[   35.719046] bcache: register_cache() registered cache device dm-2
[   36.705686] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set fce8d558-4657-47dc-ab37-226ada14daf5
[   36.711827] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set fce8d558-4657-47dc-ab37-226ada14daf5

root@guest-bcache:~# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk

root@guest-bcache:~# echo writeback | tee /sys/block/dm-*/bcache/cache_mode
writeback

root@guest-bcache:~# cat /sys/block/dm-*/bcache/cache_mode
writethrough [writeback] writearound none
writethrough [writeback] writearound none

root@guest-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[   76.875749] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   76.882159] Buffer I/O error on dev dm-0, logical block 262128, async page read
[   76.889453] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[   76.892183] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.904907] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.907711] Buffer I/O error on dev bcache0, logical block 262112, async page read
[   76.912607] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.916905] Buffer I/O error on dev bcache0, logical block 262112, async page read
[   76.920345] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.924767] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[   76.928404] Buffer I/O error on dev bcache0, logical block 1, async page read

root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[  175.024811] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[  175.029844] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.034652] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[  175.037465] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.040373] Buffer I/O error on dev bcache0, logical block 2, lost async page write
...
[  175.092196] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.096635] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.101272] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.105829] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[  175.235700] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  175.239457] bcache: bch_cached_dev_error() stop bcache0: too many IO errors on backing device dm-0
[  175.239457]
[  175.324069] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already set
[  175.328998] bcache: error on fce8d558-4657-47dc-ab37-226ada14daf5:
[  175.328999] journal io error
[  175.331022] , disabling caching
[  175.334264] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "auto" and cache is dirty, stop it to avoid potential data corruption.
[  175.338865] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache1 is "auto" and cache is dirty, stop it to avoid potential data corruption.
[  175.344097] bcache: cached_dev_detach_finish() Caching disabled for dm-1
[  176.080139] bcache: bcache_device_free() bcache0 stopped
[  176.083928] bcache: bch_count_io_errors() dm-2: IO error on writing btree.
[  176.188371] bcache: cache_set_free() Cache set fce8d558-4657-47dc-ab37-226ada14daf5 unregistered
[  176.841497] bcache: bcache_device_free() bcache1 stopped
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 1.81834 s, 591 MB/s
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.5749 s, 417 MB/s

[1]-  Exit 1                  dd if=/dev/zero of=/dev/bcache1 bs=4k
[2]+  Exit 1                  dd if=/dev/zero of=/dev/bcache0 bs=4k

root@guest-bcache:~# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop
loop1          7:1    0    1G  0 loop
└─fake-loop1 253:1    0 1024M  0 dm
loop2          7:2    0    1G  0 loop
└─fake-loop2 253:2    0 1024M  0 dm
fake-loop0   253:0    0    1G  0 dm

Notice that bcache0 and bcache1 are missing.

Modified kernel
---------------

root@guest-bcache:~# uname -rv
4.18.0-23-generic #24+test20190627b1 SMP Thu Jun 27 13:29:22 UTC 2019

root@guest-bcache:~# lsblk -e 252
root@guest-bcache:~#

root@guest-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[  146.600391] bcache: register_bdev() registered backing device dm-0
[  146.608618] bcache: register_bdev() registered backing device dm-1
[  146.617808] bcache: run_cache_set() invalidating existing data
[  146.632355] bcache: register_cache() registered cache device dm-2
[  147.615003] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 6673bcb3-7a64-4675-a82f-59bb66886d66
[  147.633610] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set 6673bcb3-7a64-4675-a82f-59bb66886d66

root@guest-bcache:~# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk

root@guest-bcache:~# echo writeback | tee /sys/block/dm-*/bcache/cache_mode
writeback

root@guest-bcache:~# cat /sys/block/dm-*/bcache/cache_mode
writethrough [writeback] writearound none
writethrough [writeback] writearound none

root@guest-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[  174.138534] Buffer I/O error on dev dm-0, logical block 262128, async page read
[  174.145142] Buffer I/O error on dev dm-0, logical block 262128, async page read
[  174.152728] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[  174.154780] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.159945] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.162933] Buffer I/O error on dev bcache0, logical block 262112, async page read
[  174.168696] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.172368] Buffer I/O error on dev bcache0, logical block 262112, async page read
[  174.175272] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.178593] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  174.181896] Buffer I/O error on dev bcache0, logical block 1, async page read

root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[1] 1377
[2] 1378
[  183.348428] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.354587] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.360488] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[  183.364666] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[  183.368326] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[  183.430652] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.434399] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.438198] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[  183.441991] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[  183.635500] bcache: bch_cached_dev_error() stop bcache0: too many IO errors on backing device dm-0
[  183.635500]
[  184.840023] bcache: bcache_device_free() bcache0 stopped
dd: error writing '/dev/bcache0': No space left on device
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.18238 s, 492 MB/s
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 3.69895 s, 290 MB/s

[1]-  Exit 1                  dd if=/dev/zero of=/dev/bcache1 bs=4k
[2]+  Exit 1                  dd if=/dev/zero of=/dev/bcache0 bs=4k

root@guest-bcache:~# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
fake-loop0     253:0    0    1G  0 dm

Notice that only bcache0 is stopped; bcache1 is still present.

After a reboot, the bcache devices are reattached.

root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.79076 s, 224 MB/s
root@guest-bcache:~#

root@guest-bcache:~# reboot

root@guest-bcache:~# ./setup-two-bcache-one-cache.reboot.sh
[  104.421020] bcache: register_bdev() registered backing device dm-0
[  104.492000] bcache: register_bdev() registered backing device dm-1
[  104.685632] bcache: bch_journal_replay() journal replay done, 97526 keys in 57 entries, seq 359
[  104.695263] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set 6673bcb3-7a64-4675-a82f-59bb66886d66
[  104.704708] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 6673bcb3-7a64-4675-a82f-59bb66886d66
[  104.709640] bcache: register_cache() registered cache device dm-2

root@guest-bcache:~# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk
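
The lsblk comparisons in this test (which bcache devices survive the backing-device failure) can also be checked mechanically. The following is only a minimal sketch: the function name list_bcache_devices is hypothetical and is not part of the original test scripts. It assumes lsblk-style text on stdin and prints each distinct bcacheN name once.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original test scripts).
# Reads lsblk-style output on stdin and prints each distinct bcacheN
# device name once, sorted. Prints nothing if no bcache device is listed.
list_bcache_devices() {
    grep -o 'bcache[0-9][0-9]*' | sort -u
}

# Example use on a live system (same lsblk invocation as in the test):
#   lsblk -e 252 | list_bcache_devices
```

With the original kernel the helper would be expected to print nothing after the failure, and with the modified kernel only bcache1, matching the lsblk listings in each section.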