I/O Error Test 1 ================ commit "bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags" Problem: the cacheset is not retired immediately on I/O errors in cache device if I/O requests keep coming. Original kernel: bcache device remains on top of caching device, and 'fio' never finishes. Modified kernel: bcache device is removed from caching device, and fio finishes. Original -------- # uname -rv 4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019 # ./setup.sh >/dev/null 2>&1 [ 285.677682] bcache: register_bdev() registered backing device dm-0 [ 285.697006] bcache: run_cache_set() invalidating existing data [ 285.710938] bcache: register_cache() registered cache device dm-1 [ 287.686924] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set c589879b-b1c3-49b3-9603-9795ddc750f5 # lsblk -e 252 NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT loop0 7:0 0 1G 0 loop └─fake-loop0 253:0 0 1024M 0 dm └─bcache0 251:0 0 1024M 0 disk loop1 7:1 0 1G 0 loop └─fake-loop1 253:1 0 1024M 0 dm └─bcache0 251:0 0 1024M 0 disk # ./dm_fake_dev.sh /dev/loop1 bad [ 766.102586] Buffer I/O error on dev dm-1, logical block 262128, async page read [ 766.107602] Buffer I/O error on dev dm-1, logical block 262128, async page read [ 766.113889] bcache: register_bcache() error /dev/dm-1: device already registered On another shell: # fio --name=write --rw=randwrite --filename=/dev/bcache0 --bs=4k --iodepth=8 --ioengine=libaio --runtime=300s --continue_on_error=all write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=8 fio-3.1 Starting 1 process Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s] < fio never finishes, keeps showing the last line above > < console prints as below when fio starts running .... > [ 777.451177] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering [ 777.490882] bcache: error on c589879b-b1c3-49b3-9603-9795ddc750f5: [ 777.490885] journal io error [ 777.494087] , disabling caching [ 807.900700] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering (error msgs looping) bcache0 still present on top of cache device (fake-loop1) # lsblk -e 252 NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT loop0 7:0 0 1G 0 loop └─fake-loop0 253:0 0 1024M 0 dm └─bcache0 251:0 0 1024M 0 disk loop1 7:1 0 1G 0 loop fake-loop1 253:1 0 1G 0 dm └─bcache0 251:0 0 1024M 0 disk Modified -------- # uname -rv 4.15.0-55-generic #60+test20190703build1bcache1-Ubuntu SMP Wed Jul 3 21:41:37 UTC # ./setup.sh >/dev/null 2>&1 [ 60.542088] bcache: register_bdev() registered backing device dm-0 [ 60.550509] bcache: run_cache_set() invalidating existing data [ 60.560109] bcache: register_cache() registered cache device dm-1 [ 62.548849] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set f6833a2c-53e6-468e-bf1d-a9f48b73d783 # lsblk -e 252 NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT loop0 7:0 0 1G 0 loop └─fake-loop0 253:0 0 1024M 0 dm └─bcache0 251:0 0 1024M 0 disk loop1 7:1 0 1G 0 loop └─fake-loop1 253:1 0 1024M 0 dm └─bcache0 251:0 0 1024M 0 disk # ./dm_fake_dev.sh /dev/loop1 bad [ 72.639185] Buffer I/O error on dev dm-1, logical block 262128, async page read [ 72.644876] Buffer I/O error on dev dm-1, logical block 262128, async page read [ 72.650707] bcache: register_bcache() error /dev/dm-1: device already registered On another shell: # fio --name=write --rw=randwrite --filename=/dev/bcache0 --bs=4k --iodepth=8 --ioengine=libaio --runtime=300s --continue_on_error=all [ 97.858468] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering [ 97.868519] bcache: error on f6833a2c-53e6-468e-bf1d-a9f48b73d783: [ 97.868520] journal io error [ 97.869998] , disabling caching [ 97.871441] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "auto" and cache is clean, keep it alive. [ 97.874423] Buffer I/O error on dev bcache0, logical block 2814, lost async page write [ 97.878697] Buffer I/O error on dev bcache0, logical block 2816, lost async page write [ 97.881702] Buffer I/O error on dev bcache0, logical block 2817, lost async page write [ 97.884790] Buffer I/O error on dev bcache0, logical block 2818, lost async page write [ 97.887709] Buffer I/O error on dev bcache0, logical block 2819, lost async page write [ 97.890558] Buffer I/O error on dev bcache0, logical block 2820, lost async page write [ 97.892419] Buffer I/O error on dev bcache0, logical block 2821, lost async page write [ 97.894228] Buffer I/O error on dev bcache0, logical block 2822, lost async page write [ 97.896107] Buffer I/O error on dev bcache0, logical block 2823, lost async page write [ 97.897900] Buffer I/O error on dev bcache0, logical block 2824, lost async page write [ 97.916818] bcache: cached_dev_detach_finish() Caching disabled for dm-0 [ 97.918511] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering [ 97.920581] bcache: cache_set_free() Cache set f6833a2c-53e6-468e-bf1d-a9f48b73d783 unregistered fio finished: write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=8 fio-3.1 Starting 1 process Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=135MiB/s][r=0,w=34.7k IOPS][eta 00m:00s] ... Run status group 0 (all jobs): WRITE: bw=219MiB/s (229MB/s), 219MiB/s-219MiB/s (229MB/s-229MB/s), io=1024MiB (1074MB), run=4685-4685msec bcache not on top of caching device: # lsblk -e 252 NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT loop0 7:0 0 1G 0 loop └─fake-loop0 253:0 0 1024M 0 dm └─bcache0 251:0 0 1024M 0 disk loop1 7:1 0 1G 0 loop fake-loop1 253:1 0 1G 0 dm