Comment 11 for bug 1829563

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

I/O Error Test 1
================

commit "bcache: add CACHE_SET_IO_DISABLE to struct cache_set flags"

Problem: the cacheset is not retired immediately on I/O errors in
cache device if I/O requests keep coming.

Original kernel: bcache device remains on top of caching device,
and 'fio' never finishes.

Modified kernel: bcache device is removed from caching device,
and fio finishes.

Original
--------

# uname -rv
4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019

# ./setup.sh >/dev/null 2>&1
[ 285.677682] bcache: register_bdev() registered backing device dm-0
[ 285.697006] bcache: run_cache_set() invalidating existing data
[ 285.710938] bcache: register_cache() registered cache device dm-1
[ 287.686924] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set c589879b-b1c3-49b3-9603-9795ddc750f5

# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
  └─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
  └─bcache0 251:0 0 1024M 0 disk

# ./dm_fake_dev.sh /dev/loop1 bad
[ 766.102586] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 766.107602] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 766.113889] bcache: register_bcache() error /dev/dm-1: device already registered

On another shell:

# fio --name=write --rw=randwrite --filename=/dev/bcache0 --bs=4k --iodepth=8 --ioengine=libaio --runtime=300s --continue_on_error=all
write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=8
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]

< fio never finishes, keeps showing the last line above >
< console prints as below when fio starts running .... >

[ 777.451177] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering
[ 777.490882] bcache: error on c589879b-b1c3-49b3-9603-9795ddc750f5:
[ 777.490885] journal io error
[ 777.494087] , disabling caching
[ 807.900700] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering
(error msgs looping)

bcache0 still present on top of cache device (fake-loop1)

# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
  └─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
fake-loop1 253:1 0 1G 0 dm
└─bcache0 251:0 0 1024M 0 disk

Modified
--------

# uname -rv
4.15.0-55-generic #60+test20190703build1bcache1-Ubuntu SMP Wed Jul 3 21:41:37 UTC

# ./setup.sh >/dev/null 2>&1
[ 60.542088] bcache: register_bdev() registered backing device dm-0
[ 60.550509] bcache: run_cache_set() invalidating existing data
[ 60.560109] bcache: register_cache() registered cache device dm-1
[ 62.548849] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set f6833a2c-53e6-468e-bf1d-a9f48b73d783

# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
  └─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
  └─bcache0 251:0 0 1024M 0 disk

# ./dm_fake_dev.sh /dev/loop1 bad
[ 72.639185] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 72.644876] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 72.650707] bcache: register_bcache() error /dev/dm-1: device already registered

On another shell:

# fio --name=write --rw=randwrite --filename=/dev/bcache0 --bs=4k --iodepth=8 --ioengine=libaio --runtime=300s --continue_on_error=all

[ 97.858468] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering
[ 97.868519] bcache: error on f6833a2c-53e6-468e-bf1d-a9f48b73d783:
[ 97.868520] journal io error
[ 97.869998] , disabling caching
[ 97.871441] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "auto" and cache is clean, keep it alive.
[ 97.874423] Buffer I/O error on dev bcache0, logical block 2814, lost async page write
[ 97.878697] Buffer I/O error on dev bcache0, logical block 2816, lost async page write
[ 97.881702] Buffer I/O error on dev bcache0, logical block 2817, lost async page write
[ 97.884790] Buffer I/O error on dev bcache0, logical block 2818, lost async page write
[ 97.887709] Buffer I/O error on dev bcache0, logical block 2819, lost async page write
[ 97.890558] Buffer I/O error on dev bcache0, logical block 2820, lost async page write
[ 97.892419] Buffer I/O error on dev bcache0, logical block 2821, lost async page write
[ 97.894228] Buffer I/O error on dev bcache0, logical block 2822, lost async page write
[ 97.896107] Buffer I/O error on dev bcache0, logical block 2823, lost async page write
[ 97.897900] Buffer I/O error on dev bcache0, logical block 2824, lost async page write
[ 97.916818] bcache: cached_dev_detach_finish() Caching disabled for dm-0
[ 97.918511] bcache: bch_count_io_errors() dm-1: IO error on writing btree, recovering
[ 97.920581] bcache: cache_set_free() Cache set f6833a2c-53e6-468e-bf1d-a9f48b73d783 unregistered

fio finished:

write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=8
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=135MiB/s][r=0,w=34.7k IOPS][eta 00m:00s]
...
Run status group 0 (all jobs):
  WRITE: bw=219MiB/s (229MB/s), 219MiB/s-219MiB/s (229MB/s-229MB/s), io=1024MiB (1074MB), run=4685-4685msec

bcache not on top of caching device:

# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
  └─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
fake-loop1 253:1 0 1G 0 dm