commit "bcache: add stop_when_cache_set_failed option to backing device"
Problem: upon too many io errors on cache device, bcache code retires cache set,
but it doesn't stop the bcache device.
(in writeback mode, if it continue to write, writes go to backing device directly;
next time bcache re-registers, it writes dirty cached data again, overwriting data)
Original kernel: bcache device remains available for writes.
Modified kernel: bcache device is not available anymore.
# dd if=/dev/zero of=/dev/bcache0 bs=4k
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 5.14109 s, 209 MB/s
# ./dm_fake_dev.sh /dev/loop1 bad
[ 135.316335] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 135.330181] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 135.337019] bcache: register_bcache() error /dev/dm-1: device already registered
# dd if=/dev/zero of=/dev/bcache0 bs=4k
[ 142.754766] bcache: error on 68e969a1-0f60-4978-a109-b48559970472:
[ 142.754769] journal io error
[ 142.757583] , disabling caching
dd: error writing '/dev/bcache0': No space left on device
[ 143.299557] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 143.305560] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 143.308463] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
<looping>
# ./dm_fake_dev.sh /dev/loop1 good
[ 171.220172] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 171.426360] bcache: register_bcache() error /dev/dm-1: device already registered
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 32.066 s, 33.5 MB/s
dd: closing input file '/dev/zero': Bad file descriptor
#
# [ 174.589309] bcache: cached_dev_detach_finish() Caching disabled for dm-0
[ 174.599062] bcache: cache_set_free() Cache set 68e969a1-0f60-4978-a109-b48559970472 unregistered
# dd if=/dev/zero of=/dev/bcache0 bs=4k
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 5.96102 s, 180 MB/s
# ./dm_fake_dev.sh /dev/loop1 bad
[ 81.798129] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 81.804533] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 81.810630] bcache: register_bcache() error /dev/dm-1: device already registered
[ 87.004373] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 87.011977] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 87.044118] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 87.047501] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 87.084123] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 87.087723] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 87.091177] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 87.094804] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 87.180123] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 87.183770] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 87.220126] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 87.223760] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[ 90.070935] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 90.075850] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 90.080099] bcache: bch_cached_dev_error() stop bcache0: too many IO errors on backing device dm-0
[ 90.080099]
[ 90.086639] bcache: bch_count_io_errors() dm-1: IO error on reading dirty data from cache, recovering
[ 90.091152] bcache: bcache_device_free() bcache0 stopped
dd: error writing '/dev/bcache0': No space left on device
1015982+0 records in
1015981+0 records out
4161458176 bytes (4.2 GB, 3.9 GiB) copied, 4.86963 s, 855 MB/s
I/O Error Test 2
================
commit "bcache: add stop_when_ cache_set_ failed option to backing device"
Problem: upon too many io errors on cache device, bcache code retires cache set,
but it doesn't stop the bcache device.
(in writeback mode, if it continue to write, writes go to backing device directly;
next time bcache re-registers, it writes dirty cached data again, overwriting data)
Original kernel: bcache device remains available for writes.
Modified kernel: bcache device is not available anymore.
Original
--------
# uname -rv
4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019
# ./setup.sh >/dev/null 2>&1 dev_attach( ) Caching dm-0 as bcache0 on set 68e969a1- 0f60-4978- a109-b485599704 72
[ 27.427543] bcache: register_bdev() registered backing device dm-0
[ 27.434844] bcache: run_cache_set() invalidating existing data
[ 27.445652] bcache: register_cache() registered cache device dm-1
[ 29.435245] bcache: bch_cached_
# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
└─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
└─bcache0 251:0 0 1024M 0 disk
# echo writeback > /sys/block/ bcache0/ bcache/ cache_mode
# dd if=/dev/zero of=/dev/bcache0 bs=4k
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 5.14109 s, 209 MB/s
# ./dm_fake_dev.sh /dev/loop1 bad
[ 135.316335] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 135.330181] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 135.337019] bcache: register_bcache() error /dev/dm-1: device already registered
# dd if=/dev/zero of=/dev/bcache0 bs=4k 0f60-4978- a109-b485599704 72: io_errors( ) dm-1: IO error on reading dirty data from cache, recovering io_errors( ) dm-1: IO error on reading dirty data from cache, recovering io_errors( ) dm-1: IO error on reading dirty data from cache, recovering
[ 142.754766] bcache: error on 68e969a1-
[ 142.754769] journal io error
[ 142.757583] , disabling caching
dd: error writing '/dev/bcache0': No space left on device
[ 143.299557] bcache: bch_count_
[ 143.305560] bcache: bch_count_
[ 143.308463] bcache: bch_count_
<looping>
# ./dm_fake_dev.sh /dev/loop1 good io_errors( ) dm-1: IO error on reading dirty data from cache, recovering
[ 171.220172] bcache: bch_count_
[ 171.426360] bcache: register_bcache() error /dev/dm-1: device already registered
262142+0 records in dev_detach_ finish( ) Caching disabled for dm-0 0f60-4978- a109-b485599704 72 unregistered
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 32.066 s, 33.5 MB/s
dd: closing input file '/dev/zero': Bad file descriptor
#
# [ 174.589309] bcache: cached_
[ 174.599062] bcache: cache_set_free() Cache set 68e969a1-
# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
└─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
Modified
--------
# uname -rv 3build1bcache1- Ubuntu SMP Wed Jul 3 21:41:37 UTC
4.15.0-55-generic #60+test2019070
# ./setup.sh >/dev/null 2>&1 dev_attach( ) Caching dm-0 as bcache0 on set 9b509864- 84be-428e- ad8a-293d546ec2 e9
[ 39.981825] bcache: register_bdev() registered backing device dm-0
[ 39.987359] bcache: run_cache_set() invalidating existing data
[ 40.000743] bcache: register_cache() registered cache device dm-1
[ 41.985275] bcache: bch_cached_
# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
└─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
└─bcache0 251:0 0 1024M 0 disk
# echo writeback > /sys/block/ bcache0/ bcache/ cache_mode
# dd if=/dev/zero of=/dev/bcache0 bs=4k
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 5.96102 s, 180 MB/s
# ./dm_fake_dev.sh /dev/loop1 bad io_errors( ) dm-1: IO error on reading dirty data from cache, recovering backing_ io_errors( ) dm-0: IO error on backing device, unrecoverable io_errors( ) dm-1: IO error on reading dirty data from cache, recovering backing_ io_errors( ) dm-0: IO error on backing device, unrecoverable io_errors( ) dm-1: IO error on reading dirty data from cache, recovering backing_ io_errors( ) dm-0: IO error on backing device, unrecoverable io_errors( ) dm-1: IO error on reading dirty data from cache, recovering backing_ io_errors( ) dm-0: IO error on backing device, unrecoverable io_errors( ) dm-1: IO error on reading dirty data from cache, recovering backing_ io_errors( ) dm-0: IO error on backing device, unrecoverable io_errors( ) dm-1: IO error on reading dirty data from cache, recovering backing_ io_errors( ) dm-0: IO error on backing device, unrecoverable backing_ io_errors( ) dm-0: IO error on backing device, unrecoverable io_errors( ) dm-1: IO error on reading dirty data from cache, recovering dev_error( ) stop bcache0: too many IO errors on backing device dm-0 io_errors( ) dm-1: IO error on reading dirty data from cache, recovering device_ free() bcache0 stopped
[ 81.798129] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 81.804533] Buffer I/O error on dev dm-1, logical block 262128, async page read
[ 81.810630] bcache: register_bcache() error /dev/dm-1: device already registered
[ 87.004373] bcache: bch_count_
[ 87.011977] bcache: bch_count_
[ 87.044118] bcache: bch_count_
[ 87.047501] bcache: bch_count_
[ 87.084123] bcache: bch_count_
[ 87.087723] bcache: bch_count_
[ 87.091177] bcache: bch_count_
[ 87.094804] bcache: bch_count_
[ 87.180123] bcache: bch_count_
[ 87.183770] bcache: bch_count_
[ 87.220126] bcache: bch_count_
[ 87.223760] bcache: bch_count_
...
[ 90.070935] bcache: bch_count_
[ 90.075850] bcache: bch_count_
[ 90.080099] bcache: bch_cached_
[ 90.080099]
[ 90.086639] bcache: bch_count_
[ 90.091152] bcache: bcache_
dd: error writing '/dev/bcache0': No space left on device
1015982+0 records in
1015981+0 records out
4161458176 bytes (4.2 GB, 3.9 GiB) copied, 4.86963 s, 855 MB/s
bcache device removed:
# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
loop1 7:1 0 1G 0 loop
fake-loop1 253:1 0 1G 0 dm