Comment 3 for bug 1837788

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

This is the warning trigger line:

    int bch_cached_dev_writeback_start(struct cached_dev *dc)
    ...
        WARN_ON(test_and_set_bit(BCACHE_DEV_WB_RUNNING, &dc->disk.flags));
    ...

This commit fixes this problem:

    commit 1f0ffa67349c56ea54c03ccfd1e073c990e7411e
    Author: Coly Li <email address hidden>
    Date: Mon Jun 10 06:13:35 2019 +0800

        bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached

It is included in linux 5.2, so for now only Ubuntu Eoan has the fix;
I could reproduce the problem with Ubuntu Disco (5.0-based) kernel.

The problem can Oops the kernel (worse error) if the cache is not attached soon (reproduced below).

I'll provide a test kernel with the fix soon.

Steps to reproduce:
------------------

With attach (last line in script):

    $ sudo ./setup-bcache-wb_percent-before-attach.sh
    ...
    [ 36.130659] WARNING: CPU: 2 PID: 1288 at drivers/md/bcache/writeback.c:837 bch_cached_dev_writeback_start+0x156/0x164 [bcache]
    ...
    [ 36.155742] RIP: 0010:bch_cached_dev_writeback_start+0x156/0x164 [bcache]
    ...
    [ 36.182455] Call Trace:
    [ 36.183387] bch_cached_dev_attach+0x356/0x480 [bcache]
    [ 36.185104] ? bch_keybuf_init+0x70/0x70 [bcache]
    [ 36.186653] __cached_dev_store+0x59f/0x8b0 [bcache]
    [ 36.188274] bch_cached_dev_store+0x39/0xc0 [bcache]
    [ 36.189887] sysfs_kf_write+0x3b/0x40
    [ 36.191142] kernfs_fop_write+0x12e/0x1b0
    [ 36.192502] __vfs_write+0x1b/0x40
    [ 36.193686] vfs_write+0xab/0x1b0
    [ 36.194842] ksys_write+0x5c/0xd0
    [ 36.196026] __x64_sys_write+0x1a/0x20
    [ 36.197296] do_syscall_64+0x5a/0x110
    [ 36.198532] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 36.200158] RIP: 0033:0x7f69a1d09154
    ...
    [ 36.222967] ---[ end trace ec65e906cd03afa6 ]---
    [ 36.225017] bcache: bch_cached_dev_attach() Caching loop0 as bcache0 on set 57322d7a-1421-4dfc-a8cb-9c7b4e60c50a

Without attach (last line in script):
in a few seconds:

    $ sudo ./setup-bcache-wb_percent-before-attach.sh
    ...
    [ 30.623677] BUG: unable to handle kernel NULL pointer dereference at 0000000000000340
    [ 30.627527] #PF error: [normal kernel read fault]
    [ 30.629825] PGD 0 P4D 0
    [ 30.631194] Oops: 0000 [#1] SMP PTI
    ...
    [ 30.638818] Workqueue: events update_writeback_rate [bcache]
    [ 30.640543] RIP: 0010:update_writeback_rate+0x36/0x330 [bcache]
    ...
    [ 30.668496] Call Trace:
    [ 30.670003] ? __schedule+0x2d8/0x840
    [ 30.671998] process_one_work+0x20f/0x410
    [ 30.674163] worker_thread+0x34/0x400
    [ 30.676160] kthread+0x120/0x140
    [ 30.678014] ? process_one_work+0x410/0x410
    [ 30.680238] ? __kthread_parkme+0x70/0x70
    [ 30.682411] ret_from_fork+0x35/0x40
    ...
    [ 30.704646] ---[ end trace 1bbc1b34c704f1c5 ]---
    [ 30.707086] RIP: 0010:update_writeback_rate+0x36/0x330 [bcache]

Test script:

$ cat setup-bcache-wb_percent-before-attach.sh
#!/bin/sh

#set -ex

dmesg -C
echo 9 > /proc/sys/kernel/printk

FILE_BACKING=/backing.img
FILE_CACHING=/caching.img

dd if=/dev/zero of=$FILE_BACKING bs=1G seek=1 count=0
dd if=/dev/zero of=$FILE_CACHING bs=1G seek=1 count=0

dd if=/dev/zero of=$FILE_BACKING bs=1M count=1 conv=notrunc
dd if=/dev/zero of=$FILE_CACHING bs=1M count=1 conv=notrunc

LOOP_BACKING=$(losetup -f --show $FILE_BACKING)
LOOP_CACHING=$(losetup -f --show $FILE_CACHING)

make-bcache -B $LOOP_BACKING
make-bcache -C $LOOP_CACHING
sleep 3

CSET_UUID=$(bcache-super-show $LOOP_CACHING | awk '/cset\.uuid/ { print $2 }')

# Change writeback_percent before attaching
echo 11 > /sys/block/$(basename $LOOP_BACKING)/bcache/writeback_percent
sleep 1

# Comment this line to Oops in a few seconds
echo $CSET_UUID > /sys/block/$(basename $LOOP_BACKING)/bcache/attach