I've set up our integration test that runs the CDO-QA bcache/ceph setup.
On the updated kernel I got through 10 loops of the deployment before it produced a stack trace:
http://paste.ubuntu.com/p/zVrtvKBfCY/
[ 3939.846908] bcache: bch_cached_dev_attach() Caching vdd as bcache5 on set 275985b3-da58-41f8-9072-958bd960b490
[ 3939.878388] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3939.904984] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3939.972715] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3940.059415] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3940.129854] bcache: register_bcache() error /dev/vdd: device already registered (emitting change event)
[ 3949.257051] md: md0: resync done.
[ 4109.273558] INFO: task python3:19635 blocked for more than 120 seconds.
[ 4109.279331] Tainted: P O 4.15.0-55-generic #60+lp796292
[ 4109.284767] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4109.288771] python3 D 0 19635 16361 0x00000000
[ 4109.288774] Call Trace:
[ 4109.288818] __schedule+0x291/0x8a0
[ 4109.288822] ? __switch_to_asm+0x34/0x70
[ 4109.288824] ? __switch_to_asm+0x40/0x70
[ 4109.288826] schedule+0x2c/0x80
[ 4109.288852] bch_bucket_alloc+0x1fa/0x350 [bcache]
[ 4109.288866] ? wait_woken+0x80/0x80
[ 4109.288872] __bch_bucket_alloc_set+0xfe/0x150 [bcache]
[ 4109.288876] bch_bucket_alloc_set+0x4e/0x70 [bcache]
[ 4109.288882] __uuid_write+0x59/0x150 [bcache]
[ 4109.288895] ? submit_bio+0x73/0x140
[ 4109.288900] ? __write_super+0x137/0x170 [bcache]
[ 4109.288905] bch_uuid_write+0x16/0x40 [bcache]
[ 4109.288911] __cached_dev_store+0x1a1/0x6d0 [bcache]
[ 4109.288916] bch_cached_dev_store+0x39/0xc0 [bcache]
[ 4109.288992] sysfs_kf_write+0x3c/0x50
[ 4109.288998] kernfs_fop_write+0x125/0x1a0
[ 4109.289001] __vfs_write+0x1b/0x40
[ 4109.289003] vfs_write+0xb1/0x1a0
[ 4109.289004] SyS_write+0x55/0xc0
[ 4109.289010] do_syscall_64+0x73/0x130
[ 4109.289014] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 4109.289016] RIP: 0033:0x7f8d2833e154
[ 4109.289018] RSP: 002b:00007ffcda55a4e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 4109.289020] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f8d2833e154
[ 4109.289021] RDX: 0000000000000008 RSI: 00000000022b7360 RDI: 0000000000000003
[ 4109.289022] RBP: 00007f8d288396c0 R08: 0000000000000000 R09: 0000000000000000
[ 4109.289022] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
[ 4109.289026] R13: 0000000000000000 R14: 00000000022b7360 R15: 0000000001fe8db0
[ 4109.289033] INFO: task bcache_allocato:22317 blocked for more than 120 seconds.
[ 4109.292172] Tainted: P O 4.15.0-55-generic #60+lp796292
[ 4109.295345] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4109.298208] bcache_allocato D 0 22317 2 0x80000000
[ 4109.298212] Call Trace:
[ 4109.298217] __schedule+0x291/0x8a0
[ 4109.298225] schedule+0x2c/0x80
[ 4109.298232] bch_bucket_alloc+0x1fa/0x350 [bcache]
[ 4109.298235] ? wait_woken+0x80/0x80
[ 4109.298241] bch_prio_write+0x19f/0x340 [bcache]
[ 4109.298246] bch_allocator_thread+0x502/0xca0 [bcache]
[ 4109.298260] kthread+0x121/0x140
[ 4109.298264] ? bch_invalidate_one_bucket+0x80/0x80 [bcache]
[ 4109.298274] ? kthread_create_worker_on_cpu+0x70/0x70
[ 4109.298277] ret_from_fork+0x35/0x40