zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

Bug #1861235 reported by Seth Arnold
This bug affects 3 people
Affects              Status         Importance   Assigned to   Milestone
Linux                Fix Released   Unknown
zfs-linux (Ubuntu)   Incomplete     Undecided    Unassigned
Bionic               Confirmed      Undecided    Unassigned

Bug Description

Same as bug 1861228 but with a newer kernel installed.

[ 790.702566] VERIFY(size != 0) failed
[ 790.702590] PANIC at range_tree.c:304:range_tree_find_impl()
[ 790.702611] Showing stack for process 28685
[ 790.702614] CPU: 17 PID: 28685 Comm: receive_writer Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 790.702615] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015
[ 790.702616] Call Trace:
[ 790.702626] dump_stack+0x6d/0x8e
[ 790.702637] spl_dumpstack+0x42/0x50 [spl]
[ 790.702640] spl_panic+0xc8/0x110 [spl]
[ 790.702645] ? __switch_to_asm+0x41/0x70
[ 790.702714] ? arc_prune_task+0x1a/0x40 [zfs]
[ 790.702740] ? dbuf_dirty+0x43d/0x850 [zfs]
[ 790.702745] ? getrawmonotonic64+0x43/0xd0
[ 790.702746] ? getrawmonotonic64+0x43/0xd0
[ 790.702775] ? dmu_zfetch+0x49a/0x500 [zfs]
[ 790.702778] ? getrawmonotonic64+0x43/0xd0
[ 790.702805] ? dmu_zfetch+0x49a/0x500 [zfs]
[ 790.702807] ? mutex_lock+0x12/0x40
[ 790.702833] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
[ 790.702866] range_tree_find_impl+0x88/0x90 [zfs]
[ 790.702870] ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
[ 790.702902] range_tree_clear+0x4f/0x60 [zfs]
[ 790.702930] dnode_free_range+0x11f/0x5a0 [zfs]
[ 790.702957] dmu_object_free+0x53/0x90 [zfs]
[ 790.702983] dmu_free_long_object+0x9f/0xc0 [zfs]
[ 790.703010] receive_freeobjects.isra.12+0x7a/0x100 [zfs]
[ 790.703036] receive_writer_thread+0x6d2/0xa60 [zfs]
[ 790.703040] ? set_curr_task_fair+0x2b/0x60
[ 790.703043] ? spl_kmem_free+0x33/0x40 [spl]
[ 790.703048] ? kfree+0x165/0x180
[ 790.703073] ? receive_free.isra.13+0xc0/0xc0 [zfs]
[ 790.703078] thread_generic_wrapper+0x74/0x90 [spl]
[ 790.703081] kthread+0x121/0x140
[ 790.703084] ? __thread_exit+0x20/0x20 [spl]
[ 790.703085] ? kthread_create_worker_on_cpu+0x70/0x70
[ 790.703088] ret_from_fork+0x35/0x40
[ 967.636923] INFO: task txg_quiesce:14810 blocked for more than 120 seconds.
[ 967.636979] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 967.637024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 967.637076] txg_quiesce D 0 14810 2 0x80000000
[ 967.637080] Call Trace:
[ 967.637089] __schedule+0x24e/0x880
[ 967.637092] schedule+0x2c/0x80
[ 967.637106] cv_wait_common+0x11e/0x140 [spl]
[ 967.637114] ? wait_woken+0x80/0x80
[ 967.637122] __cv_wait+0x15/0x20 [spl]
[ 967.637210] txg_quiesce_thread+0x2cb/0x3d0 [zfs]
[ 967.637278] ? txg_delay+0x1b0/0x1b0 [zfs]
[ 967.637286] thread_generic_wrapper+0x74/0x90 [spl]
[ 967.637291] kthread+0x121/0x140
[ 967.637297] ? __thread_exit+0x20/0x20 [spl]
[ 967.637299] ? kthread_create_worker_on_cpu+0x70/0x70
[ 967.637304] ret_from_fork+0x35/0x40
[ 967.637326] INFO: task zfs:28590 blocked for more than 120 seconds.
[ 967.637371] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 967.637416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 967.637467] zfs D 0 28590 28587 0x80000080
[ 967.637470] Call Trace:
[ 967.637474] __schedule+0x24e/0x880
[ 967.637477] schedule+0x2c/0x80
[ 967.637486] cv_wait_common+0x11e/0x140 [spl]
[ 967.637491] ? wait_woken+0x80/0x80
[ 967.637498] __cv_wait+0x15/0x20 [spl]
[ 967.637554] dmu_recv_stream+0xa51/0xef0 [zfs]
[ 967.637630] zfs_ioc_recv_impl+0x306/0x1100 [zfs]
[ 967.637679] ? dbuf_read+0x34a/0x920 [zfs]
[ 967.637725] ? dbuf_rele+0x36/0x40 [zfs]
[ 967.637728] ? _cond_resched+0x19/0x40
[ 967.637798] zfs_ioc_recv_new+0x33d/0x410 [zfs]
[ 967.637809] ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl]
[ 967.637816] ? spl_vmem_alloc+0x19/0x20 [spl]
[ 967.637828] ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
[ 967.637834] ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair]
[ 967.637840] ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair]
[ 967.637905] zfsdev_ioctl+0x451/0x610 [zfs]
[ 967.637913] do_vfs_ioctl+0xa8/0x630
[ 967.637917] ? __audit_syscall_entry+0xbc/0x110
[ 967.637924] ? syscall_trace_enter+0x1da/0x2d0
[ 967.637927] SyS_ioctl+0x79/0x90
[ 967.637930] do_syscall_64+0x73/0x130
[ 967.637935] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 967.637938] RIP: 0033:0x7fc305a905d7
[ 967.637940] RSP: 002b:00007ffc45e39618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 967.637943] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00007fc305a905d7
[ 967.637945] RDX: 00007ffc45e39630 RSI: 0000000000005a46 RDI: 0000000000000006
[ 967.637946] RBP: 00007ffc45e39630 R08: 00007fc305d65e20 R09: 0000000000000000
[ 967.637948] R10: 00005648313c7010 R11: 0000000000000246 R12: 00007ffc45e3cc60
[ 967.637949] R13: 0000000000000006 R14: 00005648313cef10 R15: 000000000000000c
[ 967.637960] INFO: task receive_writer:28685 blocked for more than 120 seconds.
[ 967.638010] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 967.638055] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 967.638107] receive_writer D 0 28685 2 0x80000080
[ 967.638109] Call Trace:
[ 967.638113] __schedule+0x24e/0x880
[ 967.638117] schedule+0x2c/0x80
[ 967.638126] spl_panic+0xfa/0x110 [spl]
[ 967.638175] ? arc_prune_task+0x1a/0x40 [zfs]
[ 967.638223] ? dbuf_dirty+0x43d/0x850 [zfs]
[ 967.638229] ? getrawmonotonic64+0x43/0xd0
[ 967.638232] ? getrawmonotonic64+0x43/0xd0
[ 967.638289] ? dmu_zfetch+0x49a/0x500 [zfs]
[ 967.638293] ? getrawmonotonic64+0x43/0xd0
[ 967.638344] ? dmu_zfetch+0x49a/0x500 [zfs]
[ 967.638348] ? mutex_lock+0x12/0x40
[ 967.638395] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
[ 967.638461] range_tree_find_impl+0x88/0x90 [zfs]
[ 967.638468] ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
[ 967.638530] range_tree_clear+0x4f/0x60 [zfs]
[ 967.638583] dnode_free_range+0x11f/0x5a0 [zfs]
[ 967.638635] dmu_object_free+0x53/0x90 [zfs]
[ 967.638685] dmu_free_long_object+0x9f/0xc0 [zfs]
[ 967.638738] receive_freeobjects.isra.12+0x7a/0x100 [zfs]
[ 967.638787] receive_writer_thread+0x6d2/0xa60 [zfs]
[ 967.638792] ? set_curr_task_fair+0x2b/0x60
[ 967.638800] ? spl_kmem_free+0x33/0x40 [spl]
[ 967.638805] ? kfree+0x165/0x180
[ 967.638852] ? receive_free.isra.13+0xc0/0xc0 [zfs]
[ 967.638860] thread_generic_wrapper+0x74/0x90 [spl]
[ 967.638863] kthread+0x121/0x140
[ 967.638869] ? __thread_exit+0x20/0x20 [spl]
[ 967.638872] ? kthread_create_worker_on_cpu+0x70/0x70
[ 967.638876] ret_from_fork+0x35/0x40
[ 1088.467860] INFO: task txg_quiesce:14810 blocked for more than 120 seconds.
[ 1088.467913] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 1088.467958] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.468010] txg_quiesce D 0 14810 2 0x80000000
[ 1088.468013] Call Trace:
[ 1088.468020] __schedule+0x24e/0x880
[ 1088.468023] schedule+0x2c/0x80
[ 1088.468035] cv_wait_common+0x11e/0x140 [spl]
[ 1088.468039] ? wait_woken+0x80/0x80
[ 1088.468048] __cv_wait+0x15/0x20 [spl]
[ 1088.468124] txg_quiesce_thread+0x2cb/0x3d0 [zfs]
[ 1088.468193] ? txg_delay+0x1b0/0x1b0 [zfs]
[ 1088.468201] thread_generic_wrapper+0x74/0x90 [spl]
[ 1088.468204] kthread+0x121/0x140
[ 1088.468210] ? __thread_exit+0x20/0x20 [spl]
[ 1088.468213] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1088.468217] ret_from_fork+0x35/0x40
[ 1088.468234] INFO: task zfs:28590 blocked for more than 120 seconds.
[ 1088.468278] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 1088.468322] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.468373] zfs D 0 28590 28587 0x80000080
[ 1088.468376] Call Trace:
[ 1088.468379] __schedule+0x24e/0x880
[ 1088.468382] schedule+0x2c/0x80
[ 1088.468391] cv_wait_common+0x11e/0x140 [spl]
[ 1088.468396] ? wait_woken+0x80/0x80
[ 1088.468403] __cv_wait+0x15/0x20 [spl]
[ 1088.468459] dmu_recv_stream+0xa51/0xef0 [zfs]
[ 1088.468534] zfs_ioc_recv_impl+0x306/0x1100 [zfs]
[ 1088.468582] ? dbuf_read+0x34a/0x920 [zfs]
[ 1088.468631] ? dbuf_rele+0x36/0x40 [zfs]
[ 1088.468634] ? _cond_resched+0x19/0x40
[ 1088.468704] zfs_ioc_recv_new+0x33d/0x410 [zfs]
[ 1088.468714] ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl]
[ 1088.468721] ? spl_vmem_alloc+0x19/0x20 [spl]
[ 1088.468729] ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
[ 1088.468735] ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair]
[ 1088.468740] ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair]
[ 1088.468805] zfsdev_ioctl+0x451/0x610 [zfs]
[ 1088.468811] do_vfs_ioctl+0xa8/0x630
[ 1088.468815] ? __audit_syscall_entry+0xbc/0x110
[ 1088.468821] ? syscall_trace_enter+0x1da/0x2d0
[ 1088.468824] SyS_ioctl+0x79/0x90
[ 1088.468827] do_syscall_64+0x73/0x130
[ 1088.468831] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1088.468834] RIP: 0033:0x7fc305a905d7
[ 1088.468836] RSP: 002b:00007ffc45e39618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1088.468839] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00007fc305a905d7
[ 1088.468840] RDX: 00007ffc45e39630 RSI: 0000000000005a46 RDI: 0000000000000006
[ 1088.468842] RBP: 00007ffc45e39630 R08: 00007fc305d65e20 R09: 0000000000000000
[ 1088.468843] R10: 00005648313c7010 R11: 0000000000000246 R12: 00007ffc45e3cc60
[ 1088.468845] R13: 0000000000000006 R14: 00005648313cef10 R15: 000000000000000c
[ 1088.468853] INFO: task receive_writer:28685 blocked for more than 120 seconds.
[ 1088.468903] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 1088.468948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1088.469000] receive_writer D 0 28685 2 0x80000080
[ 1088.469002] Call Trace:
[ 1088.469006] __schedule+0x24e/0x880
[ 1088.469009] schedule+0x2c/0x80
[ 1088.469018] spl_panic+0xfa/0x110 [spl]
[ 1088.469066] ? arc_prune_task+0x1a/0x40 [zfs]
[ 1088.469116] ? dbuf_dirty+0x43d/0x850 [zfs]
[ 1088.469120] ? getrawmonotonic64+0x43/0xd0
[ 1088.469123] ? getrawmonotonic64+0x43/0xd0
[ 1088.469178] ? dmu_zfetch+0x49a/0x500 [zfs]
[ 1088.469182] ? getrawmonotonic64+0x43/0xd0
[ 1088.469234] ? dmu_zfetch+0x49a/0x500 [zfs]
[ 1088.469237] ? mutex_lock+0x12/0x40
[ 1088.469286] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
[ 1088.469352] range_tree_find_impl+0x88/0x90 [zfs]
[ 1088.469360] ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
[ 1088.469421] range_tree_clear+0x4f/0x60 [zfs]
[ 1088.469475] dnode_free_range+0x11f/0x5a0 [zfs]
[ 1088.469526] dmu_object_free+0x53/0x90 [zfs]
[ 1088.469575] dmu_free_long_object+0x9f/0xc0 [zfs]
[ 1088.469627] receive_freeobjects.isra.12+0x7a/0x100 [zfs]
[ 1088.469678] receive_writer_thread+0x6d2/0xa60 [zfs]
[ 1088.469681] ? set_curr_task_fair+0x2b/0x60
[ 1088.469689] ? spl_kmem_free+0x33/0x40 [spl]
[ 1088.469693] ? kfree+0x165/0x180
[ 1088.469740] ? receive_free.isra.13+0xc0/0xc0 [zfs]
[ 1088.469747] thread_generic_wrapper+0x74/0x90 [spl]
[ 1088.469750] kthread+0x121/0x140
[ 1088.469756] ? __thread_exit+0x20/0x20 [spl]
[ 1088.469759] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1088.469763] ret_from_fork+0x35/0x40
[ 1209.298375] INFO: task txg_quiesce:14810 blocked for more than 120 seconds.
[ 1209.298429] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 1209.298474] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.298526] txg_quiesce D 0 14810 2 0x80000000
[ 1209.298530] Call Trace:
[ 1209.298536] __schedule+0x24e/0x880
[ 1209.298540] schedule+0x2c/0x80
[ 1209.298551] cv_wait_common+0x11e/0x140 [spl]
[ 1209.298556] ? wait_woken+0x80/0x80
[ 1209.298564] __cv_wait+0x15/0x20 [spl]
[ 1209.298644] txg_quiesce_thread+0x2cb/0x3d0 [zfs]
[ 1209.298713] ? txg_delay+0x1b0/0x1b0 [zfs]
[ 1209.298721] thread_generic_wrapper+0x74/0x90 [spl]
[ 1209.298724] kthread+0x121/0x140
[ 1209.298730] ? __thread_exit+0x20/0x20 [spl]
[ 1209.298733] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1209.298738] ret_from_fork+0x35/0x40
[ 1209.298754] INFO: task zfs:28590 blocked for more than 120 seconds.
[ 1209.298799] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 1209.298843] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.298894] zfs D 0 28590 28587 0x80000080
[ 1209.298897] Call Trace:
[ 1209.298900] __schedule+0x24e/0x880
[ 1209.298903] schedule+0x2c/0x80
[ 1209.298912] cv_wait_common+0x11e/0x140 [spl]
[ 1209.298916] ? wait_woken+0x80/0x80
[ 1209.298924] __cv_wait+0x15/0x20 [spl]
[ 1209.298981] dmu_recv_stream+0xa51/0xef0 [zfs]
[ 1209.299056] zfs_ioc_recv_impl+0x306/0x1100 [zfs]
[ 1209.299104] ? dbuf_read+0x34a/0x920 [zfs]
[ 1209.299151] ? dbuf_rele+0x36/0x40 [zfs]
[ 1209.299154] ? _cond_resched+0x19/0x40
[ 1209.299223] zfs_ioc_recv_new+0x33d/0x410 [zfs]
[ 1209.299232] ? spl_kmem_alloc_impl+0xe5/0x1a0 [spl]
[ 1209.299239] ? spl_vmem_alloc+0x19/0x20 [spl]
[ 1209.299247] ? nv_alloc_sleep_spl+0x1f/0x30 [znvpair]
[ 1209.299255] ? nv_mem_zalloc.isra.0+0x2e/0x40 [znvpair]
[ 1209.299261] ? nvlist_xalloc.part.2+0x50/0xb0 [znvpair]
[ 1209.299325] zfsdev_ioctl+0x451/0x610 [zfs]
[ 1209.299331] do_vfs_ioctl+0xa8/0x630
[ 1209.299334] ? __audit_syscall_entry+0xbc/0x110
[ 1209.299339] ? syscall_trace_enter+0x1da/0x2d0
[ 1209.299341] SyS_ioctl+0x79/0x90
[ 1209.299345] do_syscall_64+0x73/0x130
[ 1209.299349] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1209.299352] RIP: 0033:0x7fc305a905d7
[ 1209.299353] RSP: 002b:00007ffc45e39618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1209.299358] RAX: ffffffffffffffda RBX: 0000000000005a46 RCX: 00007fc305a905d7
[ 1209.299361] RDX: 00007ffc45e39630 RSI: 0000000000005a46 RDI: 0000000000000006
[ 1209.299362] RBP: 00007ffc45e39630 R08: 00007fc305d65e20 R09: 0000000000000000
[ 1209.299365] R10: 00005648313c7010 R11: 0000000000000246 R12: 00007ffc45e3cc60
[ 1209.299366] R13: 0000000000000006 R14: 00005648313cef10 R15: 000000000000000c
[ 1209.299375] INFO: task receive_writer:28685 blocked for more than 120 seconds.
[ 1209.299426] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 1209.299471] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1209.299523] receive_writer D 0 28685 2 0x80000080
[ 1209.299525] Call Trace:
[ 1209.299529] __schedule+0x24e/0x880
[ 1209.299532] schedule+0x2c/0x80
[ 1209.299541] spl_panic+0xfa/0x110 [spl]
[ 1209.299591] ? arc_prune_task+0x1a/0x40 [zfs]
[ 1209.299638] ? dbuf_dirty+0x43d/0x850 [zfs]
[ 1209.299644] ? getrawmonotonic64+0x43/0xd0
[ 1209.299647] ? getrawmonotonic64+0x43/0xd0
[ 1209.299703] ? dmu_zfetch+0x49a/0x500 [zfs]
[ 1209.299707] ? getrawmonotonic64+0x43/0xd0
[ 1209.299759] ? dmu_zfetch+0x49a/0x500 [zfs]
[ 1209.299762] ? mutex_lock+0x12/0x40
[ 1209.299810] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
[ 1209.299875] range_tree_find_impl+0x88/0x90 [zfs]
[ 1209.299883] ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
[ 1209.299944] range_tree_clear+0x4f/0x60 [zfs]
[ 1209.300000] dnode_free_range+0x11f/0x5a0 [zfs]
[ 1209.300052] dmu_object_free+0x53/0x90 [zfs]
[ 1209.300102] dmu_free_long_object+0x9f/0xc0 [zfs]
[ 1209.300153] receive_freeobjects.isra.12+0x7a/0x100 [zfs]
[ 1209.300203] receive_writer_thread+0x6d2/0xa60 [zfs]
[ 1209.300208] ? set_curr_task_fair+0x2b/0x60
[ 1209.300215] ? spl_kmem_free+0x33/0x40 [spl]
[ 1209.300219] ? kfree+0x165/0x180
[ 1209.300266] ? receive_free.isra.13+0xc0/0xc0 [zfs]
[ 1209.300273] thread_generic_wrapper+0x74/0x90 [spl]
[ 1209.300278] kthread+0x121/0x140
[ 1209.300284] ? __thread_exit+0x20/0x20 [spl]
[ 1209.300287] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1209.300291] ret_from_fork+0x35/0x40
[ 1330.129221] INFO: task txg_quiesce:14810 blocked for more than 120 seconds.
[ 1330.129282] Tainted: P O 4.15.0-76-generic #86-Ubuntu
[ 1330.129328] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.129382] txg_quiesce D 0 14810 2 0x80000000
[ 1330.129387] Call Trace:
[ 1330.129399] __schedule+0x24e/0x880
[ 1330.129403] schedule+0x2c/0x80
[ 1330.129420] cv_wait_common+0x11e/0x140 [spl]
[ 1330.129429] ? wait_woken+0x80/0x80
[ 1330.129437] __cv_wait+0x15/0x20 [spl]
[ 1330.129547] txg_quiesce_thread+0x2cb/0x3d0 [zfs]
[ 1330.129617] ? txg_delay+0x1b0/0x1b0 [zfs]
[ 1330.129625] thread_generic_wrapper+0x74/0x90 [spl]
[ 1330.129631] kthread+0x121/0x140
[ 1330.129638] ? __thread_exit+0x20/0x20 [spl]
[ 1330.129641] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1330.129646] ret_from_fork+0x35/0x40

Thanks

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-76-generic 4.15.0-76.86
ProcVersionSignature: Ubuntu 4.15.0-76.86-generic 4.15.18
Uname: Linux 4.15.0-76-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Jan 28 13:31 seq
 crw-rw---- 1 root audio 116, 33 Jan 28 13:31 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.9
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Tue Jan 28 15:24:10 2020
HibernationDevice: RESUME=UUID=d34df57d-ae32-4002-be2c-a25efa8678e4
InstallationDate: Installed on 2016-04-04 (1394 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Beta amd64 (20160325)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
MachineType: Supermicro SSG-6038R-E1CR16L
PciMultimedia:

ProcEnviron:
 TERM=rxvt-unicode-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-76-generic root=UUID=d1ab8bd8-41d5-4142-92d6-d0926539807b ro
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-76-generic N/A
 linux-backports-modules-4.15.0-76-generic N/A
 linux-firmware 1.173.14
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to bionic on 2018-08-16 (530 days ago)
dmi.bios.date: 12/17/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 2.0
dmi.board.asset.tag: Default string
dmi.board.name: X10DRH-iT
dmi.board.vendor: Supermicro
dmi.board.version: 1.01
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 1
dmi.chassis.vendor: Supermicro
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr2.0:bd12/17/2015:svnSupermicro:pnSSG-6038R-E1CR16L:pvr123456789:rvnSupermicro:rnX10DRH-iT:rvr1.01:cvnSupermicro:ct1:cvrDefaultstring:
dmi.product.family: SMC X10
dmi.product.name: SSG-6038R-E1CR16L
dmi.product.version: 123456789
dmi.sys.vendor: Supermicro

Revision history for this message
Seth Arnold (seth-arnold) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → Medium
importance: Medium → High
Revision history for this message
Colin Ian King (colin-king) wrote :

Can you describe the zfs environment and the command that was being actioned when this issue was triggered?

Revision history for this message
Seth Arnold (seth-arnold) wrote :

The two machines involved are:

Receiver, bionic, probably running 4.15.0-76-generic and zfsutils-linux 0.7.5-1ubuntu16.7

Sender, focal, probably running 5.4.0-12-generic and zfsutils-linux 0.8.3-1ubuntu3

I'm using sanoid and syncoid to automate snapshot management and sending and receiving. The combination worked well for about a week before I got this failure.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

These are just the zfs bookmark and zfs send commands from the sender.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

These are all the zpool and zfs commands on the receiver, except the snapshots, renames, and destroys associated with my Ubuntu archive mirror rsync. (7000-ish lines of juggling 30 snapshots. I should do something better here.)

Revision history for this message
Colin Ian King (colin-king) wrote :

Interestingly, the following commit modifies range_tree_clear() so that it performs a zero-size check and returns before calling range_tree_find_impl(). This commit is not in the 18.10 and 19.04 Ubuntu ZFS releases.

commit a1d477c
Author: Matthew Ahrens <email address hidden>
Date: Thu Sep 22 09:30:13 2016 -0700

OpenZFS 7614, 9064 - zfs device evacuation/removal

OpenZFS 7614 - zfs device evacuation/removal
OpenZFS 9064 - remove_mirror should wait for device removal to complete

The specific change is:

@@ -560,6 +536,9 @@ range_tree_clear(range_tree_t *rt, uint64_t start, uint64_t size)
 {
        range_seg_t *rs;

+ if (size == 0)
+ return;
+
        while ((rs = range_tree_find_impl(rt, start, size)) != NULL) {
                uint64_t free_start = MAX(rs->rs_start, start);
                uint64_t free_end = MIN(rs->rs_end, start + size);

I'm not sure why this check was added, but I guess it handles the cases where zero-sized allocations are allowed, stops these from doing any unnecessary clearing, and avoids the assertion. However, the semantic change is not explained in the commit message.

Revision history for this message
Colin Ian King (colin-king) wrote :

I've uploaded a potential fix to a PPA; do you mind testing this using the zfs-dkms kernel modules as follows:

sudo add-apt-repository ppa:colin-king/zfs-sru-1861235
sudo apt-get update
sudo apt-get install zfs-dkms

and reboot.

Then check that the correct ZFS module is being used by running:

dmesg | grep ZFS

It should be the 0.7.5-1ubuntu16.9~lp1861235 version.

And see if this helps avoid this issue.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Changed in zfs-linux (Ubuntu):
status: New → Incomplete
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Bionic)
Revision history for this message
Colin Ian King (colin-king) wrote :

Please ignore the above. Apparently the issue needs a little more digging and the workaround is insufficient.

Revision history for this message
Colin Ian King (colin-king) wrote :

In this scenario, dmu_object_free has been called, and this calls dnode_free_range as dnode_free_range(dn, 0, DMU_OBJECT_END, tx).

In dnode_free_range, len is therefore DMU_OBJECT_END and off is 0, so:

    if (len == DMU_OBJECT_END) {
            len = UINT64_MAX - off;
            trunc = TRUE;
    }

sets len to UINT64_MAX and trunc to TRUE. Following the code through, we reach this stanza:

    blkid = off >> blkshift;
    nblks = len >> blkshift;
    if (trunc)
            nblks += 1;

and this is the last place nblks gets updated. For the assertion to fire, blkshift must be zero: nblks = UINT64_MAX >> 0 is UINT64_MAX, trunc is TRUE, so nblks += 1 wraps nblks back to 0, hence the assert later on.

For blkshift to be zero, the block size dn->dn_datablksz must be zero, so this looks like some form of corruption.
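
To make the arithmetic concrete, here is a minimal standalone sketch (editorial, not the ZFS source; the names off, len, trunc and blkshift, and the DMU_OBJECT_END value, mirror the analysis above and the len value seen in the debug output later in this bug):

    #include <stdint.h>
    #include <stdio.h>

    #define DMU_OBJECT_END UINT64_MAX    /* matches the len of 18446744073709551615 in the logs */

    int main(void)
    {
        uint64_t off = 0;
        uint64_t len = DMU_OBJECT_END;
        int trunc = 0;
        int blkshift = 0;                /* only possible if dn->dn_datablksz is 0, per the analysis above */

        if (len == DMU_OBJECT_END) {
            len = UINT64_MAX - off;      /* still UINT64_MAX when off == 0 */
            trunc = 1;
        }

        uint64_t blkid = off >> blkshift;
        uint64_t nblks = len >> blkshift;    /* UINT64_MAX >> 0 == UINT64_MAX */
        if (trunc)
            nblks += 1;                  /* unsigned wrap-around: nblks becomes 0 */

        printf("blkid = %llu, nblks = %llu\n",
               (unsigned long long)blkid, (unsigned long long)nblks);
        return 0;                        /* prints nblks = 0, the zero size that later trips the VERIFY */
    }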

Revision history for this message
Colin Ian King (colin-king) wrote :

Incidentally, dnode_free_range hasn't changed much in the focal 0.8.x ZFS, so I'm not sure upgrading to that will make much of a difference.

Revision history for this message
Colin Ian King (colin-king) wrote :

BTW, is this still an open issue?

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Hello Colin, yes, this is still an open issue:

Linux wopr 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Apr 22 19:10:03 wopr zed[12576]: eid=8352 class=history_event pool_guid=0xB3B099B638F02EEF
Apr 22 19:10:03 wopr kernel: VERIFY(size != 0) failed
Apr 22 19:10:03 wopr kernel: PANIC at range_tree.c:304:range_tree_find_impl()
Apr 22 19:10:03 wopr kernel: Showing stack for process 12577
Apr 22 19:10:03 wopr kernel: CPU: 8 PID: 12577 Comm: receive_writer Tainted: P O 4.15.0-91-generic #92-Ubuntu
Apr 22 19:10:03 wopr kernel: Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015
Apr 22 19:10:03 wopr kernel: Call Trace:
Apr 22 19:10:03 wopr kernel: dump_stack+0x6d/0x8e
Apr 22 19:10:03 wopr kernel: spl_dumpstack+0x42/0x50 [spl]
Apr 22 19:10:03 wopr kernel: spl_panic+0xc8/0x110 [spl]
Apr 22 19:10:03 wopr kernel: ? __switch_to_asm+0x41/0x70
Apr 22 19:10:03 wopr kernel: ? abd_iter_map+0xa/0x90 [zfs]
Apr 22 19:10:03 wopr kernel: ? dbuf_dirty+0x43d/0x850 [zfs]
Apr 22 19:10:03 wopr kernel: ? getrawmonotonic64+0x43/0xd0
Apr 22 19:10:03 wopr kernel: ? getrawmonotonic64+0x43/0xd0
Apr 22 19:10:03 wopr kernel: ? dmu_zfetch+0x49a/0x500 [zfs]
Apr 22 19:10:03 wopr kernel: ? getrawmonotonic64+0x43/0xd0
Apr 22 19:10:03 wopr kernel: ? dmu_zfetch+0x49a/0x500 [zfs]
Apr 22 19:10:03 wopr kernel: ? mutex_lock+0x12/0x40
Apr 22 19:10:03 wopr kernel: ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
Apr 22 19:10:03 wopr kernel: range_tree_find_impl+0x88/0x90 [zfs]
Apr 22 19:10:03 wopr kernel: ? spl_kmem_zalloc+0xdc/0x1a0 [spl]
Apr 22 19:10:03 wopr kernel: range_tree_clear+0x4f/0x60 [zfs]
Apr 22 19:10:03 wopr kernel: dnode_free_range+0x11f/0x5a0 [zfs]
Apr 22 19:10:03 wopr kernel: dmu_object_free+0x53/0x90 [zfs]
Apr 22 19:10:03 wopr kernel: dmu_free_long_object+0x9f/0xc0 [zfs]
Apr 22 19:10:03 wopr kernel: receive_freeobjects.isra.12+0x7a/0x100 [zfs]
Apr 22 19:10:03 wopr kernel: receive_writer_thread+0x6d2/0xa60 [zfs]
Apr 22 19:10:03 wopr kernel: ? set_curr_task_fair+0x2b/0x60
Apr 22 19:10:03 wopr kernel: ? spl_kmem_free+0x33/0x40 [spl]
Apr 22 19:10:03 wopr kernel: ? kfree+0x165/0x180
Apr 22 19:10:03 wopr kernel: ? receive_free.isra.13+0xc0/0xc0 [zfs]
Apr 22 19:10:03 wopr kernel: thread_generic_wrapper+0x74/0x90 [spl]
Apr 22 19:10:03 wopr kernel: kthread+0x121/0x140
Apr 22 19:10:03 wopr kernel: ? __thread_exit+0x20/0x20 [spl]
Apr 22 19:10:03 wopr kernel: ? kthread_create_worker_on_cpu+0x70/0x70
Apr 22 19:10:03 wopr kernel: ret_from_fork+0x35/0x40
Apr 22 19:12:56 wopr kernel: INFO: task txg_quiesce:2265 blocked for more than 120 seconds.
Apr 22 19:12:56 wopr kernel: Tainted: P O 4.15.0-91-generic #92-Ubuntu
Apr 22 19:12:56 wopr kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 22 19:12:56 wopr kernel: txg_quiesce D 0 2265 2 0x80000000
Apr 22 19:12:56 wopr kernel: Call Trace:
Apr 22 19:12:56 wopr kernel: __schedule+0x24e/0x880
Apr 22 19:12:56 wopr kernel: schedule+0x2c/0x80
Apr 22 19:12:56 wopr kernel: cv_wait_common+0x11e/0x140 [spl]
Apr 22 19:12:56 wopr kernel: ? wait_woken+0x80/0x80
Apr 22 19:12:56 w...

Changed in linux:
status: Unknown → New
Revision history for this message
Colin Ian King (colin-king) wrote :

I've uploaded a debug zfs-dkms package to https://launchpad.net/~colin-king/+archive/ubuntu/zfs-sru-1861235 for testing. This will dump out the internal state of the driver to get a better idea of what is happening during this crash.

Please can you test this by doing the following:

sudo add-apt-repository ppa:colin-king/zfs-sru-1861235
sudo apt-get update
sudo apt-get install zfs-dkms

Let this debug dkms ZFS module build, then reboot. When the failure occurs again, please supply as much of the dmesg output as possible.

Thank you

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Here's the part that looks important; the whole dmesg is in the attachment.

Thanks

[ 761.730488] dnone_free_ramge: nblks = 0, trunc = 1, len = 18446744073709551615, blkshift = 0
[ 761.730542] dnode_free_range: nblks == 0, len == 18446744073709551615
               , off=0
[ 761.730543] range_tree_clear: size == 0
[ 761.730605] VERIFY(size != 0) failed
[ 761.730625] PANIC at range_tree.c:307:range_tree_find_impl()
[ 761.730653] Showing stack for process 23913
[ 761.730656] CPU: 31 PID: 23913 Comm: receive_writer Tainted: P OE 4.15.0-99-generic #100-Ubuntu
[ 761.730657] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015
[ 761.730658] Call Trace:
[ 761.730668] dump_stack+0x6d/0x8e
[ 761.730679] spl_dumpstack+0x42/0x50 [spl]
[ 761.730684] spl_panic+0xc8/0x110 [spl]
[ 761.730690] ? fbcon_cursor+0x13e/0x1c0
[ 761.730695] ? atomic_notifier_call_chain+0x1a/0x20
[ 761.730700] ? vt_console_print+0x240/0x410
[ 761.730705] ? up+0x32/0x50
[ 761.730708] ? console_unlock+0x2a1/0x560
[ 761.730767] range_tree_find_impl+0x88/0x90 [zfs]
[ 761.730769] ? printk+0x52/0x6e
[ 761.730810] range_tree_clear+0x54/0x70 [zfs]
[ 761.730843] dnode_free_range+0x15b/0x720 [zfs]
[ 761.730874] dmu_object_free+0x75/0xc0 [zfs]
[ 761.730904] dmu_free_long_object+0x9f/0xc0 [zfs]
[ 761.730934] receive_freeobjects.isra.12+0x7a/0x100 [zfs]
[ 761.730963] receive_writer_thread+0x6d2/0xa60 [zfs]
[ 761.730965] ? set_curr_task_fair+0x2b/0x60
[ 761.730970] ? spl_kmem_free+0x33/0x40 [spl]
[ 761.730973] ? kfree+0x162/0x180
[ 761.731001] ? receive_free.isra.13+0xc0/0xc0 [zfs]
[ 761.731006] thread_generic_wrapper+0x74/0x90 [spl]
[ 761.731008] kthread+0x121/0x140
[ 761.731012] ? __thread_exit+0x20/0x20 [spl]
[ 761.731014] ? kthread_create_worker_on_cpu+0x70/0x70
[ 761.731018] ret_from_fork+0x35/0x40
[ 967.430681] INFO: task txg_quiesce:4049 blocked for more than 120 seconds.
[ 967.430739] Tainted: P OE 4.15.0-99-generic #100-Ubuntu
[ 967.430785] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 967.430836] txg_quiesce D 0 4049 2 0x80000000
[ 967.430840] Call Trace:
[ 967.430853] __schedule+0x24e/0x880
[ 967.430858] schedule+0x2c/0x80
[ 967.430875] cv_wait_common+0x11e/0x140 [spl]
[ 967.430885] ? wait_woken+0x80/0x80
[ 967.430893] __cv_wait+0x15/0x20 [spl]
[ 967.431010] txg_quiesce_thread+0x2cb/0x3d0 [zfs]
[ 967.431078] ? txg_delay+0x1b0/0x1b0 [zfs]
[ 967.431088] thread_generic_wrapper+0x74/0x90 [spl]
[ 967.431093] kthread+0x121/0x140
[ 967.431099] ? __thread_exit+0x20/0x20 [spl]
[ 967.431102] ? kthread_create_worker_on_cpu+0x70/0x70
[ 967.431107] ret_from_fork+0x35/0x40
[ 967.431140] INFO: task zfs:23834 blocked for more than 120 seconds.
[ 967.431184] Tainted: P OE 4.15.0-99-generic #100-Ubuntu
[ 967.431229] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 967.431279] zfs D 0 23834 23831 0x80000080
[ 967.431282] Call Trace:
[ 967.431286] __schedule+0x24e/0x880
[ 967.431290] schedule+0x2c/0x80
[ 967.431298] cv_wait_common+0x11e/0x140 [spl]
[ ...

Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks for the data.

I've added a couple more lines of debug now to figure out some earlier missing information once we pop the stack. I've updated the package in the PPA; do you mind updating the zfs-dkms to 0.7.5-1ubuntu16.10~lp1861235.2 and re-testing?

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Thanks Colin!

[ 271.628232] dnone_free_ramge: nblks = 0, trunc = 1, len = 18446744073709551615, blkshift = 0
[ 271.628297] dnode_free_range: nblks == 0, len == 18446744073709551615
               , off=0
[ 271.628298] range_tree_clear: size == 0
[ 271.628375] range_tree_find_impl: size == 0
[ 271.628393] VERIFY(size != 0) failed
[ 271.628409] PANIC at range_tree.c:168:range_tree_add()
[ 271.628429] Showing stack for process 7960
[ 271.628432] CPU: 26 PID: 7960 Comm: receive_writer Tainted: P OE 4.15.0-99-generic #100-Ubuntu
[ 271.628433] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015
[ 271.628433] Call Trace:
[ 271.628442] dump_stack+0x6d/0x8e
[ 271.628458] spl_dumpstack+0x42/0x50 [spl]
[ 271.628462] spl_panic+0xc8/0x110 [spl]
[ 271.628468] ? vt_console_print+0x240/0x410
[ 271.628473] ? console_unlock+0x2a1/0x560
[ 271.628475] ? vprintk_emit+0x104/0x2c0
[ 271.628477] ? vprintk_default+0x29/0x50
[ 271.628479] ? printk+0x52/0x6e
[ 271.628566] range_tree_add+0x29b/0x300 [zfs]
[ 271.628600] ? range_tree_clear+0xa4/0xb0 [zfs]
[ 271.628602] ? printk+0x52/0x6e
[ 271.628635] dnode_free_range+0x178/0x720 [zfs]
[ 271.628673] dmu_object_free+0x6d/0xc0 [zfs]
[ 271.628698] dmu_free_long_object+0x9f/0xc0 [zfs]
[ 271.628723] receive_freeobjects.isra.12+0x7a/0x100 [zfs]
[ 271.628747] receive_writer_thread+0x6d2/0xa60 [zfs]
[ 271.628750] ? set_curr_task_fair+0x2b/0x60
[ 271.628753] ? spl_kmem_free+0x33/0x40 [spl]
[ 271.628757] ? kfree+0x162/0x180
[ 271.628781] ? receive_free.isra.13+0xc0/0xc0 [zfs]
[ 271.628785] thread_generic_wrapper+0x74/0x90 [spl]
[ 271.628788] kthread+0x121/0x140
[ 271.628791] ? __thread_exit+0x20/0x20 [spl]
[ 271.628792] ? kthread_create_worker_on_cpu+0x70/0x70
[ 271.628796] ret_from_fork+0x35/0x40

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi, I was expecting some extra information and a panic message in a different place; are you sure this is running the latest zfs-dkms? You can check by using dmesg | grep ZFS and confirming it is the .2 version of the debug zfs-dkms package.

Revision history for this message
Colin Ian King (colin-king) wrote :

Oh, I see you are using the latest version: v0.7.5-1ubuntu16.10~lp1861235.2

I may need to figure out a .3 version to try next.

Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks for the data for the .2 tests. I've updated the packages with a .3 test build.

I've added a couple more lines of debug to figure out some earlier missing information once we pop the stack. I've updated the package in the PPA; do you mind updating the zfs-dkms to 0.7.5-1ubuntu16.10~lp1861235.3 and re-testing?

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Hello Colin, trying the zfs recv operation on the .3 dkms eventually kills my system dead. There was nothing on the console. The cursor on the console stopped blinking; I couldn't switch VTs. My ssh sessions were hung. ping reported destination host unreachable.

It ran for about two and a half minutes and transferred around 10 datasets or snapshots:
$ journalctl -b -1 | grep -c "COMMAND=/sbin/zfs receive"
10

Sadly, journalctl doesn't have the dmesg from the previous boot:

$ journalctl -b -1 -k
[...]
May 15 17:25:10 wopr kernel: ZFS: Loaded module v0.7.5-1ubuntu16.10~lp1861235.3, ZFS pool version 5000, ZFS filesystem version 5
May 15 17:25:15 wopr kernel: zd32: p1
May 15 17:25:20 wopr kernel: pps pps0: new PPS source ptp0
May 15 17:25:20 wopr kernel: ixgbe 0000:05:00.0: registered PHC device on enp5s0f0
May 15 17:25:20 wopr kernel: IPv6: ADDRCONF(NETDEV_UP): enp5s0f0: link is not ready
May 15 17:25:20 wopr kernel: new mount options do not match the existing superblock, will be ignored
May 15 17:25:21 wopr kernel: Process accounting resumed
May 15 17:25:24 wopr kernel: ixgbe 0000:05:00.0 enp5s0f0: NIC Link is Up 1 Gbps, Flow Control: None
May 15 17:25:24 wopr kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0f0: link becomes ready
May 15 17:25:29 wopr kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 15 17:25:29 wopr kernel: NFSD: starting 90-second grace period (net f00000a9)
May 15 17:25:32 wopr kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
May 15 17:25:32 wopr kernel: Ebtables v2.0 registered
May 15 17:25:33 wopr kernel: bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
May 15 17:25:33 wopr kernel: virbr0: port 1(virbr0-nic) entered blocking state
May 15 17:25:33 wopr kernel: virbr0: port 1(virbr0-nic) entered disabled state
May 15 17:25:33 wopr kernel: device virbr0-nic entered promiscuous mode
May 15 17:25:33 wopr kernel: nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
May 15 17:25:33 wopr kernel: virbr0: port 1(virbr0-nic) entered blocking state
May 15 17:25:33 wopr kernel: virbr0: port 1(virbr0-nic) entered listening state
May 15 17:25:33 wopr kernel: virbr0: port 1(virbr0-nic) entered disabled state

Thanks

Revision history for this message
Seth Arnold (seth-arnold) wrote : Re: [Bug 1861235] Re: zfs recv PANIC at range_tree.c:304:range_tree_find_impl()

On Sat, May 16, 2020 at 01:56:08AM -0000, Seth Arnold wrote:
> Sadly, journalctl doesn't have the dmesg from the previous boot:

I meant to say, journalctl's copy of dmesg from the previous boot doesn't
have the new debug output. Sorry.

Thanks

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Seth, I've uploaded a .4 - it may add a lot more debug output, but it will allow us to see the object type that is failing before the crash.

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Colin, you're not kidding, there's a lot more debug information in this one. Here's the part that looks important:

[ 289.167264] dmu_object_free: object = 0x8319, dn->dn_type = 196
[ 289.167659] dmu_object_free: object = 0x8318, dn->dn_type = 13
[ 289.167922] dmu_object_free: object = 0x8317, dn->dn_type = 15
[ 289.168197] dmu_object_free: object = 0x8316, dn->dn_type = 12
[ 289.168500] dmu_object_free: object = 0x861b, dn->dn_type = 195
[ 289.351934] dmu_object_free: object = 0x5ef6, dn->dn_type = 50
[ 289.353287] dmu_object_free: object = 0x8553, dn->dn_type = 37
[ 289.353704] dmu_object_free: object = 0x65bb, dn->dn_type = 196
[ 289.917945] dmu_object_free: object = 0x210, dn->dn_type = 19
[ 289.918616] dmu_object_free: object = 0x211, dn->dn_type = 32
[ 289.919025] dnone_free_ramge: nblks = 0, trunc = 1, len = 0xffffffffffffffff, blkshift = 0
[ 289.919442] dnode_free_range: nblks == 0, len == 18446744073709551615
               , off=0
[ 289.919443] range_tree_clear: size == 0
[ 289.920690] range_tree_find_impl: size == 0
[ 289.921123] VERIFY(size != 0) failed
[ 289.921566] PANIC at range_tree.c:319:range_tree_find_impl()
[ 289.922043] Showing stack for process 28938
[ 289.922047] CPU: 15 PID: 28938 Comm: receive_writer Tainted: P OE 4.15.0-101-generic #102-Ubuntu
[ 289.922049] Hardware name: Supermicro SSG-6038R-E1CR16L/X10DRH-iT, BIOS 2.0 12/17/2015
[ 289.922050] Call Trace:
[ 289.922064] dump_stack+0x6d/0x8e
[ 289.922082] spl_dumpstack+0x42/0x50 [spl]
[ 289.922089] spl_panic+0xc8/0x110 [spl]
[ 289.922098] ? console_unlock+0x2a1/0x560
[ 289.922102] ? vprintk_emit+0x104/0x2c0
[ 289.922106] ? vprintk_default+0x29/0x50
[ 289.922111] ? vprintk_func+0x47/0xc0
[ 289.922114] ? printk+0x52/0x6e
[ 289.922201] range_tree_find_impl+0x94/0xa0 [zfs]
[ 289.922205] ? printk+0x52/0x6e
[ 289.922268] range_tree_clear+0x54/0x70 [zfs]
[ 289.922322] dnode_free_range+0x15b/0x720 [zfs]
[ 289.922372] dmu_object_free+0x80/0xd0 [zfs]
[ 289.922420] dmu_free_long_object+0x9f/0xc0 [zfs]
[ 289.922469] receive_freeobjects.isra.12+0x7a/0x100 [zfs]
[ 289.922517] receive_writer_thread+0x6d2/0xa60 [zfs]
[ 289.922521] ? set_curr_task_fair+0x2b/0x60
[ 289.922528] ? spl_kmem_free+0x33/0x40 [spl]
[ 289.922535] ? kfree+0x168/0x180
[ 289.922581] ? receive_free.isra.13+0xc0/0xc0 [zfs]
[ 289.922589] thread_generic_wrapper+0x74/0x90 [spl]
[ 289.922594] kthread+0x121/0x140
[ 289.922600] ? __thread_exit+0x20/0x20 [spl]
[ 289.922603] ? kthread_create_worker_on_cpu+0x70/0x70
[ 289.922610] ret_from_fork+0x35/0x40
[ 484.358156] INFO: task txg_quiesce:3136 blocked for more than 120 seconds.
[ 484.358778] Tainted: P OE 4.15.0-101-generic #102-Ubuntu
[ 484.359320] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 484.359898] txg_quiesce D 0 3136 2 0x80000000
[ 484.359902] Call Trace:
[ 484.359914] __schedule+0x24e/0x880
[ 484.359918] schedule+0x2c/0x80
[ 484.359935] cv_wait_common+0x11e/0x140 [spl]
[ 484.359942] ? wait_woken+0x80/0x80
[ 484.359951] __cv_wait+0x15/0x20 [spl]
[ 484.360056] txg_quiesce_thread+0x2cb/0x3d0 [zfs]
[ 484.3...

Revision history for this message
Colin Ian King (colin-king) wrote :

OK, so object 0x211 looks like the culprit; I just can't see the pool name being dumped - do you know which pool is triggering this issue?

If you can identify the pool, then we can next dump the potentially corrupt object using:

sudo zdb -dddd <pool>/<dataset> 529

Revision history for this message
Seth Arnold (seth-arnold) wrote :

I picked the last dataset given in the command output from an earlier, but not the most recent, comment:

$ sudo zdb -dddd srv/backups/millbarge/rpool/var/log 529
Dataset srv/backups/millbarge/rpool/var/log [ZPL], ID 39694, cr_txg 23197757, 554M, 274 objects, rootbp DVA[0]=<1:1d000d42000:1000> DVA[1]=<2:24416f4b000:1000> [L0 DMU objset] sha256 uncompressed LE contiguous unique double size=800L/800P birth=23437543L/23437543P fill=274 cksum=cf5fcb2f986c768c:7cc8638aeb1a5162:9de9f236f072e16:40c6a08d90b54951

    Object lvl iblk dblk dsize dnsize lsize %full type
       529 0 0 0 0 512 0 -nan DSL permissions (K=UNKNOWN) (Z=UNKNOWN)
                                                 0 bonus UNKNOWN
 dnode flags: USED_BYTES USERUSED_ACCOUNTED USEROBJUSED_ACCOUNTED SPILL_BLKPTR
 dnode maxblkid: 0
 UNKNOWN OBJECT TYPE
c < (1ULL << 24) >> 9 (0x7fffffffffffff < 0x8000)
ASSERT at ../../module/zfs/zio.c:266:zio_buf_alloc()Aborted

Thanks
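
As an aside on the zdb abort quoted above: the failed comparison looks consistent with the same zero-size pattern, underflowing before the shift. A minimal sketch of that arithmetic (editorial; it assumes zio_buf_alloc indexes its cache as (size - 1) >> 9, which is what the printed expression suggests):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t size = 0;                     /* the corrupt dnode above reports dblk = 0 */
        uint64_t c = (size - 1) >> 9;          /* underflows to 0xffffffffffffffff, then shifts */
        uint64_t limit = (1ULL << 24) >> 9;    /* the bound printed in the assert */
        printf("c = 0x%llx, limit = 0x%llx\n",
               (unsigned long long)c, (unsigned long long)limit);
        return 0;                              /* prints c = 0x7fffffffffffff, limit = 0x8000 */
    }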

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Is it interesting that I'm seeing both 19 and 20 in my dmesg?

[1229822.406130] dmu_object_free: object = 0x266d2904, dn->dn_type = 20
[1229823.980888] dmu_object_free: object = 0x266d0d5a, dn->dn_type = 20
[1229823.994690] dmu_object_free: object = 0x266d0d5b, dn->dn_type = 20
[1229823.998123] dmu_object_free: object = 0x266d00c1, dn->dn_type = 19
[1229823.998191] dmu_object_free: object = 0x266d00c3, dn->dn_type = 19
[1229823.998324] dmu_object_free: object = 0x266d00c5, dn->dn_type = 19
[1229823.998377] dmu_object_free: object = 0x266d00c4, dn->dn_type = 20

Thanks

Revision history for this message
Andrea Righi (arighi) wrote :

Hi @seth-arnold,

adding some details to this bug:

 dn->dn_type = 19 = DMU_OT_PLAIN_FILE_CONTENTS
 dn->dn_type = 20 = DMU_OT_DIRECTORY_CONTENTS

So they look correct, they seem to be just plain files and directories.

However, the specific object that is causing the panic seems to have dn_type=32, which is DMU_OT_DSL_PERMS (that's a bit odd; I don't see any other object types like this in your dmesg). Moreover, the output of "zdb -dddd" shows a lot of UNKNOWN stuff (the checksum and compression type, for example, are UNKNOWN).

So, it seems that we may have some corrupted data here.

Do you see any error if you try to run a simple `zpool status -v`? Do you know exactly which pool / volume / filesystem is causing this problem?
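
For reference, a small sketch mapping the object-type numbers discussed in this thread to their names (editorial; 19, 20 and 32 are the values quoted above, and the names follow the dmu_object_type_t enumeration in dmu.h):

    #include <stdio.h>

    /* Maps the dn_type values seen in the debug output to DMU object type names. */
    static const char *dn_type_name(int t)
    {
        switch (t) {
        case 19: return "DMU_OT_PLAIN_FILE_CONTENTS";   /* plain file */
        case 20: return "DMU_OT_DIRECTORY_CONTENTS";    /* directory */
        case 32: return "DMU_OT_DSL_PERMS";             /* DSL permissions */
        default: return "other (see dmu_object_type_t in dmu.h)";
        }
    }

    int main(void)
    {
        int seen[] = { 19, 20, 32 };
        for (size_t i = 0; i < sizeof(seen) / sizeof(seen[0]); i++)
            printf("dn_type %d = %s\n", seen[i], dn_type_name(seen[i]));
        return 0;
    }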

Revision history for this message
Andrea Righi (arighi) wrote :

I just noticed the bug on GitHub that also mentions potential data corruption. It looks like we need to recompile ZFS with --enable-debug. I'll try to produce a debug package with that.

Revision history for this message
Andrea Righi (arighi) wrote :

I've uploaded the latest version of zfs with debugging enabled (--enable-debug) to this PPA:
https://launchpad.net/~arighi/+archive/ubuntu/zfs-linux

sudo add-apt-repository ppa:arighi/zfs-linux
sudo apt-get update
sudo apt-get install zfs-dkms

It'd be interesting to repeat the test using this version. Make sure the loaded zfs module matches the right version:

$ modinfo zfs | grep ^version
version: 0.7.5-1ubuntu16.10+lp1861235

Moreover, if the data got corrupted, it'd be interesting to know why... can you provide some smartctl information? Did you notice any I/O errors in your logs? Thanks!

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Hello Andrea, thanks for looking into my problem.

# zpool status -v
  pool: fst
 state: ONLINE
  scan: scrub repaired 0B in 2h37m with 0 errors on Sun Nov 8 03:01:53 2020
config:

 NAME STATE READ WRITE CKSUM
 fst ONLINE 0 0 0
   raidz1-0 ONLINE 0 0 0
     wwn-0x5002538e4095da39 ONLINE 0 0 0
     wwn-0x5002538e4095bdd6 ONLINE 0 0 0
     wwn-0x5002538e4093c6fd ONLINE 0 0 0
     wwn-0x5002538e4095da30 ONLINE 0 0 0

errors: No known data errors

  pool: srv
 state: ONLINE
  scan: scrub repaired 0B in 3h44m with 0 errors on Sun Nov 8 04:08:16 2020
config:

 NAME STATE READ WRITE CKSUM
 srv ONLINE 0 0 0
   mirror-0 ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P8KTWJYY ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G620TW ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G66E2U ONLINE 0 0 0
   mirror-1 ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G69TKU ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G69TXU ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G69U2U ONLINE 0 0 0
   mirror-2 ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G6EBUU ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G6ESAU ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G6G70U ONLINE 0 0 0
 logs
   nvme0n1p1 ONLINE 0 0 0
 cache
   nvme0n1p2 ONLINE 0 0 0

errors: No known data errors

I'll swap over to your dkms build and report back.

Thanks

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Picking the previous bad object (which I'm pretty sure should still be intact):

$ modinfo zfs | grep ^version
version: 0.7.5-1ubuntu16.10+lp1861235
$ uname -a
Linux wopr 4.15.0-126-generic #129-Ubuntu SMP Mon Nov 23 18:53:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ sudo zdb -dddd srv/backups/millbarge/rpool/var/log 529
Dataset srv/backups/millbarge/rpool/var/log [ZPL], ID 39694, cr_txg 23197757, 554M, 274 objects, rootbp DVA[0]=<1:1d000d42000:1000> DVA[1]=<2:24416f4b000:1000> [L0 DMU objset] sha256 uncompressed LE contiguous unique double size=800L/800P birth=23437543L/23437543P fill=274 cksum=cf5fcb2f986c768c:7cc8638aeb1a5162:9de9f236f072e16:40c6a08d90b54951

    Object lvl iblk dblk dsize dnsize lsize %full type
dn->dn_nblkptr >= 1 (0x0 >= 0x1)
ASSERT at ../../module/zfs/dnode.c:278:dnode_verify()Aborted
$ dmesg | tail
[ 43.017302] Ebtables v2.0 registered
[ 44.977437] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[ 47.047114] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 47.047966] virbr0: port 1(virbr0-nic) entered blocking state
[ 47.047968] virbr0: port 1(virbr0-nic) entered disabled state
[ 47.048063] device virbr0-nic entered promiscuous mode
[ 47.433533] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[ 47.971009] virbr0: port 1(virbr0-nic) entered blocking state
[ 47.971013] virbr0: port 1(virbr0-nic) entered listening state
[ 48.321853] virbr0: port 1(virbr0-nic) entered disabled state
$

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Here's smartctl output on the disks in the pool. It's a bit hard to summarize nine drives of smartctl output but nothing stood out as interesting to me.

My journals only go back a few months, so they wouldn't show any I/O errors from the time this problem was introduced, but this currently gives no results:

journalctl | grep -i "i/o error"

so there's nothing recent.

I started a new scrub; it'll finish in a few hours. If I forget to post results, please do remind me. :)

Thanks

Revision history for this message
Seth Arnold (seth-arnold) wrote :

The scrub has finished: no new errors in zpool status -v, and dmesg is filled with debugging messages. I hope something in there is useful. :)

Thanks

$ uname -a
Linux wopr 4.15.0-126-generic #129-Ubuntu SMP Mon Nov 23 18:53:38 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ modinfo zfs | grep -i version
version: 0.7.5-1ubuntu16.10+lp1861235
srcversion: 62D1F1B1A1EE81EB3AA8ABA
$ zpool status -v
  pool: fst
 state: ONLINE
  scan: scrub repaired 0B in 2h37m with 0 errors on Sun Nov 8 03:01:53 2020
config:

 NAME STATE READ WRITE CKSUM
 fst ONLINE 0 0 0
   raidz1-0 ONLINE 0 0 0
     wwn-0x5002538e4095da39 ONLINE 0 0 0
     wwn-0x5002538e4095bdd6 ONLINE 0 0 0
     wwn-0x5002538e4093c6fd ONLINE 0 0 0
     wwn-0x5002538e4095da30 ONLINE 0 0 0

errors: No known data errors

  pool: srv
 state: ONLINE
  scan: scrub repaired 0B in 4h12m with 0 errors on Wed Dec 2 18:53:44 2020
config:

 NAME STATE READ WRITE CKSUM
 srv ONLINE 0 0 0
   mirror-0 ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P8KTWJYY ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G620TW ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G66E2U ONLINE 0 0 0
   mirror-1 ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G69TKU ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G69TXU ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G69U2U ONLINE 0 0 0
   mirror-2 ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G6EBUU ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G6ESAU ONLINE 0 0 0
     ata-HGST_HUS724030ALA640_PN2234P9G6G70U ONLINE 0 0 0
 logs
   nvme0n1p1 ONLINE 0 0 0
 cache
   nvme0n1p2 ONLINE 0 0 0

errors: No known data errors

Revision history for this message
Seth Arnold (seth-arnold) wrote :

My guess is the next step is trying the zfs send | ssh zfs recv commands that were causing the VERIFY(size != 0) failed message with traces. I've always been uneasy with the 'main' error coming only with a recv operation; relying upon two systems to recreate a bug is iffy at best. And since it wedges one of them, I don't like doing it just for fun, but I'd also like to get to the bottom of this.

Is there anything else to try first?

Thanks

Revision history for this message
Andrea Righi (arighi) wrote :

This time I don't see any errors or panics in `journalctl -b`, and apparently everything is fine according to smartctl, so it looks like we are not getting any obvious I/O errors at the moment.

IIUC you have already done a `zpool scrub` and that also didn't report any errors, so apparently the ZFS data integrity checks passed locally.

If that's the case, yes, I think the next step would be to try the `zfs send | zfs recv` commands and see if we get more information with ZFS debugging enabled. Alternatively we could try to stress the local zpool more, maybe running some I/O stress test (fio, for example) on some volumes / filesystems created from the "potentially faulty" zpool. Apart from that, I don't see any other useful tests that we can do...

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu Bionic):
status: New → Confirmed
Changed in zfs-linux (Ubuntu):
assignee: Colin Ian King (colin-king) → Dimitri John Ledkov (xnox)
Changed in linux:
status: New → Fix Released
Changed in zfs-linux (Ubuntu):
assignee: Dimitri John Ledkov (xnox) → nobody
Changed in zfs-linux (Ubuntu):
importance: High → Undecided