zfs hangs on mount/unmount

Bug #1773392 reported by Vasiliy on 2018-05-25
This bug affects 5 people
Affects: Linux. Status: Fix Released. Importance: Unknown.
Affects: linux (Ubuntu). Importance: Undecided. Assigned to: Unassigned.

Bug Description

I am running lxd 3.0 on ubuntu 18.04 with kernel 4.15.0-22-generic and 4.15.0-20-generic (same behaviour) with zfs backend (0.7.5-1ubuntu16; also tried 0.7.9).

Sometimes lxd hangs when I try to stop, restart, or "stop && move" some containers. Further investigation showed that the problem is in zfs mount or unmount: it just hangs and lxd just waits for it. Commands like "zfs list" hang too.

It seems that this is not an lxd or zfs issue but a kernel bug?
https://github.com/lxc/lxd/issues/4104#issuecomment-392072939

I have one test ct that always hangs on restart, so here is the info:

dmesg:
[ 1330.390938] INFO: task txg_sync:9944 blocked for more than 120 seconds.
[ 1330.390994] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1330.391044] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.391101] txg_sync D 0 9944 2 0x80000000
[ 1330.391105] Call Trace:
[ 1330.391117] __schedule+0x297/0x8b0
[ 1330.391122] schedule+0x2c/0x80
[ 1330.391136] cv_wait_common+0x11e/0x140 [spl]
[ 1330.391141] ? wait_woken+0x80/0x80
[ 1330.391152] __cv_wait+0x15/0x20 [spl]
[ 1330.391234] rrw_enter_write+0x3c/0xa0 [zfs]
[ 1330.391306] rrw_enter+0x13/0x20 [zfs]
[ 1330.391380] spa_sync+0x7c9/0xd80 [zfs]
[ 1330.391457] txg_sync_thread+0x2cd/0x4a0 [zfs]
[ 1330.391534] ? txg_quiesce_thread+0x3d0/0x3d0 [zfs]
[ 1330.391543] thread_generic_wrapper+0x74/0x90 [spl]
[ 1330.391549] kthread+0x121/0x140
[ 1330.391558] ? __thread_exit+0x20/0x20 [spl]
[ 1330.391562] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1330.391566] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1330.391569] ret_from_fork+0x35/0x40
[ 1330.391582] INFO: task lxd:12419 blocked for more than 120 seconds.
[ 1330.391630] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1330.391679] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.391735] lxd D 0 12419 1 0x00000000
[ 1330.391739] Call Trace:
[ 1330.391745] __schedule+0x297/0x8b0
[ 1330.391749] schedule+0x2c/0x80
[ 1330.391752] rwsem_down_write_failed+0x162/0x360
[ 1330.391808] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
[ 1330.391814] call_rwsem_down_write_failed+0x17/0x30
[ 1330.391817] ? call_rwsem_down_write_failed+0x17/0x30
[ 1330.391821] down_write+0x2d/0x40
[ 1330.391825] grab_super+0x30/0x90
[ 1330.391901] ? zpl_create+0x160/0x160 [zfs]
[ 1330.391905] sget_userns+0x91/0x490
[ 1330.391908] ? get_anon_bdev+0x100/0x100
[ 1330.391983] ? zpl_create+0x160/0x160 [zfs]
[ 1330.391987] sget+0x7d/0xa0
[ 1330.391990] ? get_anon_bdev+0x100/0x100
[ 1330.392066] zpl_mount+0xa8/0x160 [zfs]
[ 1330.392071] mount_fs+0x37/0x150
[ 1330.392077] vfs_kern_mount.part.23+0x5d/0x110
[ 1330.392080] do_mount+0x5ed/0xce0
[ 1330.392083] ? copy_mount_options+0x2c/0x220
[ 1330.392086] SyS_mount+0x98/0xe0
[ 1330.392092] do_syscall_64+0x73/0x130
[ 1330.392096] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1330.392099] RIP: 0033:0x4db36a
[ 1330.392101] RSP: 002b:000000c4207fa768 EFLAGS: 00000216 ORIG_RAX: 00000000000000a5
[ 1330.392104] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004db36a
[ 1330.392106] RDX: 000000c4205984cc RSI: 000000c420a6ee00 RDI: 000000c420a23b60
[ 1330.392108] RBP: 000000c4207fa808 R08: 000000c4209d4960 R09: 0000000000000000
[ 1330.392110] R10: 0000000000000000 R11: 0000000000000216 R12: ffffffffffffffff
[ 1330.392112] R13: 0000000000000039 R14: 0000000000000038 R15: 0000000000000080
[ 1330.392123] INFO: task lxd:16725 blocked for more than 120 seconds.
[ 1330.392171] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1330.392220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1330.392276] lxd D 0 16725 1 0x00000002
[ 1330.392279] Call Trace:
[ 1330.392284] __schedule+0x297/0x8b0
[ 1330.392289] ? irq_work_queue+0x8d/0xa0
[ 1330.392293] schedule+0x2c/0x80
[ 1330.392297] io_schedule+0x16/0x40
[ 1330.392302] wait_on_page_bit_common+0xd8/0x160
[ 1330.392305] ? page_cache_tree_insert+0xe0/0xe0
[ 1330.392309] __filemap_fdatawait_range+0xfa/0x160
[ 1330.392313] ? _cond_resched+0x19/0x40
[ 1330.392317] ? bdi_split_work_to_wbs+0x45/0x2c0
[ 1330.392321] ? _cond_resched+0x19/0x40
[ 1330.392324] filemap_fdatawait_keep_errors+0x1e/0x40
[ 1330.392327] sync_inodes_sb+0x20d/0x2b0
[ 1330.392333] __sync_filesystem+0x1b/0x60
[ 1330.392336] sync_filesystem+0x39/0x40
[ 1330.392340] generic_shutdown_super+0x27/0x120
[ 1330.392343] kill_anon_super+0x12/0x20
[ 1330.392419] zpl_kill_sb+0x1a/0x20 [zfs]
[ 1330.392423] deactivate_locked_super+0x48/0x80
[ 1330.392427] deactivate_super+0x40/0x60
[ 1330.392430] cleanup_mnt+0x3f/0x80
[ 1330.392434] __cleanup_mnt+0x12/0x20
[ 1330.392438] task_work_run+0x9d/0xc0
[ 1330.392442] exit_to_usermode_loop+0xc0/0xd0
[ 1330.392445] do_syscall_64+0x115/0x130
[ 1330.392449] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1330.392451] RIP: 0033:0x7f0f72115447
[ 1330.392453] RSP: 002b:00007ffc5bf8f4a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 1330.392457] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00007f0f72115447
[ 1330.392458] RDX: 0000000000000000 RSI: 0000000002422010 RDI: 0000000000000010
[ 1330.392460] RBP: 0000000002443e20 R08: 0000000000000000 R09: 0000000000000000
[ 1330.392462] R10: 0000000000000008 R11: 0000000000000293 R12: 0000000002443e4c
[ 1330.392464] R13: 0000000000000007 R14: 00007f0f725a7587 R15: 00007ffc5bf8f520
[ 1451.194003] INFO: task txg_sync:9944 blocked for more than 120 seconds.
[ 1451.194061] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1451.194111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.194168] txg_sync D 0 9944 2 0x80000000
[ 1451.194172] Call Trace:
[ 1451.194181] __schedule+0x297/0x8b0
[ 1451.194186] schedule+0x2c/0x80
[ 1451.194206] cv_wait_common+0x11e/0x140 [spl]
[ 1451.194213] ? wait_woken+0x80/0x80
[ 1451.194225] __cv_wait+0x15/0x20 [spl]
[ 1451.194306] rrw_enter_write+0x3c/0xa0 [zfs]
[ 1451.194379] rrw_enter+0x13/0x20 [zfs]
[ 1451.194452] spa_sync+0x7c9/0xd80 [zfs]
[ 1451.194529] txg_sync_thread+0x2cd/0x4a0 [zfs]
[ 1451.194606] ? txg_quiesce_thread+0x3d0/0x3d0 [zfs]
[ 1451.194616] thread_generic_wrapper+0x74/0x90 [spl]
[ 1451.194621] kthread+0x121/0x140
[ 1451.194630] ? __thread_exit+0x20/0x20 [spl]
[ 1451.194634] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1451.194638] ? kthread_create_worker_on_cpu+0x70/0x70
[ 1451.194641] ret_from_fork+0x35/0x40
[ 1451.194655] INFO: task lxd:12419 blocked for more than 120 seconds.
[ 1451.194705] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1451.194754] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.194810] lxd D 0 12419 1 0x00000000
[ 1451.194814] Call Trace:
[ 1451.194819] __schedule+0x297/0x8b0
[ 1451.194824] schedule+0x2c/0x80
[ 1451.194827] rwsem_down_write_failed+0x162/0x360
[ 1451.194883] ? dbuf_rele_and_unlock+0x1a8/0x4b0 [zfs]
[ 1451.194889] call_rwsem_down_write_failed+0x17/0x30
[ 1451.194892] ? call_rwsem_down_write_failed+0x17/0x30
[ 1451.194895] down_write+0x2d/0x40
[ 1451.194900] grab_super+0x30/0x90
[ 1451.194975] ? zpl_create+0x160/0x160 [zfs]
[ 1451.194979] sget_userns+0x91/0x490
[ 1451.194982] ? get_anon_bdev+0x100/0x100
[ 1451.195058] ? zpl_create+0x160/0x160 [zfs]
[ 1451.195062] sget+0x7d/0xa0
[ 1451.195065] ? get_anon_bdev+0x100/0x100
[ 1451.195141] zpl_mount+0xa8/0x160 [zfs]
[ 1451.195145] mount_fs+0x37/0x150
[ 1451.195151] vfs_kern_mount.part.23+0x5d/0x110
[ 1451.195154] do_mount+0x5ed/0xce0
[ 1451.195157] ? copy_mount_options+0x2c/0x220
[ 1451.195161] SyS_mount+0x98/0xe0
[ 1451.195165] do_syscall_64+0x73/0x130
[ 1451.195169] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1451.195172] RIP: 0033:0x4db36a
[ 1451.195174] RSP: 002b:000000c4207fa768 EFLAGS: 00000216 ORIG_RAX: 00000000000000a5
[ 1451.195178] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004db36a
[ 1451.195180] RDX: 000000c4205984cc RSI: 000000c420a6ee00 RDI: 000000c420a23b60
[ 1451.195181] RBP: 000000c4207fa808 R08: 000000c4209d4960 R09: 0000000000000000
[ 1451.195183] R10: 0000000000000000 R11: 0000000000000216 R12: ffffffffffffffff
[ 1451.195185] R13: 0000000000000039 R14: 0000000000000038 R15: 0000000000000080
[ 1451.195197] INFO: task lxd:16725 blocked for more than 120 seconds.
[ 1451.195245] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1451.195294] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.195349] lxd D 0 16725 1 0x00000002
[ 1451.195353] Call Trace:
[ 1451.195358] __schedule+0x297/0x8b0
[ 1451.195363] ? irq_work_queue+0x8d/0xa0
[ 1451.195367] schedule+0x2c/0x80
[ 1451.195371] io_schedule+0x16/0x40
[ 1451.195376] wait_on_page_bit_common+0xd8/0x160
[ 1451.195379] ? page_cache_tree_insert+0xe0/0xe0
[ 1451.195384] __filemap_fdatawait_range+0xfa/0x160
[ 1451.195388] ? _cond_resched+0x19/0x40
[ 1451.195391] ? bdi_split_work_to_wbs+0x45/0x2c0
[ 1451.195395] ? _cond_resched+0x19/0x40
[ 1451.195399] filemap_fdatawait_keep_errors+0x1e/0x40
[ 1451.195401] sync_inodes_sb+0x20d/0x2b0
[ 1451.195407] __sync_filesystem+0x1b/0x60
[ 1451.195410] sync_filesystem+0x39/0x40
[ 1451.195414] generic_shutdown_super+0x27/0x120
[ 1451.195417] kill_anon_super+0x12/0x20
[ 1451.195493] zpl_kill_sb+0x1a/0x20 [zfs]
[ 1451.195498] deactivate_locked_super+0x48/0x80
[ 1451.195501] deactivate_super+0x40/0x60
[ 1451.195504] cleanup_mnt+0x3f/0x80
[ 1451.195508] __cleanup_mnt+0x12/0x20
[ 1451.195512] task_work_run+0x9d/0xc0
[ 1451.195516] exit_to_usermode_loop+0xc0/0xd0
[ 1451.195519] do_syscall_64+0x115/0x130
[ 1451.195523] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1451.195525] RIP: 0033:0x7f0f72115447
[ 1451.195527] RSP: 002b:00007ffc5bf8f4a0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 1451.195530] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00007f0f72115447
[ 1451.195532] RDX: 0000000000000000 RSI: 0000000002422010 RDI: 0000000000000010
[ 1451.195534] RBP: 0000000002443e20 R08: 0000000000000000 R09: 0000000000000000
[ 1451.195536] R10: 0000000000000008 R11: 0000000000000293 R12: 0000000002443e4c
[ 1451.195538] R13: 0000000000000007 R14: 00007f0f725a7587 R15: 00007ffc5bf8f520
[ 1451.195548] INFO: task zfs:18387 blocked for more than 120 seconds.
[ 1451.195595] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1451.195644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.195701] zfs D 0 18387 18374 0x00000000
[ 1451.195704] Call Trace:
[ 1451.195709] __schedule+0x297/0x8b0
[ 1451.195714] schedule+0x2c/0x80
[ 1451.195724] cv_wait_common+0x11e/0x140 [spl]
[ 1451.195728] ? wait_woken+0x80/0x80
[ 1451.195739] __cv_wait+0x15/0x20 [spl]
[ 1451.195812] rrw_enter_read_impl+0x4e/0x160 [zfs]
[ 1451.195884] rrw_enter+0x1c/0x20 [zfs]
[ 1451.195952] dsl_pool_hold+0x5a/0x80 [zfs]
[ 1451.196011] dmu_objset_hold+0x33/0xa0 [zfs]
[ 1451.196089] zfs_ioc_objset_stats+0x32/0xa0 [zfs]
[ 1451.196167] zfsdev_ioctl+0x1e0/0x610 [zfs]
[ 1451.196173] do_vfs_ioctl+0xa8/0x630
[ 1451.196178] ? handle_mm_fault+0xb1/0x1f0
[ 1451.196183] ? __do_page_fault+0x270/0x4d0
[ 1451.196187] SyS_ioctl+0x79/0x90
[ 1451.196191] do_syscall_64+0x73/0x130
[ 1451.196195] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 1451.196197] RIP: 0033:0x7ff184eec5d7
[ 1451.196198] RSP: 002b:00007ffd6bd71cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1451.196202] RAX: ffffffffffffffda RBX: 00007ffd6bd71ce0 RCX: 00007ff184eec5d7
[ 1451.196204] RDX: 00007ffd6bd71ce0 RSI: 0000000000005a12 RDI: 0000000000000003
[ 1451.196206] RBP: 00005558b3cc3430 R08: 00005558b3cc5fa0 R09: 0000000000000000
[ 1451.196208] R10: fffffffffffff000 R11: 0000000000000246 R12: 00005558b3cc3430
[ 1451.196210] R13: 00005558b3cc5380 R14: 0000000000000000 R15: 0000000000000000
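
The hung-task headers in the dmesg output above follow a regular pattern, so they can be summarized with a short script. This is only a rough sketch: the function name `hung_tasks` and the regular expression are assumptions made for illustration and are not part of any kernel, ZFS, or LXD tooling.

```python
import re

# Matches hung-task header lines as they appear in the log above, e.g.
# "[ 1330.390938] INFO: task txg_sync:9944 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(dmesg_text):
    """Return {(task_name, pid): [timestamps]} for each hung-task report."""
    found = {}
    for line in dmesg_text.splitlines():
        match = HUNG_RE.search(line)
        if match:
            key = (match.group("name"), int(match.group("pid")))
            found.setdefault(key, []).append(float(match.group("ts")))
    return found


# Header lines copied from the dmesg output in this report.
SAMPLE = """\
[ 1330.390938] INFO: task txg_sync:9944 blocked for more than 120 seconds.
[ 1330.391582] INFO: task lxd:12419 blocked for more than 120 seconds.
[ 1330.392123] INFO: task lxd:16725 blocked for more than 120 seconds.
[ 1451.194003] INFO: task txg_sync:9944 blocked for more than 120 seconds.
[ 1451.195548] INFO: task zfs:18387 blocked for more than 120 seconds.
"""

if __name__ == "__main__":
    for (name, pid), stamps in hung_tasks(SAMPLE).items():
        print(f"{name} (pid {pid}): reported {len(stamps)} time(s)")
```

Feeding it the full output of `dmesg` instead of `SAMPLE` should show which tasks keep reappearing across the 120-second reporting intervals.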

zfs list: hangs

# cat /proc/18387/stack (this is "zfs list" command)
[<0>] cv_wait_common+0x11e/0x140 [spl]
[<0>] __cv_wait+0x15/0x20 [spl]
[<0>] rrw_enter_read_impl+0x4e/0x160 [zfs]
[<0>] rrw_enter+0x1c/0x20 [zfs]
[<0>] dsl_pool_hold+0x5a/0x80 [zfs]
[<0>] dmu_objset_hold+0x33/0xa0 [zfs]
[<0>] zfs_ioc_objset_stats+0x32/0xa0 [zfs]
[<0>] zfsdev_ioctl+0x1e0/0x610 [zfs]
[<0>] do_vfs_ioctl+0xa8/0x630
[<0>] SyS_ioctl+0x79/0x90
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

# cat /proc/17714/stack (this is "lxc restart ct" command)
[<0>] futex_wait_queue_me+0xca/0x130
[<0>] futex_wait+0x10a/0x250
[<0>] do_futex+0x325/0x500
[<0>] SyS_futex+0x13b/0x180
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

# cat /proc/16725/stack (this is "[lxc monitor] /var/lib/lxd/containers ct")
[<0>] io_schedule+0x16/0x40
[<0>] wait_on_page_bit_common+0xd8/0x160
[<0>] __filemap_fdatawait_range+0xfa/0x160
[<0>] filemap_fdatawait_keep_errors+0x1e/0x40
[<0>] sync_inodes_sb+0x20d/0x2b0
[<0>] __sync_filesystem+0x1b/0x60
[<0>] sync_filesystem+0x39/0x40
[<0>] generic_shutdown_super+0x27/0x120
[<0>] kill_anon_super+0x12/0x20
[<0>] zpl_kill_sb+0x1a/0x20 [zfs]
[<0>] deactivate_locked_super+0x48/0x80
[<0>] deactivate_super+0x40/0x60
[<0>] cleanup_mnt+0x3f/0x80
[<0>] __cleanup_mnt+0x12/0x20
[<0>] task_work_run+0x9d/0xc0
[<0>] exit_to_usermode_loop+0xc0/0xd0
[<0>] do_syscall_64+0x115/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

# cat /proc/12419/stack (I don't see this process now)
[<0>] call_rwsem_down_write_failed+0x17/0x30
[<0>] grab_super+0x30/0x90
[<0>] sget_userns+0x91/0x490
[<0>] sget+0x7d/0xa0
[<0>] zpl_mount+0xa8/0x160 [zfs]
[<0>] mount_fs+0x37/0x150
[<0>] vfs_kern_mount.part.23+0x5d/0x110
[<0>] do_mount+0x5ed/0xce0
[<0>] SyS_mount+0x98/0xe0
[<0>] do_syscall_64+0x73/0x130
[<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<0>] 0xffffffffffffffff

# cat /proc/9944/stack (this is "[txg_sync]")
[<0>] cv_wait_common+0x11e/0x140 [spl]
[<0>] __cv_wait+0x15/0x20 [spl]
[<0>] rrw_enter_write+0x3c/0xa0 [zfs]
[<0>] rrw_enter+0x13/0x20 [zfs]
[<0>] spa_sync+0x7c9/0xd80 [zfs]
[<0>] txg_sync_thread+0x2cd/0x4a0 [zfs]
[<0>] thread_generic_wrapper+0x74/0x90 [spl]
[<0>] kthread+0x121/0x140
[<0>] ret_from_fork+0x35/0x40
[<0>] 0xffffffffffffffff
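
The stacks above were collected one PID at a time. For anyone trying to reproduce this, a sketch along the following lines might gather the same information for every task in uninterruptible sleep (state D). It assumes a Linux /proc layout and root privileges for reading /proc/<pid>/stack; the helper names `parse_stat`, `blocked_tasks`, and `kernel_stack` are hypothetical, and only thread-group leaders listed directly under /proc are inspected.

```python
import os


def parse_stat(stat_line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the line is split on the last ')'.
    """
    left, right = stat_line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def blocked_tasks(proc="/proc"):
    """Return [(pid, comm)] for tasks currently in state D."""
    if not os.path.isdir(proc):
        return []
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # task exited meanwhile, or the line was unreadable
        if state == "D":
            tasks.append((pid, comm))
    return tasks


def kernel_stack(pid, proc="/proc"):
    """Return the kernel stack text for pid, or a note if it cannot be read."""
    try:
        with open(os.path.join(proc, str(pid), "stack")) as handle:
            return handle.read()
    except OSError as exc:
        return f"<unavailable: {exc}>"


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(f"# {comm} (pid {pid})")
        print(kernel_stack(pid))
```

Run as root while the hang is in progress; on a healthy system the list will usually be empty or very short.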

zpool status -v:
  pool: lxc
 state: ONLINE
  scan: none requested
config:

        NAME STATE READ WRITE CKSUM
        lxc ONLINE 0 0 0
          lxc ONLINE 0 0 0

errors: No known data errors
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 25 18:11 seq
 crw-rw---- 1 root audio 116, 33 May 25 18:11 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=/dev/mapper/vg00-swap
InstallationDate: Installed on 2018-05-24 (3 days ago)
InstallationMedia: Ubuntu-Server 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
 Bus 002 Device 003: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
 Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Intel Corporation S3420GP
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-22-generic root=/dev/mapper/vg00-root ro
ProcVersionSignature: Ubuntu 4.15.0-22.24-generic 4.15.17
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-22-generic N/A
 linux-backports-modules-4.15.0-22-generic N/A
 linux-firmware 1.173
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.15.0-22-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 04/29/2010
dmi.bios.vendor: Intel Corp.
dmi.bios.version: S3420GP.86B.01.00.0042.042920102218
dmi.board.asset.tag: ....................
dmi.board.name: S3420GP
dmi.board.vendor: Intel Corporation
dmi.board.version: E51976-405
dmi.chassis.asset.tag: ....................
dmi.chassis.type: 17
dmi.chassis.vendor: ..............................
dmi.chassis.version: ..................
dmi.modalias: dmi:bvnIntelCorp.:bvrS3420GP.86B.01.00.0042.042920102218:bd04/29/2010:svnIntelCorporation:pnS3420GP:pvr....................:rvnIntelCorporation:rnS3420GP:rvrE51976-405:cvn..............................:ct17:cvr..................:
dmi.product.name: S3420GP
dmi.product.version: ....................
dmi.sys.vendor: Intel Corporation

Vasiliy (vvershkov) wrote :

A small note as well: after the hang has happened you can't shut down or reboot the server; you need to reset it or power it off manually (or via IPMI).

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1773392

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Vasiliy (vvershkov) wrote : CRDA.txt

apport information

tags: added: apport-collected
description: updated

Vasiliy (vvershkov) wrote : Lspci.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Vasiliy (vvershkov) wrote :

Little update here:
I tried Ubuntu Server 16.04 with HWE (4.13.0-36-generic) and lxd 3.0 + zfs 0.7.5 (latest from the bionic repo), and it seems there is no such bug with the 4.13 kernel.

After that I updated the kernel (from the bionic repo again, 4.15.0-22-generic) and the bug appeared again: zfs hangs if my lxd ct restarts (on mount again).

So now I am fairly sure that it is a kernel bug.
Also I tried 4.16.12 from here: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.16.12/
Same behaviour - zfs hangs.

Vasiliy (vvershkov) wrote :

Another experiment:
1) Set up a new Ubuntu 18.04 server with LXD on ZFS: zfs hangs on ct restart
2) Install kernel 4.13.0-36 (from the 16.04 HWE install disk) + zfs 0.6.5 from the xenial repo (because 0.7.5 is not compatible with the old kernel)
3) Set up grub to load the old kernel and reboot
4) And... everything is fine :)

I will use this method on my servers because I have no idea when this bug will be fixed...

Vasiliy (vvershkov) wrote :

And another little update to the previous experiment:

Everything works fine with kernel 4.13.0-36 and zfs 0.7.5 IF you create a new zpool with the unsupported features disabled:
feature@multi_vdev_crash_dump
feature@large_dnode
feature@sha512
feature@skein
feature@edonr
feature@userobj_accounting

The problem is that you can't disable them on an existing zpool; you need to recreate the zpool. And these features are enabled by default: if you do "zpool create pool /dev/disk", they are enabled on 4.15, or you get an error on 4.13.

Also, disabling these features doesn't fix the bug: if you create the zpool on the 4.13 kernel and then switch back to 4.15, zfs still hangs on container restart.
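
For reference, creating a pool with the features listed above turned off could look roughly like the sketch below. It only builds and prints the command line and does not run it. It assumes that `zpool create` accepts `-o feature@NAME=disabled` for each feature, which should be checked against the zpool man page for the installed version; the helper name `zpool_create_argv` is hypothetical, and the pool and device names are the placeholders from the comment above.

```python
import shlex

# Feature flags listed in the comment above as unsupported by the older setup.
UNSUPPORTED_FEATURES = [
    "multi_vdev_crash_dump",
    "large_dnode",
    "sha512",
    "skein",
    "edonr",
    "userobj_accounting",
]


def zpool_create_argv(pool, vdevs, disabled_features=UNSUPPORTED_FEATURES):
    """Build a 'zpool create' argv with the given features disabled.

    Assumption: each feature can be turned off at creation time with
    '-o feature@NAME=disabled'.
    """
    argv = ["zpool", "create"]
    for feature in disabled_features:
        argv += ["-o", f"feature@{feature}=disabled"]
    argv.append(pool)
    argv.extend(vdevs)
    return argv


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    command = zpool_create_argv("pool", ["/dev/disk"])
    print(" ".join(shlex.quote(arg) for arg in command))
```

As noted above, this does not fix the hang on 4.15; it only makes the pool usable from the older kernel.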

Vasiliy (vvershkov) wrote :

And another little update:
I started to downgrade kernels on my servers (ubuntu 18.04 lxd hw nodes) and I also had another problem: HP blade gen8 hangs on reboot/shutdown (the system indicates a blade failure and you need to pull the blade out and insert it again), while gen7 works fine.

With the 4.13 kernel gen8 works fine and can be rebooted, so this issue was also fixed by the kernel downgrade from 4.15...

Vasiliy (vvershkov) wrote :

4.15.0-23-generic - same bug: zfs still hangs.

A. Dieckmann (a.dieckmann) wrote :

vmlinuz-4.15.0-23-generic - it seems ok!

I have upgraded normally and everything is fine, the machine boots up. It's all done for me.

Thanks!

James Buren (ryu0) wrote :

It's an issue for my server too. LXD hangs when I try to restart containers at times. I'm switching to libvirt until this is resolved.

Simos Xenitellis  (simosx) wrote :

Upstream bug report and pull request to try:

"Kernel error "task zfs:pid blocked for more than 120 seconds" #7691"
https://github.com/zfsonlinux/zfs/issues/7691

"Fix zpl_mount() deadlock #7693"
https://github.com/zfsonlinux/zfs/pull/7693

Changed in linux:
status: Unknown → Fix Released
Changed in linux:
status: Fix Released → Unknown
Changed in linux:
status: Unknown → New
Changed in linux:
status: New → Fix Released