zfs hangs on mount/unmount
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linux | Fix Released | Unknown | |
linux (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
I am running lxd 3.0 on Ubuntu 18.04 with kernels 4.15.0-22-generic and 4.15.0-20-generic (same behaviour on both), using the zfs backend (0.7.5-1ubuntu16; also tried 0.7.9).
Sometimes lxd hangs when I try to stop, restart, or "stop && move" some containers. Further investigation showed that the problem is in the zfs mount or unmount: it simply hangs, and lxd keeps waiting for it. Commands like "zfs list" hang too.
It seems that this is not an lxd or zfs issue but a kernel bug?
https:/
I have one test container (ct) that always hangs on restart, so here is the info:
dmesg:
[ 1330.390938] INFO: task txg_sync:9944 blocked for more than 120 seconds.
[ 1330.390994] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1330.391044] "echo 0 > /proc/sys/
[ 1330.391101] txg_sync D 0 9944 2 0x80000000
[ 1330.391105] Call Trace:
[ 1330.391117] __schedule+
[ 1330.391122] schedule+0x2c/0x80
[ 1330.391136] cv_wait_
[ 1330.391141] ? wait_woken+
[ 1330.391152] __cv_wait+0x15/0x20 [spl]
[ 1330.391234] rrw_enter_
[ 1330.391306] rrw_enter+0x13/0x20 [zfs]
[ 1330.391380] spa_sync+
[ 1330.391457] txg_sync_
[ 1330.391534] ? txg_quiesce_
[ 1330.391543] thread_
[ 1330.391549] kthread+0x121/0x140
[ 1330.391558] ? __thread_
[ 1330.391562] ? kthread_
[ 1330.391566] ? kthread_
[ 1330.391569] ret_from_
[ 1330.391582] INFO: task lxd:12419 blocked for more than 120 seconds.
[ 1330.391630] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1330.391679] "echo 0 > /proc/sys/
[ 1330.391735] lxd D 0 12419 1 0x00000000
[ 1330.391739] Call Trace:
[ 1330.391745] __schedule+
[ 1330.391749] schedule+0x2c/0x80
[ 1330.391752] rwsem_down_
[ 1330.391808] ? dbuf_rele_
[ 1330.391814] call_rwsem_
[ 1330.391817] ? call_rwsem_
[ 1330.391821] down_write+
[ 1330.391825] grab_super+
[ 1330.391901] ? zpl_create+
[ 1330.391905] sget_userns+
[ 1330.391908] ? get_anon_
[ 1330.391983] ? zpl_create+
[ 1330.391987] sget+0x7d/0xa0
[ 1330.391990] ? get_anon_
[ 1330.392066] zpl_mount+
[ 1330.392071] mount_fs+0x37/0x150
[ 1330.392077] vfs_kern_
[ 1330.392080] do_mount+
[ 1330.392083] ? copy_mount_
[ 1330.392086] SyS_mount+0x98/0xe0
[ 1330.392092] do_syscall_
[ 1330.392096] entry_SYSCALL_
[ 1330.392099] RIP: 0033:0x4db36a
[ 1330.392101] RSP: 002b:000000c420
[ 1330.392104] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004db36a
[ 1330.392106] RDX: 000000c4205984cc RSI: 000000c420a6ee00 RDI: 000000c420a23b60
[ 1330.392108] RBP: 000000c4207fa808 R08: 000000c4209d4960 R09: 0000000000000000
[ 1330.392110] R10: 0000000000000000 R11: 0000000000000216 R12: ffffffffffffffff
[ 1330.392112] R13: 0000000000000039 R14: 0000000000000038 R15: 0000000000000080
[ 1330.392123] INFO: task lxd:16725 blocked for more than 120 seconds.
[ 1330.392171] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1330.392220] "echo 0 > /proc/sys/
[ 1330.392276] lxd D 0 16725 1 0x00000002
[ 1330.392279] Call Trace:
[ 1330.392284] __schedule+
[ 1330.392289] ? irq_work_
[ 1330.392293] schedule+0x2c/0x80
[ 1330.392297] io_schedule+
[ 1330.392302] wait_on_
[ 1330.392305] ? page_cache_
[ 1330.392309] __filemap_
[ 1330.392313] ? _cond_resched+
[ 1330.392317] ? bdi_split_
[ 1330.392321] ? _cond_resched+
[ 1330.392324] filemap_
[ 1330.392327] sync_inodes_
[ 1330.392333] __sync_
[ 1330.392336] sync_filesystem
[ 1330.392340] generic_
[ 1330.392343] kill_anon_
[ 1330.392419] zpl_kill_
[ 1330.392423] deactivate_
[ 1330.392427] deactivate_
[ 1330.392430] cleanup_
[ 1330.392434] __cleanup_
[ 1330.392438] task_work_
[ 1330.392442] exit_to_
[ 1330.392445] do_syscall_
[ 1330.392449] entry_SYSCALL_
[ 1330.392451] RIP: 0033:0x7f0f72115447
[ 1330.392453] RSP: 002b:00007ffc5b
[ 1330.392457] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00007f0f72115447
[ 1330.392458] RDX: 0000000000000000 RSI: 0000000002422010 RDI: 0000000000000010
[ 1330.392460] RBP: 0000000002443e20 R08: 0000000000000000 R09: 0000000000000000
[ 1330.392462] R10: 0000000000000008 R11: 0000000000000293 R12: 0000000002443e4c
[ 1330.392464] R13: 0000000000000007 R14: 00007f0f725a7587 R15: 00007ffc5bf8f520
[ 1451.194003] INFO: task txg_sync:9944 blocked for more than 120 seconds.
[ 1451.194061] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1451.194111] "echo 0 > /proc/sys/
[ 1451.194168] txg_sync D 0 9944 2 0x80000000
[ 1451.194172] Call Trace:
[ 1451.194181] __schedule+
[ 1451.194186] schedule+0x2c/0x80
[ 1451.194206] cv_wait_
[ 1451.194213] ? wait_woken+
[ 1451.194225] __cv_wait+0x15/0x20 [spl]
[ 1451.194306] rrw_enter_
[ 1451.194379] rrw_enter+0x13/0x20 [zfs]
[ 1451.194452] spa_sync+
[ 1451.194529] txg_sync_
[ 1451.194606] ? txg_quiesce_
[ 1451.194616] thread_
[ 1451.194621] kthread+0x121/0x140
[ 1451.194630] ? __thread_
[ 1451.194634] ? kthread_
[ 1451.194638] ? kthread_
[ 1451.194641] ret_from_
[ 1451.194655] INFO: task lxd:12419 blocked for more than 120 seconds.
[ 1451.194705] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1451.194754] "echo 0 > /proc/sys/
[ 1451.194810] lxd D 0 12419 1 0x00000000
[ 1451.194814] Call Trace:
[ 1451.194819] __schedule+
[ 1451.194824] schedule+0x2c/0x80
[ 1451.194827] rwsem_down_
[ 1451.194883] ? dbuf_rele_
[ 1451.194889] call_rwsem_
[ 1451.194892] ? call_rwsem_
[ 1451.194895] down_write+
[ 1451.194900] grab_super+
[ 1451.194975] ? zpl_create+
[ 1451.194979] sget_userns+
[ 1451.194982] ? get_anon_
[ 1451.195058] ? zpl_create+
[ 1451.195062] sget+0x7d/0xa0
[ 1451.195065] ? get_anon_
[ 1451.195141] zpl_mount+
[ 1451.195145] mount_fs+0x37/0x150
[ 1451.195151] vfs_kern_
[ 1451.195154] do_mount+
[ 1451.195157] ? copy_mount_
[ 1451.195161] SyS_mount+0x98/0xe0
[ 1451.195165] do_syscall_
[ 1451.195169] entry_SYSCALL_
[ 1451.195172] RIP: 0033:0x4db36a
[ 1451.195174] RSP: 002b:000000c420
[ 1451.195178] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004db36a
[ 1451.195180] RDX: 000000c4205984cc RSI: 000000c420a6ee00 RDI: 000000c420a23b60
[ 1451.195181] RBP: 000000c4207fa808 R08: 000000c4209d4960 R09: 0000000000000000
[ 1451.195183] R10: 0000000000000000 R11: 0000000000000216 R12: ffffffffffffffff
[ 1451.195185] R13: 0000000000000039 R14: 0000000000000038 R15: 0000000000000080
[ 1451.195197] INFO: task lxd:16725 blocked for more than 120 seconds.
[ 1451.195245] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1451.195294] "echo 0 > /proc/sys/
[ 1451.195349] lxd D 0 16725 1 0x00000002
[ 1451.195353] Call Trace:
[ 1451.195358] __schedule+
[ 1451.195363] ? irq_work_
[ 1451.195367] schedule+0x2c/0x80
[ 1451.195371] io_schedule+
[ 1451.195376] wait_on_
[ 1451.195379] ? page_cache_
[ 1451.195384] __filemap_
[ 1451.195388] ? _cond_resched+
[ 1451.195391] ? bdi_split_
[ 1451.195395] ? _cond_resched+
[ 1451.195399] filemap_
[ 1451.195401] sync_inodes_
[ 1451.195407] __sync_
[ 1451.195410] sync_filesystem
[ 1451.195414] generic_
[ 1451.195417] kill_anon_
[ 1451.195493] zpl_kill_
[ 1451.195498] deactivate_
[ 1451.195501] deactivate_
[ 1451.195504] cleanup_
[ 1451.195508] __cleanup_
[ 1451.195512] task_work_
[ 1451.195516] exit_to_
[ 1451.195519] do_syscall_
[ 1451.195523] entry_SYSCALL_
[ 1451.195525] RIP: 0033:0x7f0f72115447
[ 1451.195527] RSP: 002b:00007ffc5b
[ 1451.195530] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 00007f0f72115447
[ 1451.195532] RDX: 0000000000000000 RSI: 0000000002422010 RDI: 0000000000000010
[ 1451.195534] RBP: 0000000002443e20 R08: 0000000000000000 R09: 0000000000000000
[ 1451.195536] R10: 0000000000000008 R11: 0000000000000293 R12: 0000000002443e4c
[ 1451.195538] R13: 0000000000000007 R14: 00007f0f725a7587 R15: 00007ffc5bf8f520
[ 1451.195548] INFO: task zfs:18387 blocked for more than 120 seconds.
[ 1451.195595] Tainted: P O 4.15.0-22-generic #24-Ubuntu
[ 1451.195644] "echo 0 > /proc/sys/
[ 1451.195701] zfs D 0 18387 18374 0x00000000
[ 1451.195704] Call Trace:
[ 1451.195709] __schedule+
[ 1451.195714] schedule+0x2c/0x80
[ 1451.195724] cv_wait_
[ 1451.195728] ? wait_woken+
[ 1451.195739] __cv_wait+0x15/0x20 [spl]
[ 1451.195812] rrw_enter_
[ 1451.195884] rrw_enter+0x1c/0x20 [zfs]
[ 1451.195952] dsl_pool_
[ 1451.196011] dmu_objset_
[ 1451.196089] zfs_ioc_
[ 1451.196167] zfsdev_
[ 1451.196173] do_vfs_
[ 1451.196178] ? handle_
[ 1451.196183] ? __do_page_
[ 1451.196187] SyS_ioctl+0x79/0x90
[ 1451.196191] do_syscall_
[ 1451.196195] entry_SYSCALL_
[ 1451.196197] RIP: 0033:0x7ff184eec5d7
[ 1451.196198] RSP: 002b:00007ffd6b
[ 1451.196202] RAX: ffffffffffffffda RBX: 00007ffd6bd71ce0 RCX: 00007ff184eec5d7
[ 1451.196204] RDX: 00007ffd6bd71ce0 RSI: 0000000000005a12 RDI: 0000000000000003
[ 1451.196206] RBP: 00005558b3cc3430 R08: 00005558b3cc5fa0 R09: 0000000000000000
[ 1451.196208] R10: fffffffffffff000 R11: 0000000000000246 R12: 00005558b3cc3430
[ 1451.196210] R13: 00005558b3cc5380 R14: 0000000000000000 R15: 0000000000000000
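The log above repeats the same hung-task warnings every 120 seconds, which makes it hard to see at a glance which tasks are stuck. The following is a minimal sketch, not part of the original report, that summarizes those warnings from saved dmesg output. The function name `summarize_hung_tasks` and the regular expression are illustrative assumptions based only on the message format shown in this log.

```python
import re
from collections import defaultdict

# Matches the kernel's hung-task warning as it appears in this report, e.g.
# "[ 1330.390938] INFO: task txg_sync:9944 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(dmesg_text):
    """Return {(comm, pid): [timestamps]} for every hung-task warning found."""
    reports = defaultdict(list)
    for line in dmesg_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            key = (match.group("comm"), int(match.group("pid")))
            reports[key].append(float(match.group("ts")))
    return dict(reports)


if __name__ == "__main__":
    # A few header lines copied from the log above, used as sample input.
    sample = (
        "[ 1330.390938] INFO: task txg_sync:9944 blocked for more than 120 seconds.\n"
        "[ 1330.391582] INFO: task lxd:12419 blocked for more than 120 seconds.\n"
        "[ 1451.194003] INFO: task txg_sync:9944 blocked for more than 120 seconds.\n"
        "[ 1451.195548] INFO: task zfs:18387 blocked for more than 120 seconds.\n"
    )
    for (comm, pid), stamps in summarize_hung_tasks(sample).items():
        print(f"{comm} (pid {pid}): reported {len(stamps)} time(s)")
```

Applied to the full log, this would group the warnings by task, showing that txg_sync and the two lxd tasks are reported in both rounds while the "zfs list" process only appears in the second.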
zfs list: hangs
# cat /proc/18387/stack (this is "zfs list" command)
[<0>] cv_wait_
[<0>] __cv_wait+0x15/0x20 [spl]
[<0>] rrw_enter_
[<0>] rrw_enter+0x1c/0x20 [zfs]
[<0>] dsl_pool_
[<0>] dmu_objset_
[<0>] zfs_ioc_
[<0>] zfsdev_
[<0>] do_vfs_
[<0>] SyS_ioctl+0x79/0x90
[<0>] do_syscall_
[<0>] entry_SYSCALL_
[<0>] 0xffffffffffffffff
# cat /proc/17714/stack (this is "lxc restart ct" command)
[<0>] futex_wait_
[<0>] futex_wait+
[<0>] do_futex+
[<0>] SyS_futex+
[<0>] do_syscall_
[<0>] entry_SYSCALL_
[<0>] 0xffffffffffffffff
# cat /proc/16725/stack (this is "[lxc monitor] /var/lib/
[<0>] io_schedule+
[<0>] wait_on_
[<0>] __filemap_
[<0>] filemap_
[<0>] sync_inodes_
[<0>] __sync_
[<0>] sync_filesystem
[<0>] generic_
[<0>] kill_anon_
[<0>] zpl_kill_
[<0>] deactivate_
[<0>] deactivate_
[<0>] cleanup_
[<0>] __cleanup_
[<0>] task_work_
[<0>] exit_to_
[<0>] do_syscall_
[<0>] entry_SYSCALL_
[<0>] 0xffffffffffffffff
# cat /proc/12419/stack (I don't see this process now)
[<0>] call_rwsem_
[<0>] grab_super+
[<0>] sget_userns+
[<0>] sget+0x7d/0xa0
[<0>] zpl_mount+
[<0>] mount_fs+0x37/0x150
[<0>] vfs_kern_
[<0>] do_mount+
[<0>] SyS_mount+0x98/0xe0
[<0>] do_syscall_
[<0>] entry_SYSCALL_
[<0>] 0xffffffffffffffff
# cat /proc/9944/stack (this is "[txg_sync]")
[<0>] cv_wait_
[<0>] __cv_wait+0x15/0x20 [spl]
[<0>] rrw_enter_
[<0>] rrw_enter+0x13/0x20 [zfs]
[<0>] spa_sync+
[<0>] txg_sync_
[<0>] thread_
[<0>] kthread+0x121/0x140
[<0>] ret_from_
[<0>] 0xffffffffffffffff
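The stacks above were collected by hand with `cat /proc/<pid>/stack`. As a rough sketch of how the same information could be gathered in one pass, the script below lists tasks in uninterruptible sleep (state `D`) and prints their kernel stacks. The helper names (`parse_stat`, `blocked_tasks`, `kernel_stack`) are hypothetical, and reading `/proc/<pid>/stack` normally requires root.

```python
import os


def parse_stat(stat_text):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the last ')' rather than on whitespace.
    """
    left, right = stat_text.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    return int(pid_text), comm, right.split()[0]


def blocked_tasks(proc_root="/proc"):
    """Yield (pid, comm) for processes in uninterruptible sleep (state 'D')."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            yield pid, comm


def kernel_stack(pid, proc_root="/proc"):
    """Return the kernel stack of a task, or a note if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "stack")) as handle:
            return handle.read()
    except OSError as error:
        return f"(could not read stack: {error})\n"


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(f"# {comm} (pid {pid})")
        print(kernel_stack(pid), end="")
```

One limitation of this sketch: listing `/proc` only shows processes and kernel threads, not individual threads of a multi-threaded program such as lxd. A blocked thread would have to be looked up under `/proc/<pid>/task/`, which is not handled here.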
zpool status -v:
pool: lxc
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
lxc ONLINE 0 0 0
lxc ONLINE 0 0 0
errors: No known data errors
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 May 25 18:11 seq
crw-rw---- 1 root audio 116, 33 May 25 18:11 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=
InstallationDate: Installed on 2018-05-24 (3 days ago)
InstallationMedia: Ubuntu-Server 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
Bus 002 Device 003: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Intel Corporation S3420GP
NonfreeKernelMo
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 mgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.173
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.15.0-22-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 04/29/2010
dmi.bios.vendor: Intel Corp.
dmi.bios.version: S3420GP.
dmi.board.
dmi.board.name: S3420GP
dmi.board.vendor: Intel Corporation
dmi.board.version: E51976-405
dmi.chassis.
dmi.chassis.type: 17
dmi.chassis.vendor: .......
dmi.chassis.
dmi.modalias: dmi:bvnIntelCor
dmi.product.name: S3420GP
dmi.product.
dmi.sys.vendor: Intel Corporation
Changed in linux: status: Unknown → Fix Released
Changed in linux: status: Fix Released → Unknown
Changed in linux: status: Unknown → New
Changed in linux: status: New → Fix Released
One more note: once the hang has happened, you can't shut down or reboot the server; you have to reset it or power it off manually (or via IPMI).