Concurrent zfs create and rename operations can lock a zpool completely

Bug #1560869 reported by Fabian Grünbichler
This bug affects 1 person
Affects             Status        Importance  Assigned to     Milestone
linux (Ubuntu)      Incomplete    Undecided   Unassigned
zfs-linux (Ubuntu)  Fix Released  High        Colin Ian King

Bug Description

When running "zfs create -V" and "zfs rename" operations on the same zpool in parallel, there is a high chance of a deadlock that hangs the zpool in question completely (i.e., all further zfs operations hang indefinitely).

Attached is a simple Perl script that should trigger the bug when run in two shells at the same time (the pool variable needs to be set to your pool name):

$ for i in `seq 1 100`; do sudo ./zfsrenamebug.pl "A$i"; done

$ for i in `seq 1 100`; do sudo ./zfsrenamebug.pl "B$i"; done
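The attached Perl script is not reproduced on this page. Going by the later description of it as a "create rename destroy script", one iteration might look roughly like the shell sketch below. The zvol naming scheme (vol-<suffix>), the 4096k size, and the ZFS_CMD override are assumptions made for illustration; they are not taken from the attachment.

```shell
# Hypothetical stand-in for one run of the attached script.
# ZFS_CMD is an assumed override so the sequence can be dry-run
# (e.g. ZFS_CMD=echo) without touching a real pool.
ZFS_CMD=${ZFS_CMD:-zfs}

# Create a small zvol, rename it, then destroy it again.
# $1 = pool name, $2 = unique suffix such as A1 or B1
create_rename_destroy() {
    pool=$1
    name=$2
    $ZFS_CMD create -V 4096k "$pool/vol-$name" &&
    $ZFS_CMD rename "$pool/vol-$name" "$pool/vol-$name-renamed" &&
    $ZFS_CMD destroy "$pool/vol-$name-renamed"
}
```

Calling create_rename_destroy pool1 "A$i" in one shell and create_rename_destroy pool1 "B$i" in another (as root) would mirror the two loops above.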

Reported upstream in https://github.com/zfsonlinux/zfs/issues/4404 and apparently fixed with the linked commits.
---
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CurrentDesktop: Unity
DistroRelease: Ubuntu 16.04
HibernationDevice: RESUME=UUID=41b79831-ff2c-4d62-8d09-0fd00a3fafad
InstallationDate: Installed on 2016-03-18 (4 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Alpha amd64 (20160307)
IwConfig:
 ens18 no wireless extensions.

 lo no wireless extensions.
Lsusb: Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
Package: zfs-linux
ProcFB: 0 qxldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-15-generic root=UUID=ff5bf1a3-8ced-46a8-9e2e-e3e7d0e522c0 ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 4.4.0-15.31-generic 4.4.6
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-15-generic N/A
 linux-backports-modules-4.4.0-15-generic N/A
 linux-firmware 1.157
RfKill:

Tags: xenial
Uname: Linux 4.4.0-15-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: rel-1.8.2-0-g33fbe13 by qemu-project.org
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-2.5
dmi.modalias: dmi:bvnSeaBIOS:bvrrel-1.8.2-0-g33fbe13byqemu-project.org:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-2.5:cvnQEMU:ct1:cvrpc-i440fx-2.5:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-2.5
dmi.sys.vendor: QEMU

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote :

Stack traces of hanging processes collected via /proc/<PID>/stack:

proc-1519-stack.log:
[<ffffffffc01adc49>] cv_wait_common+0x109/0x140 [spl]
[<ffffffffc01adcb5>] __cv_wait_sig+0x15/0x20 [spl]
[<ffffffffc02bc131>] txg_quiesce_thread+0x3e1/0x3f0 [zfs]
[<ffffffffc01a8e41>] thread_generic_wrapper+0x71/0x80 [spl]
[<ffffffff8109f3e8>] kthread+0xd8/0xf0
[<ffffffff8182238f>] ret_from_fork+0x3f/0x70
[<ffffffffffffffff>] 0xffffffffffffffff
proc-1520-stack.log:
[<ffffffffc0311b18>] zvol_rename_minors+0x58/0x190 [zfs]
[<ffffffffc028eadd>] dsl_dir_rename_sync+0x31d/0x5a0 [zfs]
[<ffffffffc0297f69>] dsl_sync_task_sync+0xe9/0xf0 [zfs]
[<ffffffffc028ff77>] dsl_pool_sync+0x327/0x430 [zfs]
[<ffffffffc02ab536>] spa_sync+0x366/0xb30 [zfs]
[<ffffffffc02bc9ba>] txg_sync_thread+0x3ba/0x630 [zfs]
[<ffffffffc01a8e41>] thread_generic_wrapper+0x71/0x80 [spl]
[<ffffffff8109f3e8>] kthread+0xd8/0xf0
[<ffffffff8182238f>] ret_from_fork+0x3f/0x70
[<ffffffffffffffff>] 0xffffffffffffffff
proc-1763-stack.log:
[<ffffffffc01adc49>] cv_wait_common+0x109/0x140 [spl]
[<ffffffffc01adcb5>] __cv_wait_sig+0x15/0x20 [spl]
[<ffffffffc0299008>] zfs_zevent_wait+0x88/0xa0 [zfs]
[<ffffffffc02dfb46>] zfs_ioc_events_next+0xa6/0xe0 [zfs]
[<ffffffffc02e45a3>] zfsdev_ioctl+0x423/0x4b0 [zfs]
[<ffffffff8121e7df>] do_vfs_ioctl+0x29f/0x490
[<ffffffff8121ea49>] SyS_ioctl+0x79/0x90
[<ffffffff81821ff2>] entry_SYSCALL_64_fastpath+0x16/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
proc-7027-stack.log:
[<ffffffffc01adc49>] cv_wait_common+0x109/0x140 [spl]
[<ffffffffc01adc95>] __cv_wait+0x15/0x20 [spl]
[<ffffffffc02a0e8c>] rrw_enter_read_impl+0x4c/0x160 [zfs]
[<ffffffffc02a117c>] rrw_enter+0x1c/0x20 [zfs]
[<ffffffffc0290eaa>] dsl_pool_hold+0x5a/0x80 [zfs]
[<ffffffffc026d244>] dmu_objset_own+0x44/0xd0 [zfs]
[<ffffffffc030f9e7>] __zvol_create_minor+0x127/0x640 [zfs]
[<ffffffffc0311763>] zvol_create_minor+0x33/0x70 [zfs]
[<ffffffffc03117ae>] zvol_create_minors_cb+0xe/0x20 [zfs]
[<ffffffffc026e872>] dmu_objset_find_impl+0x112/0x3f0 [zfs]
[<ffffffffc026eba8>] dmu_objset_find+0x58/0x90 [zfs]
[<ffffffffc0311949>] zvol_create_minors+0x29/0x30 [zfs]
[<ffffffffc02e6870>] zfs_ioc_create+0x190/0x2a0 [zfs]
[<ffffffffc02e4398>] zfsdev_ioctl+0x218/0x4b0 [zfs]
[<ffffffff8121e7df>] do_vfs_ioctl+0x29f/0x490
[<ffffffff8121ea49>] SyS_ioctl+0x79/0x90
[<ffffffff81821ff2>] entry_SYSCALL_64_fastpath+0x16/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
proc-7038-stack.log:
[<ffffffffc01adc49>] cv_wait_common+0x109/0x140 [spl]
[<ffffffffc01adc95>] __cv_wait+0x15/0x20 [spl]
[<ffffffffc02bc225>] txg_wait_synced+0xe5/0x130 [zfs]
[<ffffffffc0297ca9>] dsl_sync_task+0x179/0x260 [zfs]
[<ffffffffc028d53b>] dsl_dir_rename+0x5b/0x80 [zfs]
[<ffffffffc02e013d>] zfs_ioc_rename+0x10d/0x120 [zfs]
[<ffffffffc02e45a3>] zfsdev_ioctl+0x423/0x4b0 [zfs]
[<ffffffff8121e7df>] do_vfs_ioctl+0x29f/0x490
[<ffffffff8121ea49>] SyS_ioctl+0x79/0x90
[<ffffffff81821ff2>] entry_SYSCALL_64_fastpath+0x16/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
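For anyone wanting to gather the same kind of per-process logs, here is a minimal sketch. It assumes that only processes whose kernel stack mentions the zfs or spl modules are of interest; the function name and its two optional arguments are made up for illustration. Reading /proc/<PID>/stack normally requires root.

```shell
# Copy /proc/<PID>/stack for every process whose kernel stack mentions
# the zfs or spl modules into proc-<PID>-stack.log files.
# $1 = proc root (default /proc), $2 = output directory (default .)
collect_zfs_stacks() {
    proc_root=${1:-/proc}
    out_dir=${2:-.}
    for dir in "$proc_root"/[0-9]*; do
        stack=$dir/stack
        # Skip entries that vanished or that we are not allowed to read.
        [ -r "$stack" ] || continue
        if grep -q -e '\[zfs\]' -e '\[spl\]' "$stack" 2>/dev/null; then
            pid=${dir##*/}
            cat "$stack" > "$out_dir/proc-$pid-stack.log"
        fi
    done
}
```

Run as root without arguments, this writes the logs into the current directory.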

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1560869

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected xenial
description: updated
Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : CRDA.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : JournalErrors.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : Lspci.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : ProcEnviron.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : ProcModules.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : PulseList.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : UdevDb.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote : WifiSyslog.txt

apport information

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote :

Another note: this can also be triggered with rename operations alone on zvols:

$ sudo zfs create -V 4096k pool1/testrename1
$ sudo zfs create -V 4096k pool1/testrename3

and then in two shells in parallel:

$ while : ; do echo "RENAME" `date`; sudo zfs rename pool1/testrename3 pool1/testrename4; sudo zfs rename pool1/testrename4 pool1/testrename3; done

$ while : ; do echo "RENAME" `date`; sudo zfs rename pool1/testrename1 pool1/testrename2; sudo zfs rename pool1/testrename2 pool1/testrename1; done

This takes a bit longer than with the create/rename/destroy script from above, but it triggers after 10-15 seconds in the same test VM:

user@test:~$ sudo cat /proc/6338/stack
[<ffffffffc02abc41>] spa_open_common+0x61/0x480 [zfs]
[<ffffffffc02ac083>] spa_open+0x13/0x20 [zfs]
[<ffffffffc02e2112>] pool_status_check.part.24+0x32/0xa0 [zfs]
[<ffffffffc02e2509>] zfsdev_ioctl+0x389/0x4b0 [zfs]
[<ffffffff8121e7df>] do_vfs_ioctl+0x29f/0x490
[<ffffffff8121ea49>] SyS_ioctl+0x79/0x90
[<ffffffff81821ff2>] entry_SYSCALL_64_fastpath+0x16/0x71
[<ffffffffffffffff>] 0xffffffffffffffff
user@test:~$ sudo cat /proc/6333/stack
[<ffffffffc01adc49>] cv_wait_common+0x109/0x140 [spl]
[<ffffffffc01adc95>] __cv_wait+0x15/0x20 [spl]
[<ffffffffc02ba225>] txg_wait_synced+0xe5/0x130 [zfs]
[<ffffffffc0295ca9>] dsl_sync_task+0x179/0x260 [zfs]
[<ffffffffc028b53b>] dsl_dir_rename+0x5b/0x80 [zfs]
[<ffffffffc02de13d>] zfs_ioc_rename+0x10d/0x120 [zfs]
[<ffffffffc02e25a3>] zfsdev_ioctl+0x423/0x4b0 [zfs]
[<ffffffff8121e7df>] do_vfs_ioctl+0x29f/0x490
[<ffffffff8121ea49>] SyS_ioctl+0x79/0x90
[<ffffffff81821ff2>] entry_SYSCALL_64_fastpath+0x16/0x71
[<ffffffffffffffff>] 0xffffffffffffffff

Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote :

Should be fixed with upstream version 0.6.5.6, so I guess this can be closed once that version hits the archive.

Changed in zfs-linux (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Revision history for this message
Fabian Grünbichler (f-gruenbichler) wrote :

Does not trigger anymore with linux-image-4.4.0-16-generic 4.4.0-16.32 / zfs.ko v0.6.5.6-0ubuntu1. zfsutils-linux and friends are still on 0.6.5.4-0ubuntu6.
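Since the kernel module and the userland tools can be at different versions, as noted above, it may help to check the loaded module against 0.6.5.6 directly. The sketch below is one way to do that; the helper name is hypothetical, it relies on GNU sort -V, and it assumes the loaded module reports its version in /sys/module/zfs/version.

```shell
# Return success if version $1 is at least version $2.
# Any packaging suffix after the first "-" (e.g. "-0ubuntu1") is ignored.
version_ge() {
    have=${1%%-*}
    want=${2%%-*}
    # With a version sort, the wanted version must come first (or be equal).
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Example (assumes the zfs module is loaded):
#   version_ge "$(cat /sys/module/zfs/version)" 0.6.5.6 && echo "fix should be present"
```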

Changed in zfs-linux (Ubuntu):
status: In Progress → Fix Released