ZFS I/O hangs for minutes

Bug #1654517 reported by Karl-Philipp Richter on 2017-01-06
32
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Native ZFS for Linux
Fix Released
Unknown
zfs-linux (Ubuntu)
Medium
Colin Ian King
Nominated for Artful by Rafael David Tinoco
Nominated for Bionic by Rafael David Tinoco
Nominated for Xenial by Rafael David Tinoco

Bug Description

I/O for multiple programs, like `thunderbird`, `firefox`, etc., hangs for minutes and approx. 100 `z_rd_int_[n]` and `z_wr_int_[n]` kernel threads are created, `dmesg` contains

    [ 9184.451606] INFO: task txg_sync:11471 blocked for more than 120 seconds.
    [ 9184.451610] Tainted: P OE 4.8.0-32-generic #34-Ubuntu
    [ 9184.451611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 9184.451612] txg_sync D ffffa240ab3a7aa8 0 11471 2 0x00000000
    [ 9184.451616] ffffa240ab3a7aa8 00ffffffbb6ade1f ffffa24095148000 ffffa240e5ca5580
    [ 9184.451618] 0000000000000046 ffffa240ab3a8000 ffffa240ff359200 7fffffffffffffff
    [ 9184.451620] ffffa23d36cf9050 0000000000000001 ffffa240ab3a7ac0 ffffffffbbe96b15
    [ 9184.451621] Call Trace:
    [ 9184.451627] [<ffffffffbbe96b15>] schedule+0x35/0x80
    [ 9184.451628] [<ffffffffbbe9a2ca>] schedule_timeout+0x22a/0x3f0
    [ 9184.451631] [<ffffffffbb62c77e>] ? __switch_to+0x2ce/0x6c0
    [ 9184.451633] [<ffffffffbb6c156c>] ? pick_next_task_fair+0x48c/0x4c0
    [ 9184.451635] [<ffffffffbb6fc511>] ? ktime_get+0x41/0xb0
    [ 9184.451636] [<ffffffffbbe96304>] io_schedule_timeout+0xa4/0x110
    [ 9184.451644] [<ffffffffc0acee12>] cv_wait_common+0xb2/0x130 [spl]
    [ 9184.451646] [<ffffffffbb6c7150>] ? wake_atomic_t_function+0x60/0x60
    [ 9184.451650] [<ffffffffc0aceee8>] __cv_wait_io+0x18/0x20 [spl]
    [ 9184.451689] [<ffffffffc0ebfcdd>] zio_wait+0xfd/0x1d0 [zfs]
    [ 9184.451716] [<ffffffffc0e4b2c8>] dsl_pool_sync+0xb8/0x480 [zfs]
    [ 9184.451745] [<ffffffffc0e6650f>] spa_sync+0x37f/0xb30 [zfs]
    [ 9184.451747] [<ffffffffbb6aecd2>] ? default_wake_function+0x12/0x20
    [ 9184.451779] [<ffffffffc0e77195>] txg_sync_thread+0x3a5/0x600 [zfs]
    [ 9184.451807] [<ffffffffc0e76df0>] ? txg_delay+0x160/0x160 [zfs]
    [ 9184.451811] [<ffffffffc0ac9fb1>] thread_generic_wrapper+0x71/0x80 [spl]
    [ 9184.451815] [<ffffffffc0ac9f40>] ? __thread_exit+0x20/0x20 [spl]
    [ 9184.451817] [<ffffffffbb6a3e58>] kthread+0xd8/0xf0
    [ 9184.451819] [<ffffffffbbe9b4df>] ret_from_fork+0x1f/0x40
    [ 9184.451821] [<ffffffffbb6a3d80>] ? kthread_create_on_node+0x1e0/0x1e0
    [ 9184.451849] INFO: task mozStorage #2:21607 blocked for more than 120 seconds.
    [ 9184.451851] Tainted: P OE 4.8.0-32-generic #34-Ubuntu
    [ 9184.451852] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 9184.451853] mozStorage #2 D ffffa23fe8a5bd38 0 21607 19750 0x00000004
    [ 9184.451855] ffffa23fe8a5bd38 00ffa240ee8feb40 ffffa240ecf72ac0 ffffa2403803b900
    [ 9184.451857] ffffffffbc2c02f7 ffffa23fe8a5c000 ffffa240aa940828 ffffa240aa940800
    [ 9184.451858] ffffa240aa940980 0000000000000000 ffffa23fe8a5bd50 ffffffffbbe96b15
    [ 9184.451860] Call Trace:
    [ 9184.451861] [<ffffffffbbe96b15>] schedule+0x35/0x80
    [ 9184.451866] [<ffffffffc0acee70>] cv_wait_common+0x110/0x130 [spl]
    [ 9184.451868] [<ffffffffbb6c7150>] ? wake_atomic_t_function+0x60/0x60
    [ 9184.451872] [<ffffffffc0aceea5>] __cv_wait+0x15/0x20 [spl]
    [ 9184.451904] [<ffffffffc0eb9759>] zil_commit.part.11+0x79/0x7a0 [zfs]
    [ 9184.451909] [<ffffffffc0acf9f6>] ? tsd_hash_search.isra.0+0x46/0xa0 [spl]
    [ 9184.451913] [<ffffffffc0ad0214>] ? tsd_set+0x2b4/0x500 [spl]
    [ 9184.451914] [<ffffffffbbe98a82>] ? mutex_lock+0x12/0x30
    [ 9184.451945] [<ffffffffc0eb9e97>] zil_commit+0x17/0x20 [zfs]
    [ 9184.451975] [<ffffffffc0eaf2aa>] zfs_fsync+0x7a/0xf0 [zfs]
    [ 9184.452005] [<ffffffffc0ec5878>] zpl_fsync+0x68/0xa0 [zfs]
    [ 9184.452008] [<ffffffffbb868b9b>] vfs_fsync_range+0x4b/0xb0
    [ 9184.452010] [<ffffffffbb868c5d>] do_fsync+0x3d/0x70
    [ 9184.452011] [<ffffffffbb868ef0>] SyS_fsync+0x10/0x20
    [ 9184.452013] [<ffffffffbbe9b2b6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
    [ 9184.452023] INFO: task bitcoin-msghand:663 blocked for more than 120 seconds.
    [ 9184.452024] Tainted: P OE 4.8.0-32-generic #34-Ubuntu
    [ 9184.452025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 9184.452026] bitcoin-msghand D ffffa23eeb23bd38 0 663 26994 0x00000000
    [ 9184.452028] ffffa23eeb23bd38 00ffa23eab434000 ffffa240ecf70000 ffffa24095148000
    [ 9184.452030] ffffa23eeb23bd20 ffffa23eeb23c000 ffffa240aa940828 ffffa240aa940800
    [ 9184.452031] ffffa240aa940980 0000000000000000 ffffa23eeb23bd50 ffffffffbbe96b15
    [ 9184.452033] Call Trace:
    [ 9184.452034] [<ffffffffbbe96b15>] schedule+0x35/0x80
    [ 9184.452039] [<ffffffffc0acee70>] cv_wait_common+0x110/0x130 [spl]
    [ 9184.452041] [<ffffffffbb6c7150>] ? wake_atomic_t_function+0x60/0x60
    [ 9184.452044] [<ffffffffc0aceea5>] __cv_wait+0x15/0x20 [spl]
    [ 9184.452074] [<ffffffffc0eb9759>] zil_commit.part.11+0x79/0x7a0 [zfs]
    [ 9184.452079] [<ffffffffc0acf9f6>] ? tsd_hash_search.isra.0+0x46/0xa0 [spl]
    [ 9184.452083] [<ffffffffc0ad0214>] ? tsd_set+0x2b4/0x500 [spl]
    [ 9184.452084] [<ffffffffbbe98a82>] ? mutex_lock+0x12/0x30
    [ 9184.452113] [<ffffffffc0eb9e97>] zil_commit+0x17/0x20 [zfs]
    [ 9184.452141] [<ffffffffc0eaf2aa>] zfs_fsync+0x7a/0xf0 [zfs]
    [ 9184.452168] [<ffffffffc0ec5878>] zpl_fsync+0x68/0xa0 [zfs]
    [ 9184.452170] [<ffffffffbb868b9b>] vfs_fsync_range+0x4b/0xb0
    [ 9184.452172] [<ffffffffbb868c5d>] do_fsync+0x3d/0x70
    [ 9184.452173] [<ffffffffbb868f13>] SyS_fdatasync+0x13/0x20
    [ 9184.452174] [<ffffffffbbe9b2b6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
    [ 9305.221219] INFO: task txg_sync:11471 blocked for more than 120 seconds.
    [ 9305.221223] Tainted: P OE 4.8.0-32-generic #34-Ubuntu
    [ 9305.221224] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 9305.221225] txg_sync D ffffa240ab3a7aa8 0 11471 2 0x00000000
    [ 9305.221229] ffffa240ab3a7aa8 00ffffffbb6ade1f ffffa24095148000 ffffa240e5ca5580
    [ 9305.221230] 0000000000000046 ffffa240ab3a8000 ffffa240ff359200 7fffffffffffffff
    [ 9305.221232] ffffa23d36cf9050 0000000000000001 ffffa240ab3a7ac0 ffffffffbbe96b15
    [ 9305.221234] Call Trace:
    [ 9305.221239] [<ffffffffbbe96b15>] schedule+0x35/0x80
    [ 9305.221241] [<ffffffffbbe9a2ca>] schedule_timeout+0x22a/0x3f0
    [ 9305.221243] [<ffffffffbb62c77e>] ? __switch_to+0x2ce/0x6c0
    [ 9305.221245] [<ffffffffbb6c156c>] ? pick_next_task_fair+0x48c/0x4c0
    [ 9305.221247] [<ffffffffbb6fc511>] ? ktime_get+0x41/0xb0
    [ 9305.221249] [<ffffffffbbe96304>] io_schedule_timeout+0xa4/0x110
    [ 9305.221256] [<ffffffffc0acee12>] cv_wait_common+0xb2/0x130 [spl]
    [ 9305.221258] [<ffffffffbb6c7150>] ? wake_atomic_t_function+0x60/0x60
    [ 9305.221262] [<ffffffffc0aceee8>] __cv_wait_io+0x18/0x20 [spl]
    [ 9305.221301] [<ffffffffc0ebfcdd>] zio_wait+0xfd/0x1d0 [zfs]
    [ 9305.221328] [<ffffffffc0e4b2c8>] dsl_pool_sync+0xb8/0x480 [zfs]
    [ 9305.221357] [<ffffffffc0e6650f>] spa_sync+0x37f/0xb30 [zfs]
    [ 9305.221359] [<ffffffffbb6aecd2>] ? default_wake_function+0x12/0x20
    [ 9305.221389] [<ffffffffc0e77195>] txg_sync_thread+0x3a5/0x600 [zfs]
    [ 9305.221417] [<ffffffffc0e76df0>] ? txg_delay+0x160/0x160 [zfs]
    [ 9305.221421] [<ffffffffc0ac9fb1>] thread_generic_wrapper+0x71/0x80 [spl]
    [ 9305.221424] [<ffffffffc0ac9f40>] ? __thread_exit+0x20/0x20 [spl]
    [ 9305.221426] [<ffffffffbb6a3e58>] kthread+0xd8/0xf0
    [ 9305.221428] [<ffffffffbbe9b4df>] ret_from_fork+0x1f/0x40
    [ 9305.221430] [<ffffffffbb6a3d80>] ? kthread_create_on_node+0x1e0/0x1e0
    [ 9305.221458] INFO: task mozStorage #2:21607 blocked for more than 120 seconds.
    [ 9305.221459] Tainted: P OE 4.8.0-32-generic #34-Ubuntu
    [ 9305.221460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 9305.221461] mozStorage #2 D ffffa23fe8a5bd38 0 21607 19750 0x00000004
    [ 9305.221463] ffffa23fe8a5bd38 00ffa240ee8feb40 ffffa240ecf72ac0 ffffa2403803b900
    [ 9305.221465] ffffffffbc2c02f7 ffffa23fe8a5c000 ffffa240aa940828 ffffa240aa940800
    [ 9305.221466] ffffa240aa940980 0000000000000000 ffffa23fe8a5bd50 ffffffffbbe96b15
    [ 9305.221468] Call Trace:
    [ 9305.221469] [<ffffffffbbe96b15>] schedule+0x35/0x80
    [ 9305.221474] [<ffffffffc0acee70>] cv_wait_common+0x110/0x130 [spl]
    [ 9305.221475] [<ffffffffbb6c7150>] ? wake_atomic_t_function+0x60/0x60
    [ 9305.221479] [<ffffffffc0aceea5>] __cv_wait+0x15/0x20 [spl]
    [ 9305.221510] [<ffffffffc0eb9759>] zil_commit.part.11+0x79/0x7a0 [zfs]
    [ 9305.221515] [<ffffffffc0acf9f6>] ? tsd_hash_search.isra.0+0x46/0xa0 [spl]
    [ 9305.221519] [<ffffffffc0ad0214>] ? tsd_set+0x2b4/0x500 [spl]
    [ 9305.221520] [<ffffffffbbe98a82>] ? mutex_lock+0x12/0x30
    [ 9305.221551] [<ffffffffc0eb9e97>] zil_commit+0x17/0x20 [zfs]
    [ 9305.221582] [<ffffffffc0eaf2aa>] zfs_fsync+0x7a/0xf0 [zfs]
    [ 9305.221611] [<ffffffffc0ec5878>] zpl_fsync+0x68/0xa0 [zfs]
    [ 9305.221614] [<ffffffffbb868b9b>] vfs_fsync_range+0x4b/0xb0
    [ 9305.221616] [<ffffffffbb868c5d>] do_fsync+0x3d/0x70
    [ 9305.221618] [<ffffffffbb868ef0>] SyS_fsync+0x10/0x20
    [ 9305.221619] [<ffffffffbbe9b2b6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
    [ 9305.221621] INFO: task mozStorage #8:5507 blocked for more than 120 seconds.
    [ 9305.221622] Tainted: P OE 4.8.0-32-generic #34-Ubuntu
    [ 9305.221623] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [ 9305.221624] mozStorage #8 D ffffa23f1573fc18 0 5507 19750 0x00000004
    [ 9305.221625] ffffa23f1573fc18 00000179907cd000 ffffa240ecf72ac0 ffffa23f944b9c80
    [ 9305.221627] ffffa240ec957188 ffffa23f15740000 ffffa240ff319200 7fffffffffffffff
    [ 9305.221629] ffffa23d0ac9a5e0 0000000000000001 ffffa23f1573fc30 ffffffffbbe96b15
    [ 9305.221630] Call Trace:
    [ 9305.221631] [<ffffffffbbe96b15>] schedule+0x35/0x80
    [ 9305.221633] [<ffffffffbbe9a2ca>] schedule_timeout+0x22a/0x3f0
    [ 9305.221662] [<ffffffffc0ebdb50>] ? zio_taskq_member.isra.6+0x90/0x90 [zfs]
    [ 9305.221666] [<ffffffffc0aca9ae>] ? taskq_dispatch_ent+0x4e/0x120 [spl]
    [ 9305.221668] [<ffffffffbb6fc511>] ? ktime_get+0x41/0xb0
    [ 9305.221669] [<ffffffffbbe96304>] io_schedule_timeout+0xa4/0x110
    [ 9305.221673] [<ffffffffc0acee12>] cv_wait_common+0xb2/0x130 [spl]
    [ 9305.221674] [<ffffffffbb6c7150>] ? wake_atomic_t_function+0x60/0x60
    [ 9305.221678] [<ffffffffc0aceee8>] __cv_wait_io+0x18/0x20 [spl]
    [ 9305.221706] [<ffffffffc0ebfcdd>] zio_wait+0xfd/0x1d0 [zfs]
    [ 9305.221736] [<ffffffffc0eb9aa6>] zil_commit.part.11+0x3c6/0x7a0 [zfs]
    [ 9305.221740] [<ffffffffc0acf9f6>] ? tsd_hash_search.isra.0+0x46/0xa0 [spl]
    [ 9305.221768] [<ffffffffc0eb9e97>] zil_commit+0x17/0x20 [zfs]
    [ 9305.221798] [<ffffffffc0eaf2aa>] zfs_fsync+0x7a/0xf0 [zfs]
    [ 9305.221829] [<ffffffffc0ec5878>] zpl_fsync+0x68/0xa0 [zfs]
    [ 9305.221833] [<ffffffffbb868b9b>] vfs_fsync_range+0x4b/0xb0
    [ 9305.221834] [<ffffffffbb868c5d>] do_fsync+0x3d/0x70
    [ 9305.221836] [<ffffffffbb868ef0>] SyS_fsync+0x10/0x20
    [ 9305.221838] [<ffffffffbbe9b2b6>] entry_SYSCALL_64_fastpath+0x1e/0xa8

ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: zfs-dkms 0.6.5.8-0ubuntu4.1
ProcVersionSignature: Ubuntu 4.8.0-32.34-generic 4.8.11
Uname: Linux 4.8.0-32-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
ApportVersion: 2.20.3-0ubuntu8.2
Architecture: amd64
CurrentDesktop: Unity
Date: Fri Jan 6 10:19:22 2017
InstallationDate: Installed on 2015-12-12 (390 days ago)
InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Release amd64 (20151021)
SourcePackage: zfs-linux
UpgradeStatus: Upgraded to yakkety on 2016-10-17 (80 days ago)
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version k4.8.0-32-generic.
ApportVersion: 2.20.3-0ubuntu8.2
Architecture: amd64
ArecordDevices:
 **** Liste der Hardware-Geräte (CAPTURE) ****
 Karte 0: PCH [HDA Intel PCH], Gerät 0: ALC269VC Analog [ALC269VC Analog]
   Sub-Geräte: 1/1
   Sub-Gerät #0: subdevice #0
AudioDevicesInUse:
 BEN. PID ZUGR. BEFEHL
 /dev/snd/pcmC0D0p: richter 20519 F...m pulseaudio
 /dev/snd/controlC0: richter 20519 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'PCH'/'HDA Intel PCH at 0xd3610000 irq 31'
   Mixer name : 'Realtek ALC269VC'
   Components : 'HDA:10ec0269,17aac034,00100202 HDA:80862806,80860101,00100000'
   Controls : 30
   Simple ctrls : 12
CurrentDesktop: Unity
DistroRelease: Ubuntu 16.10
HibernationDevice: RESUME=UUID=1bb0bb3e-8148-4957-9673-9701bd571c0a
InstallationDate: Installed on 2015-12-12 (390 days ago)
InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Release amd64 (20151021)
MachineType: LENOVO 20221
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.8.0-32-generic root=UUID=791b47e7-915d-4d5a-bce0-d7e9b660a2ab ro rootflags=subvol=ubuntu-main-root
ProcVersionSignature: Ubuntu 4.8.0-32.34-generic 4.8.11
RelatedPackageVersions:
 linux-restricted-modules-4.8.0-32-generic N/A
 linux-backports-modules-4.8.0-32-generic N/A
 linux-firmware 1.161.1
Tags: yakkety
Uname: Linux 4.8.0-32-generic x86_64
UpgradeStatus: Upgraded to yakkety on 2016-10-17 (80 days ago)
UserGroups: adm autopilot bumblebee cdrom dip libvirtd lp lpadmin plugdev sambashare saned sudo vboxusers
_MarkForUpload: True
dmi.bios.date: 07/12/2013
dmi.bios.vendor: LENOVO
dmi.bios.version: 71CN51WW(V1.21)
dmi.board.asset.tag: No Asset Tag
dmi.board.name: INVALID
dmi.board.vendor: LENOVO
dmi.board.version: 31900003WIN8 STD MLT
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Lenovo IdeaPad Z500 Touch
dmi.modalias: dmi:bvnLENOVO:bvr71CN51WW(V1.21):bd07/12/2013:svnLENOVO:pn20221:pvrLenovoIdeaPadZ500Touch:rvnLENOVO:rnINVALID:rvr31900003WIN8STDMLT:cvnLENOVO:ct10:cvrLenovoIdeaPadZ500Touch:
dmi.product.name: 20221
dmi.product.version: Lenovo IdeaPad Z500 Touch
dmi.sys.vendor: LENOVO

Aron Xu (happyaron) on 2017-01-06
affects: zfs-linux (Ubuntu) → linux (Ubuntu)

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1654517

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected
description: updated

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
affects: linux (Ubuntu) → zfs-linux (Ubuntu)
Changed in zfs-linux (Ubuntu):
status: Confirmed → New
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu):
status: New → Confirmed
description: updated
Colin Ian King (colin-king) wrote :

When this issue occurs again, can you run the following command:

iostat -mx

..and past the output into the bug. Thanks

Changed in zfs-linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
status: Confirmed → Incomplete
Changed in zfs:
status: Unknown → Fix Released
Colin Ian King (colin-king) wrote :

Is this still causing problems? Ive not heard any update on my question in comment #24, so I was wondering if this bug is still an issue.

Colin Ian King (colin-king) wrote :

Hi, if this is still and issue please let me know. I'm going to close the bug if I don't get any response in 1 week.

Changed in zfs-linux (Ubuntu):
status: Incomplete → Won't Fix
Download full text (6.5 KiB)

Colin,

I'm afraid this is an ongoing issue. I have faced this in 4.4 and 4.13 kernels:

[ 240.568146] INFO: task txg_sync:2065 blocked for more than 120 seconds.
[ 240.568218] Tainted: P O 4.4.0-116-generic #140-Ubuntu
[ 240.568268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.568290] txg_sync D ffff880739e63aa8 0 2065 2 0x00000000
[ 240.568294] ffff880739e63aa8 ffff880739e63a88 ffff88081af75400 ffff8800ac635400
[ 240.568296] ffff880739e64000 ffff88083ed972c0 7fffffffffffffff ffff88009fffc968
[ 240.568297] 0000000000000001 ffff880739e63ac0 ffffffff8184ae45 0000000000000000
[ 240.568300] Call Trace:
[ 240.568305] [<ffffffff8184ae45>] schedule+0x35/0x80
[ 240.568307] [<ffffffff8184df98>] schedule_timeout+0x1b8/0x270
[ 240.568310] [<ffffffff810aec12>] ? default_wake_function+0x12/0x20
[ 240.568313] [<ffffffff810c6318>] ? __wake_up_common+0x58/0x90
[ 240.568315] [<ffffffff810f8cce>] ? ktime_get+0x3e/0xb0
[ 240.568317] [<ffffffff8184a5d4>] io_schedule_timeout+0xa4/0x110
[ 240.568325] [<ffffffffc0395c2c>] cv_wait_common+0xbc/0x140 [spl]
[ 240.568327] [<ffffffff810c69d0>] ? wake_atomic_t_function+0x60/0x60
[ 240.568332] [<ffffffffc0395d08>] __cv_wait_io+0x18/0x20 [spl]
[ 240.568373] [<ffffffffc0957634>] zio_wait+0x114/0x200 [zfs]
[ 240.568397] [<ffffffffc08e0d88>] dsl_pool_sync+0xb8/0x430 [zfs]
[ 240.568424] [<ffffffffc08fc7b6>] spa_sync+0x366/0xb30 [zfs]
[ 240.568426] [<ffffffff810aec12>] ? default_wake_function+0x12/0x20
[ 240.568453] [<ffffffffc090dc4a>] txg_sync_thread+0x3ba/0x630 [zfs]
[ 240.568481] [<ffffffffc090d890>] ? txg_delay+0x180/0x180 [zfs]
[ 240.568486] [<ffffffffc0390dc0>] ? __thread_exit+0x20/0x20 [spl]
[ 240.568490] [<ffffffffc0390e33>] thread_generic_wrapper+0x73/0x80 [spl]
[ 240.568492] [<ffffffff810a30f7>] kthread+0xe7/0x100
[ 240.568494] [<ffffffff810a3010>] ? kthread_create_on_node+0x1e0/0x1e0
[ 240.568495] [<ffffffff8184f422>] ret_from_fork+0x42/0x80
[ 240.568497] [<ffffffff810a3010>] ? kthread_create_on_node+0x1e0/0x1e0
[ 240.568505] INFO: task lxd:2562 blocked for more than 120 seconds.
[ 240.568524] Tainted: P O 4.4.0-116-generic #140-Ubuntu
[ 240.568543] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.568565] lxd D ffff880734773ba0 0 2562 1 0x00000000
[ 240.568567] ffff880734773ba0 0000000000000246 ffff88081af71c00 ffff88080d18f000
[ 240.568569] ffff880734774000 ffff8808144dba20 ffff8808144dba48 ffff8808144dbae0
[ 240.568571] 0000000000000000 ffff880734773bb8 ffffffff8184ae45 ffff8808144dbad8
[ 240.568573] Call Trace:
[ 240.568575] [<ffffffff8184ae45>] schedule+0x35/0x80
[ 240.568579] [<ffffffffc0395c7b>] cv_wait_common+0x10b/0x140 [spl]
[ 240.568581] [<ffffffff810c69d0>] ? wake_atomic_t_function+0x60/0x60
[ 240.568586] [<ffffffffc0395cc5>] __cv_wait+0x15/0x20 [spl]
[ 240.568614] [<ffffffffc090d4b5>] txg_wait_synced+0xe5/0x130 [zfs]
[ 240.568643] [<ffffffffc0952e4b>] zil_replay+0xdb/0x120 [zfs]
[ 240.568673] [<ffffffffc093fcef>] zfs_sb_setup+0x16f/0x180 [zfs]
[ 240.568702] [<ffffffffc0940b38>] zfs_dom...

Read more...

Changed in zfs-linux (Ubuntu):
status: Won't Fix → Confirmed
Colin Ian King (colin-king) wrote :

@Rafael,

1. what is your ZFS configuration (number of drives?)
2. size of your system (total memory, free memory).
3. Do your drive(s) suffer from large I/O latencies?
   - what is the output from iostat -mx

@Colin

----
1. its a really simple one since this is my devel machine: one ssd disk only

zpool: https://pastebin.ubuntu.com/p/gZPtrxRDnF/
hdparm: https://pastebin.ubuntu.com/p/JYyGwQ7cVr/

2. 8 x 4.0GHz AMD cores and 32GB RAM

meminfo: https://pastebin.ubuntu.com/p/kSjXSC6R3q/ (at the time of soft lockup)

3. likely it does since its a single spindle for zpool

iostat: https://pastebin.ubuntu.com/p/vbxXZKzmBp/

----

From the stack trace, zfs seems to be setting a property in a newly created zfs filesystem (since i'm mostly creating containers at this specific point in time). Following the code, zfs_ioctl_init sets as a callback to ZFS_IOC_SET_PROP, the function zfs_ioc_set_prop() which will trigger the code what gets soft locked by 120 sec (spinning on txg_wait_synced). The txg kthread, likely the one responsible for sync to be finished, is waiting on completion for zio that were submitted. So, yes, I do agree that this might be an overload because of the amount of spindles/throughput I have. I have to check now if 4.4 is also in this same step because it looked dead locked to me, for hours. Upstream kernel/zfs/spl is, at least, in good conditions after I/O is finished and sync is done.

Colin Ian King (colin-king) wrote :

You may possibly get some benefit from setting spl_kmem_cache_slab_limit to zero as a module parameter for the spl ZoL kernel module. This avoids SLUB slab merging entirely and maybe helpful from anecdotal evidence I've seen.

Colin Ian King (colin-king) wrote :

@Rafael, did the workaround in comment #31 help?

Changed in zfs-linux (Ubuntu):
status: Confirmed → Incomplete

Hello Colin, nope.

It looks like ZFS has some sort of contention in waiting the ZIL to finish the async callbacks from scheduled work, even when more spindles exist. Not sure if its related to slab merging or getting the objects from either slab/kmalloc or directly from page allocs. I think its a locking contention due to zfs scalability iself, to be honest :\. I'm getting, sometimes, soft lockups here and there (all waiting on on zfs sync - txg_wait_synced). I'm using a raid-z with 4x480G SSDs and an extra SSD as L2ARC disk.

Its fairly easy to reproduce, i'm basically compiling kernel tress all day in this machine, using the zfs pools as ccache's cache and as the compiled objects repository. I haven't explored tuning it though, considering its cache is not kept together with the pagecache (like changing zfs_dirty_data_sync or zfs_sync_taskq_batch_pct to make zfs cache flush less async, which might help).

Cheers!

Found one in my Debian box logs:

Jun 11 10:02:37 workstation kernel: [238276.640617] INFO: task txg_sync:5936 blocked for more than 120 seconds.
Jun 11 10:02:37 workstation kernel: [238276.643806] Tainted: P O 4.16.0-2-amd64 #1 Debian 4.16.12-
Jun 11 10:02:37 workstation kernel: [238276.646803] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this m
Jun 11 10:02:37 workstation kernel: [238276.649819] txg_sync D 0 5936 2 0x80000000
Jun 11 10:02:37 workstation kernel: [238276.652850] Call Trace:
Jun 11 10:02:37 workstation kernel: [238276.655776] ? __schedule+0x291/0x870
Jun 11 10:02:37 workstation kernel: [238276.658671] ? zio_taskq_dispatch+0x6f/0x90 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.661196] ? zio_nowait+0xa3/0x140 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.664199] schedule+0x28/0x80
Jun 11 10:02:37 workstation kernel: [238276.667206] io_schedule+0x12/0x40
Jun 11 10:02:37 workstation kernel: [238276.670226] cv_wait_common+0xac/0x130 [spl]
Jun 11 10:02:37 workstation kernel: [238276.673206] ? finish_wait+0x80/0x80
Jun 11 10:02:37 workstation kernel: [238276.676522] zio_wait+0xe6/0x1a0 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.679588] dsl_pool_sync+0xe6/0x440 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.682620] spa_sync+0x424/0xcf0 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.685657] txg_sync_thread+0x2ce/0x490 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.688661] ? txg_delay+0x1b0/0x1b0 [zfs]
Jun 11 10:02:37 workstation kernel: [238276.691641] ? __thread_exit+0x20/0x20 [spl]
Jun 11 10:02:37 workstation kernel: [238276.694604] thread_generic_wrapper+0x6f/0x80 [spl]
Jun 11 10:02:37 workstation kernel: [238276.697591] kthread+0x113/0x130
Jun 11 10:02:37 workstation kernel: [238276.700633] ? kthread_create_worker_on_cpu+0x70/0x70
Jun 11 10:02:37 workstation kernel: [238276.703594] ret_from_fork+0x22/0x40

Running zfs/spl 0.7.9-3.

I think this still happens from time to time with mainline and lates zfs/spl (as I tested in another box). So, if its related to zfs cache dirty ratio, then the zfs_dirty_data_sync could make cache smaller and the sync wouldn't take so long.. if changing it doesn't solve, then its likely an intermittent locking issue on the zio_wait logic.

Hope it helps!

Colin Ian King (colin-king) wrote :

This stack trace is similar to one in https://github.com/zfsonlinux/zfs/issues/2934 - the suspected issue is that one of the devices is taking a long time to complete and ZFS is blocked waiting for this.

I suggest installing sysstat

sudo apt install sysstat

and when the lockup happens run:

iostat -mx

so that we can see if it is a long device delay or not.

gf (gf-interlinks) wrote :

Hello Karl-Philipp,
Thank you for submitting this bug and reporting a problem with the zfs-linux package. You made this bug report in 2007 and there have been several versions of Ubuntu since then.

Could you confirm that this is no longer a problem and that we can close the ticket?
If it is still a problem, are you still interested in finding a solution to this bug?
If you are, could you let us know and, in the current version, could you run the following (only once):
apport-collect 1654517
and upload the updated logs and and any other logs that are relevant for this particular issue.

Thank you again for helping make Ubuntu better.
G

Colin Ian King (colin-king) wrote :

This bug has been waiting for feedback for questions in comment #35 and #36 for months. I'm going to mark this bug as Won't fix because of limited responses. If this bug is still an issue, please re-open it.

Changed in zfs-linux (Ubuntu):
status: Incomplete → Won't Fix
gf (gf-interlinks) wrote :

Ok, thanks for changing the status, Colin.
:)
G

Felix Moreno (info-justdust) wrote :

It happened to me, is not usual, in fact was muy first time, in my case it hanged forever, i was making a heavy use with nfs, ocal and lot of paralel copy writes.... and it stucked. No problems with the disks, just the process using 99% of cpu.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.