Comment 5 for bug 1998870

Lucas Teske (teske) wrote :

This also affects me. Some more information: after that message appears, I/O on every ZFS device becomes really slow (the NVMe SSDs top out at 10MB/s, while they normally do 400MB/s), and only a reboot fixes it.

Things I have tried, based on what I found on the internet about the same issue:

1. Removing all swap
2. Reducing / increasing the RAM available to the ARC (always keeping at least 16GB of RAM free for the OS)
3. Checking SMART data for all disks (everything is fine)
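
For item 2, here is a minimal sketch of how an ARC cap could be computed and written out. It assumes the cap is set through the OpenZFS `zfs_arc_max` module parameter (a value in bytes) with a modprobe options line. The helper names and the file path in the comment are illustrative, not taken from my actual setup, and the reserve has to cover everything that is not ARC, including the RAM used by the VMs.

```python
GIB = 1024 ** 3


def arc_max_bytes(total_ram_gib: int, reserve_gib: int = 16) -> int:
    """Return an ARC cap in bytes that leaves reserve_gib outside the ARC."""
    if reserve_gib >= total_ram_gib:
        raise ValueError("reserve must be smaller than total RAM")
    return (total_ram_gib - reserve_gib) * GIB


def modprobe_line(limit_bytes: int) -> str:
    """Format the cap as a modprobe options line (e.g. for /etc/modprobe.d/zfs.conf)."""
    return f"options zfs zfs_arc_max={limit_bytes}"


if __name__ == "__main__":
    # 192 GiB machine, 16 GiB kept outside the ARC (hypothetical split).
    print(modprobe_line(arc_max_bytes(192)))
```

The same value can also be tried at runtime by writing it to /sys/module/zfs/parameters/zfs_arc_max, which avoids a reboot while experimenting.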

After rebooting, it _usually_ takes one or two days before the issue happens again. Performance is normal until then. I have several virtual machines that use both the SSDs and the disk drives. The issue usually starts while some backup routines are running, from 10 PM to midnight, so it might be related to high I/O (although the backups usually only take a few minutes).

My dmesg is a bit different though: I get this repeated about 4 or 5 times at 20-minute intervals, and then it usually stops being reported.

[Fri Feb 24 02:31:53 2023] INFO: task txg_sync:2457 blocked for more than 120 seconds.
[Fri Feb 24 02:31:53 2023] Tainted: P O 5.15.0-46-generic #49-Ubuntu
[Fri Feb 24 02:31:53 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Feb 24 02:31:53 2023] task:txg_sync state:D stack: 0 pid: 2457 ppid: 2 flags:0x00004000
[Fri Feb 24 02:31:53 2023] Call Trace:
[Fri Feb 24 02:31:53 2023] <TASK>
[Fri Feb 24 02:31:53 2023] __schedule+0x23d/0x590
[Fri Feb 24 02:31:53 2023] schedule+0x4e/0xc0
[Fri Feb 24 02:31:53 2023] schedule_timeout+0x87/0x140
[Fri Feb 24 02:31:53 2023] ? zio_issue_async+0x12/0x20 [zfs]
[Fri Feb 24 02:31:53 2023] ? __bpf_trace_tick_stop+0x20/0x20
[Fri Feb 24 02:31:53 2023] io_schedule_timeout+0x51/0x80
[Fri Feb 24 02:31:53 2023] __cv_timedwait_common+0x12c/0x170 [spl]
[Fri Feb 24 02:31:53 2023] ? wait_woken+0x70/0x70
[Fri Feb 24 02:31:53 2023] __cv_timedwait_io+0x19/0x20 [spl]
[Fri Feb 24 02:31:53 2023] zio_wait+0x116/0x220 [zfs]
[Fri Feb 24 02:31:53 2023] dsl_pool_sync+0xb6/0x400 [zfs]
[Fri Feb 24 02:31:53 2023] ? __mod_timer+0x214/0x400
[Fri Feb 24 02:31:53 2023] spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[Fri Feb 24 02:31:53 2023] spa_sync+0x2dc/0x5b0 [zfs]
[Fri Feb 24 02:31:53 2023] txg_sync_thread+0x266/0x2f0 [zfs]
[Fri Feb 24 02:31:53 2023] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[Fri Feb 24 02:31:53 2023] thread_generic_wrapper+0x64/0x80 [spl]
[Fri Feb 24 02:31:53 2023] ? __thread_exit+0x20/0x20 [spl]
[Fri Feb 24 02:31:53 2023] kthread+0x12a/0x150
[Fri Feb 24 02:31:53 2023] ? set_kthread_struct+0x50/0x50
[Fri Feb 24 02:31:53 2023] ret_from_fork+0x22/0x30
[Fri Feb 24 02:31:53 2023] </TASK>
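
To check how often reports like the one above repeat, something along these lines could count them per task. This is only a sketch: the function name is made up for illustration, and it matches just the "blocked for more than N seconds" line, ignoring the timestamp prefix so it does not depend on the locale dmesg was run under.

```python
import re
import sys
from collections import Counter

# Matches the first line of a hung-task report, with or without a timestamp prefix.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(lines):
    """Count hung-task reports per (task name, pid)."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    # Example: dmesg | python3 this_script.py
    for (name, pid), n in count_hung_tasks(sys.stdin).items():
        print(f"{name} (pid {pid}): {n} report(s)")
```

Fed the trace above, it would report a single entry for txg_sync with pid 2457; the stack frames are skipped because they do not match the pattern.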

Machine:
- Lenovo RD450
- Dual Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
- 192GB RAM DDR4
- 2x 4TB Disks WD RED (ZFS Mirror)
- 2x 2TB Crucial NVMe (ZFS Mirror)
- 2x 8TB Disks WD RED (ZFS Mirror)
- 512GB WD Green SSD (OS Only)
- QLCNIC 10Gbps NIC

OS:
- Distributor ID: Ubuntu
- Description: Ubuntu 22.04.1 LTS
- Release: 22.04
- Codename: jammy

ZFS:

zfs-2.1.4-0ubuntu0.1
zfs-kmod-2.1.4-0ubuntu0.1
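
Since a mismatch between the userland tools and the kernel module is worth ruling out, here is a small sketch that checks two lines in the format above. It assumes the text looks like `zfs version` output (one `zfs-` line and one `zfs-kmod-` line); the function names are illustrative.

```python
def parse_zfs_versions(text: str) -> dict:
    """Map each component ('zfs', 'zfs-kmod') to its version string."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Check the longer prefix first, since 'zfs-kmod-' also starts with 'zfs-'.
        if line.startswith("zfs-kmod-"):
            versions["zfs-kmod"] = line[len("zfs-kmod-"):]
        elif line.startswith("zfs-"):
            versions["zfs"] = line[len("zfs-"):]
    return versions


def versions_match(text: str) -> bool:
    """True when both components are present and report the same version."""
    found = parse_zfs_versions(text)
    return "zfs" in found and found.get("zfs") == found.get("zfs-kmod")


if __name__ == "__main__":
    print(versions_match("zfs-2.1.4-0ubuntu0.1\nzfs-kmod-2.1.4-0ubuntu0.1"))
```

For the two lines listed here, both components report 2.1.4-0ubuntu0.1, so the check passes.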