Also affects me. Some more information: after that message appears, I/O on every ZFS device becomes very slow (the NVMe SSDs top out at about 10 MB/s, where they normally do 400 MB/s), and only a reboot fixes it.
Things I have tried, based on what I found online for the same issue:
1. Removing all swap
2. Reducing / Increasing the ARC RAM (always keeping at least 16GB of RAM free in the OS)
3. Checking SMART data for all disks (everything is fine)
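For anyone comparing ARC settings, here is a rough sketch of how the current ARC size and limit can be read on ZFS on Linux. It assumes the usual kstat file at /proc/spl/kstat/zfs/arcstats with "name type data" rows; the function names `parse_arcstats` and `arc_summary` are my own, not part of any ZFS tool.

```python
from pathlib import Path

# Usual kstat location on ZFS on Linux (assumed to exist on the affected host).
ARCSTATS = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text):
    """Return {name: int} from kstat text made of 'name type data' rows."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields and a numeric value;
        # this skips the two header lines at the top of the file.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def arc_summary(stats):
    """Report the current ARC size and its configured maximum in GiB."""
    gib = 1024 ** 3
    return {"size_gib": stats["size"] / gib, "c_max_gib": stats["c_max"] / gib}


if __name__ == "__main__":
    if ARCSTATS.exists():
        print(arc_summary(parse_arcstats(ARCSTATS.read_text())))
    else:
        print("arcstats not found; is the zfs module loaded?")
```

Running it before and after changing the ARC limit shows whether the new value actually took effect.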
After rebooting, it _usually_ takes one or two days before the issue happens again. Performance is normal until then. I have several virtual machines that use both the SSDs and the hard drives. The problem usually starts while some backup routines are running, between 10 PM and midnight, so it might be related to high I/O (although the backups usually take only a few minutes).
My dmesg is a bit different though: I get this repeated about 4 or 5 times at 20-minute intervals, and then it usually stops reporting it.
[sex fev 24 02:31:53 2023] INFO: task txg_sync:2457 blocked for more than 120 seconds.
[sex fev 24 02:31:53 2023] Tainted: P O 5.15.0-46-generic #49-Ubuntu
[sex fev 24 02:31:53 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[sex fev 24 02:31:53 2023] task:txg_sync state:D stack: 0 pid: 2457 ppid: 2 flags:0x00004000
[sex fev 24 02:31:53 2023] Call Trace:
[sex fev 24 02:31:53 2023] <TASK>
[sex fev 24 02:31:53 2023] __schedule+0x23d/0x590
[sex fev 24 02:31:53 2023] schedule+0x4e/0xc0
[sex fev 24 02:31:53 2023] schedule_timeout+0x87/0x140
[sex fev 24 02:31:53 2023] ? zio_issue_async+0x12/0x20 [zfs]
[sex fev 24 02:31:53 2023] ? __bpf_trace_tick_stop+0x20/0x20
[sex fev 24 02:31:53 2023] io_schedule_timeout+0x51/0x80
[sex fev 24 02:31:53 2023] __cv_timedwait_common+0x12c/0x170 [spl]
[sex fev 24 02:31:53 2023] ? wait_woken+0x70/0x70
[sex fev 24 02:31:53 2023] __cv_timedwait_io+0x19/0x20 [spl]
[sex fev 24 02:31:53 2023] zio_wait+0x116/0x220 [zfs]
[sex fev 24 02:31:53 2023] dsl_pool_sync+0xb6/0x400 [zfs]
[sex fev 24 02:31:53 2023] ? __mod_timer+0x214/0x400
[sex fev 24 02:31:53 2023] spa_sync_iterate_to_convergence+0xe0/0x1f0 [zfs]
[sex fev 24 02:31:53 2023] spa_sync+0x2dc/0x5b0 [zfs]
[sex fev 24 02:31:53 2023] txg_sync_thread+0x266/0x2f0 [zfs]
[sex fev 24 02:31:53 2023] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[sex fev 24 02:31:53 2023] thread_generic_wrapper+0x64/0x80 [spl]
[sex fev 24 02:31:53 2023] ? __thread_exit+0x20/0x20 [spl]
[sex fev 24 02:31:53 2023] kthread+0x12a/0x150
[sex fev 24 02:31:53 2023] ? set_kthread_struct+0x50/0x50
[sex fev 24 02:31:53 2023] ret_from_fork+0x22/0x30
[sex fev 24 02:31:53 2023] </TASK>
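In case it helps others compare how often the message shows up, this is a minimal sketch for counting hung-task reports per task name in saved dmesg output. The regex and the function name `count_hung_tasks` are my own assumptions, based only on the "INFO: task ... blocked for more than ... seconds" line shown above.

```python
import re
import sys
from collections import Counter

# Matches the kernel's hung-task line, for example:
# "INFO: task txg_sync:2457 blocked for more than 120 seconds."
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def count_hung_tasks(log_text):
    """Count hung-task reports per task name found in dmesg text."""
    counts = Counter()
    for match in HUNG_RE.finditer(log_text):
        counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Reads the log from standard input and prints one line per task name.
    for name, n in count_hung_tasks(sys.stdin.read()).most_common():
        print(f"{name}: {n} report(s)")
```

It can be fed from a pipe, for example `sudo dmesg -T | python3 count_hung.py`, where `count_hung.py` is just a hypothetical file name for the script.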
Machine:
- Lenovo RD450
- Dual Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
- 192GB RAM DDR4
- 2x 4TB Disks WD RED (ZFS Mirror)
- 2x 2TB Crucial NVMe (ZFS Mirror)
- 2x 8TB Disks WD RED (ZFS Mirror)
- 512GB WD Green SSD (OS Only)
- QLCNIC 10Gbps NIC
OS:
- Distributor ID: Ubuntu
- Description: Ubuntu 22.04.1 LTS
- Release: 22.04
- Codename: jammy
ZFS:
zfs-2.1.4-0ubuntu0.1
zfs-kmod-2.1.4-0ubuntu0.1