Activating autotrim results in high load average due to uninterruptible threads

Bug #2057693 reported by trackwitz
This bug affects 18 people
Affects               Status        Importance  Assigned to               Milestone
Native ZFS for Linux  Fix Released  Unknown
zfs-linux (Ubuntu)    Fix Released  Medium      Heitor Alves de Siqueira
Noble                 Fix Released  Medium      John Cabaj
Oracular              Fix Released  Medium      Unassigned
Plucky                Fix Released  Medium      Heitor Alves de Siqueira

Bug Description

SRU Justification

[Impact]

* High load averages when activating autotrim. Logs included below SRU justification

[Fix]

* Cherry-pick a0aa7a2ee3b5: "Autotrim High Load Average Fix"

[Test Plan]

* Compile tested
* Run through autopkgtest regression tests

[Regression potential]

* Changes isolated, minimal regression risk. Changes already in upstream ZFS

Activating the autotrim feature on any ZFS version from 2.2.0 onward permanently increases the load average (as displayed in top), because each TRIM-capable vdev gets an uninterruptible vdev_autotrim thread.

This issue has been reported (https://github.com/openzfs/zfs/issues/15453) as well as fixed (https://github.com/openzfs/zfs/pull/15781) upstream but the fix is not yet backported to Ubuntu.

Since this bug was introduced in version 2.2.0, both mantic and noble are affected.

How to reproduce:
1. Create a pool with at least one TRIM-capable device
2. Run "zpool set autotrim=on <pool>"
3. Watch the output of "top" or "uptime" and see how the load average increases permanently by one per vdev, even when the system is idle
4. Running "ps aux | grep -w D\<" will show the broken threads:

[root@test ~]# ps aux | grep -w D\<
root 7193 0.0 0.0 0 0 ? D< 13:07 0:00 [vdev_autotrim]
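
The check in step 4 can be sketched as a small helper that counts only the affected threads rather than every line containing "D<". This is an illustration, not part of the original report: the function name count_stuck_autotrim is hypothetical, and it assumes a procps-style ps that accepts "-eo stat=,comm=". Linux includes tasks in uninterruptible sleep (state D) in the load average, which is why each such thread adds roughly one to it.

```shell
#!/bin/sh
# Hypothetical helper: count threads named vdev_autotrim whose state starts
# with D (uninterruptible sleep). Expects "STAT COMMAND" pairs on stdin.
count_stuck_autotrim() {
    awk '$1 ~ /^D/ && $2 == "vdev_autotrim" { n++ } END { print n + 0 }'
}

# Usage on a live system (assumes procps ps):
#   ps -eo stat=,comm= | count_stuck_autotrim
#   cat /proc/loadavg
```

Comparing the count with the first field of /proc/loadavg on an otherwise idle machine should show whether the stuck threads account for the elevated load.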

Changed in zfs:
status: Unknown → Fix Released
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu):
status: New → Confirmed
Alfred (alf-redyoung) wrote :

Jammy with the HWE kernel seems to have the same issue.

$ ps aux | grep -w D\<
root 3607 0.0 0.0 0 0 ? D< May30 0:13 [vdev_autotrim]
root 1821976 0.0 0.0 0 0 ? D< 21:43 0:03 [kworker/u41:0+i915_flip]

$ uptime
 22:37:28 up 24 days, 1:44, 8 users, load average: 1,40, 1,29, 1,22

$ uname -rv
6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2

Marc Reymann (mreymann) wrote :

Any news on this? I did a release upgrade to Noble and have the exact same problem.

Changed in zfs-linux (Ubuntu):
assignee: nobody → Heitor Alves de Siqueira (halves)
status: Confirmed → In Progress
importance: Undecided → Medium
Marc Reymann (mreymann) wrote :

This also happens when you install lxd on vanilla Ubuntu 24.04.1 LTS. Autotrim seems to be enabled by default (at least on SSDs), so I think a lot of users will be affected by this bug.

John Cabaj (john-cabaj)
description: updated
tags: added: patch
John Cabaj (john-cabaj) wrote :

Including 2.2.2-0ubuntu10.debdiff for updates.

Changed in zfs-linux (Ubuntu):
status: In Progress → Fix Released
assignee: Heitor Alves de Siqueira (halves) → nobody
Changed in zfs-linux (Ubuntu Noble):
status: New → In Progress
assignee: nobody → John Cabaj (john-cabaj)
importance: Undecided → Medium
florian (codarbyte) wrote :

Is there an estimate of when the patch will be merged into main?
Can somebody instruct me how to apply the debdiff patch to my machine?

John Cabaj (john-cabaj) wrote :

We're hoping to get it into the queue this week. I have a version in a personal PPA, but it won't be officially supported, and was more for testing purposes. If you're really in a pinch, that might help.

MD JOBAYER ALAM (greenwebbd) wrote :

Installed lxd and upgraded to Linux 6.8.0-51-generic #52-Ubuntu SMP PREEMPT_DYNAMIC Thu Dec 5 13:09:44 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux. I'm also facing the same problem.

Should we disable auto trim or ignore it?

Chris (saturos+ubu) wrote :

The only problem seems to be the higher load. But you can disable it and enable a timer instead if you want.
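
A minimal sketch of that workaround, assuming the stock "zpool set autotrim=off" and "zpool trim" commands are what is meant. The wrapper name switch_to_manual_trim, the RUN variable, and the pool name "tank" are hypothetical. By default the commands are only printed, so nothing changes until you opt in.

```shell
#!/bin/sh
# Hypothetical helper: turn autotrim off for a pool and start one manual trim.
# Commands are only printed unless RUN is set (e.g. RUN=sudo, an assumption
# about how root is obtained on your system).
switch_to_manual_trim() {
    pool="$1"
    ${RUN:-echo} zpool set autotrim=off "$pool"
    ${RUN:-echo} zpool trim "$pool"
}

# Example (dry run): switch_to_manual_trim tank
```

The "zpool trim" step is the part a periodic job (cron or a systemd timer of your choosing) would repeat. Which schedule is appropriate depends on the workload.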

The fix does not seem to have been included in 6.8.0-52-generic.

florian (codarbyte) wrote :

Really sad that this has not been fixed for months now...

John Cabaj (john-cabaj) wrote :

This has been fixed in the latest development version for Plucky. We have to wait until that version is released, because fixing the bug in an earlier release before the later releases are also fixed would constitute a regression for anyone upgrading. I'm as anxious as you to have these released, but it's largely out of my hands.

Heitor Alves de Siqueira (halves) wrote :

This is already fixed in Plucky and Oracular (cv_wait_idle for vdev autotrim), so only Noble is affected for now.

Changed in zfs-linux (Ubuntu Oracular):
status: New → Fix Released
importance: Undecided → Medium
Changed in zfs-linux (Ubuntu Plucky):
assignee: nobody → Heitor Alves de Siqueira (halves)
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello trackwitz, or anyone else affected,

Accepted zfs-linux into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/2.2.2-0ubuntu9.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in zfs-linux (Ubuntu Noble):
status: In Progress → Fix Committed
Chris (saturos+ubu) wrote :

Hi, regarding the above noble-proposed message, I installed zfsutils-linux 2.2.2-0ubuntu9.2 and the issue is not fixed. I guess I also need something else, such as a proposed kernel update? I am on 6.8.0-54-generic (6.8.0-54.56) now, what version would I need?

# zfs version
zfs-2.2.2-0ubuntu9.2
zfs-kmod-2.2.2-0ubuntu9.1

John Cabaj (john-cabaj) wrote :

Hi Chris, which version does ZFS give you when you run the following:

$ modinfo zfs | grep version

Thanks,
John

Chris (saturos+ubu) wrote :

version: 2.2.2-0ubuntu9.1
srcversion: 307E3A0219DD7E837933412
vermagic: 6.8.0-54-generic SMP preempt mod_unload modversions

The only thing I installed so far was zfsutils-linux/noble-proposed, which did bring in some other packages, but am I right that I also need a kernel (module) update from somewhere for the fix?

Commandline: apt-get install zfsutils-linux/noble-proposed
Upgrade: libnvpair3linux:amd64 (2.2.2-0ubuntu9.1, 2.2.2-0ubuntu9.2), libuutil3linux:amd64 (2.2.2-0ubuntu9.1, 2.2.2-0ubuntu9.2), libzpool5linux:amd64 (2.2.2-0ubuntu9.1, 2.2.2-0ubuntu9.2), libzfs4linux:amd64 (2.2.2-0ubuntu9.1, 2.2.2-0ubuntu9.2), zfsutils-linux:amd64 (2.2.2-0ubuntu9.1, 2.2.2-0ubuntu9.2)

John Cabaj (john-cabaj) wrote :

Right, so it seems that you're still running the previous release of the kernel module; the fixes are in 2.2.2-0ubuntu9.2. This version will be built into the kernel, probably in the 2025.03.17 cycle at the latest.

I was able to get the update from the dkms package by enabling noble-proposed (https://wiki.ubuntu.com/Testing/EnableProposed). apt was still trying to install 2.2.2-0ubuntu9.1, so I enabled only proposed (temporarily removed all but "noble-proposed" from the suites) and ran `apt update; apt install zfs-dkms`.

Give that a try and let me know if things work a bit better.

Thanks,
John
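
The userland/kernel-module mismatch discussed in this exchange can be spotted directly from "zfs version" output. The following is a sketch only: the function name check_zfs_versions is hypothetical, and it assumes the two-line "zfs-<version>" / "zfs-kmod-<version>" format quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: read `zfs version` output on stdin and report whether
# the userland tools and the loaded kernel module carry the same version.
check_zfs_versions() {
    awk '
        /^zfs-kmod-/ { kmod = substr($0, 10); next }   # strip "zfs-kmod-"
        /^zfs-/      { user = substr($0, 5) }          # strip "zfs-"
        END {
            if (user == kmod) print "match " user
            else print "mismatch userland=" user " kmod=" kmod
        }'
}

# Usage: zfs version | check_zfs_versions
```

A "mismatch" result means the fix in the userland package has not reached the running module yet, which is the situation described above.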

Chris (saturos+ubu) wrote :

Yep! With a simple install of zfs-dkms/noble-proposed and reboot I got 9.2:

version: 2.2.2-0ubuntu9.2

and my load average is now under 2 (I have 2 autotrims enabled): https://i.imgur.com/BRtcucv.png

Thank you!

John Cabaj (john-cabaj) wrote :

Awesome! Thanks for trying it out and thanks for getting back to us.

John

Jason L (jjlawren) wrote :

Does this still require a formal verification sign-off to be fully released to Noble?

MD JOBAYER ALAM (greenwebbd) wrote :

After enabling noble-proposed, I updated using apt update; apt install zfs-dkms and rebooted the server, and the issue is solved. Thanks a lot for solving it.

Timo Aaltonen (tjaalton)
tags: added: verification-done-noble
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 2.2.2-0ubuntu9.2

---------------
zfs-linux (2.2.2-0ubuntu9.2) noble; urgency=medium

  [ John Cabaj ]
  * debian/patches/0001-Improve-performance-for-zpool-trim-on-linux.patch
  * debian/patches/0002-vdev_disk-ensure-trim-errors-are-returned-immediatel.patch
    - task txg_sync:696 blocked (LP: #2081678)
  * debian/patches/0003-Autotrim-High-Load-Average-Fix.patch
    - Activating autotrim results in high load average due to
      uninterruptible threads (LP: #2057693)
  * debian/patches/0004-Linux-Fix-zfs_prune-panics.patch
    - crash in openzfs - 2.2.2 not supported on 6.8 (LP: #2077926)

  [ Heitor Alves de Siqueira ]
  * Fix "field-spanning write" errors on zpool import (LP: #2082060):
    - d/p/4701-zfs_log-add-flex-array-fields-to-log-record-structs.patch
    - d/p/4702-lua-add-flex-array-field-to-TString-type.patch

 -- John Cabaj <email address hidden> Mon, 09 Dec 2024 14:00:32 -0600

Changed in zfs-linux (Ubuntu Noble):
status: Fix Committed → Fix Released
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for zfs-linux has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Jason L (jjlawren) wrote :

It looks like the latest kernel package doesn't have the updated 2.2.2-0ubuntu9.2 kmod. What is the normal lag time for release so that installing the DKMS package isn't necessary?

Marc Reymann (mreymann) wrote :

Sort of the same question that Jason L has.
I'd rather not use DKMS, so I have these packages installed:

[root@primus:~]$ dpkg-query -l | grep -i zfs
ii libzfs4linux 2.2.2-0ubuntu9.2 amd64 OpeZFS filesystem library for Linux - general support
ii libzpool5linux 2.2.2-0ubuntu9.2 amd64 OpeZFS pool library for Linux
ii zfs-zed 2.2.2-0ubuntu9.2 amd64 OpeZFS Event Daemon
ii zfsutils-linux 2.2.2-0ubuntu9.2 amd64 command-line tools to manage OpenZFS filesystems

And the problem remains:

[root@primus:~]$ ps auxf | grep D\<
root 781 0.0 0.0 0 0 ? D< 17:54 0:00 \_ [vdev_autotrim]
root 4808 0.0 0.0 9524 2224 pts/0 S+ 17:55 0:00 \_ grep --color=auto D<

Will this be fixed?

Jason L (jjlawren) wrote :

Looks like this didn't make the recently released 6.8.0-58.60 kernel package. Probably a month away for this to hit the next cycle.
