Backport ZoL pull request 9203 into the official packages.

Bug #1847832 reported by Alex Ingram on 2019-10-12
This bug affects 1 person
Affects              Importance   Assigned to
zfs-linux (Ubuntu)   High         Colin Ian King
Eoan                 Undecided    Unassigned

Bug Description

== SRU Justification, EOAN ==

ZFS can deadlock; this can sometimes be triggered by a zfs rollback. From the upstream commit message: "the zfs_resume_fs() code path may cause zfs to spawn new threads as it reinstantiates the suspended fs's zil. When a new thread is spawned, the kernel may attempt to free memory for that thread by freeing some unreferenced inodes. If it happens to select inodes that are a part of the suspended fs a deadlock will occur because freeing inodes requires holding the fs's z_teardown_inactive_lock which is still held from the suspend."
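
The lock dependency described above can be illustrated with a small user-space sketch. This is a toy model in Python, not ZFS code: teardown_inactive_lock stands in for z_teardown_inactive_lock (modelled as a plain non-reentrant lock, which is a simplification), and suspend_fs, evict_inode and resume_fs are hypothetical names chosen for the example. A timeout is used so the sketch reports the would-be deadlock instead of hanging.

```python
import threading

# Toy stand-in for the fs's z_teardown_inactive_lock. Assumption: modelled as
# a plain, non-reentrant lock; the real kernel lock is a different primitive.
teardown_inactive_lock = threading.Lock()


def suspend_fs():
    """Suspend takes the lock and keeps holding it until resume finishes."""
    teardown_inactive_lock.acquire()


def evict_inode(timeout=0.1):
    """Memory reclaim freeing an inode of this fs needs the same lock.

    Returns True if the inode could be freed, False if the caller would
    have blocked (the timeout stands in for an indefinite wait).
    """
    if not teardown_inactive_lock.acquire(timeout=timeout):
        return False
    teardown_inactive_lock.release()
    return True


def resume_fs():
    """Resume spawns a thread; creating it may trigger reclaim on this path."""
    blocked = not evict_inode()  # reclaim runs while the lock is still held
    teardown_inactive_lock.release()
    return blocked


suspend_fs()
would_deadlock = resume_fs()
print("would deadlock:", would_deadlock)
```

In this model the eviction attempt made during resume cannot take the lock, because the suspend still holds it. Once the lock has been released, the same eviction succeeds.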

== The Fix ==

Backport of ZFS upstream commit e7a2fa70c3b0d8c8cee2b484038bb5623c7c1ea9 ("Fix deadlock in 'zfs rollback'")

The backport is a relatively simple context wiggle.

== Test Case ==

This is hard to trigger, so testing is non-trivial. To check for regressions we run the entire Ubuntu ZFS regression test suite. Without the fix, rollbacks can very occasionally trip this issue. With the fix, this should no longer be possible.

== Regression Potential ==

The fix adds an extra z_suspended flag to track suspended state and adds an extra reference to stop the kernel from freeing inodes on a suspended file system. The changes are small and are well-used in upstream ZFS, so I believe that if a regression were to occur it would be found by the regression testing.
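
As a rough illustration of the approach described above, here is a toy Python model of pinning inodes for the duration of a suspend. It is not the ZFS patch: ToyInode, ToyFs and reclaim are hypothetical names, the per-inode suspended attribute is only a stand-in for the z_suspended flag (where the flag lives in this model is an assumption of the sketch), and real kernel inode reference counting is considerably more involved.

```python
class ToyInode:
    def __init__(self, ino):
        self.ino = ino
        self.refcount = 0       # 0 means unreferenced, so reclaim may free it
        self.suspended = False  # stand-in for the z_suspended flag


class ToyFs:
    def __init__(self, inode_numbers):
        self.inodes = {n: ToyInode(n) for n in inode_numbers}

    def suspend(self):
        # Take an extra reference on every inode still present, and remember
        # which inodes were pinned so resume drops exactly those references.
        for inode in self.inodes.values():
            inode.refcount += 1
            inode.suspended = True

    def resume(self):
        # Drop the extra reference only on inodes pinned by suspend().
        for inode in self.inodes.values():
            if inode.suspended:
                inode.refcount -= 1
                inode.suspended = False


def reclaim(fs):
    """Free every unreferenced inode and return the freed inode numbers."""
    freed = [n for n, inode in fs.inodes.items() if inode.refcount == 0]
    for n in freed:
        del fs.inodes[n]
    return freed


fs = ToyFs([1, 2, 3])
fs.suspend()
freed_while_suspended = reclaim(fs)  # nothing is eligible while pinned
fs.resume()
freed_after_resume = reclaim(fs)     # inodes are eligible again
print(freed_while_suspended, freed_after_resume)
```

While the toy file system is suspended, reclaim finds nothing to free, so it never reaches the code path that would need the lock held by the suspend.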

--------------------- Original Bug Report Below -------------------

Hopefully this is the correct bug tracker to report this on.

Recently, I ran into a bug in ZFS as shipped in Ubuntu 19.10 that caused the kernel to deadlock and the system to eventually hang.

I was advised by a ZFS on Linux project maintainer that this was a bug that was fixed in 0.8.2.

The relevant pull request is here: https://github.com/zfsonlinux/zfs/pull/9203

It would probably be a good idea to backport that pull request into 19.10's build of ZFS.

Colin Ian King (colin-king) wrote :

The fix in question is:

commit e7a2fa70c3b0d8c8cee2b484038bb5623c7c1ea9
Author: Tom Caputi <email address hidden>
Date: Tue Aug 27 12:55:51 2019 -0400

    Fix deadlock in 'zfs rollback'

    Currently, the 'zfs rollback' code can end up deadlocked due to
    the way the kernel handles unreferenced inodes on a suspended fs.
    Essentially, the zfs_resume_fs() code path may cause zfs to spawn
    new threads as it reinstantiates the suspended fs's zil. When a
    new thread is spawned, the kernel may attempt to free memory for
    that thread by freeing some unreferenced inodes. If it happens to
    select inodes that are a part of the suspended fs a deadlock
    will occur because freeing inodes requires holding the fs's
    z_teardown_inactive_lock which is still held from the suspend.

    This patch corrects this issue by adding an additional reference
    to all inodes that are still present when a suspend is initiated.
    This prevents them from being freed by the kernel for any reason.

    Reviewed-by: Alek Pinchuk <email address hidden>
    Reviewed-by: Brian Behlendorf <email address hidden>
    Signed-off-by: Tom Caputi <email address hidden>
    Closes #9203

Changed in zfs-linux (Ubuntu):
assignee: nobody → Colin Ian King (colin-king)
importance: Undecided → High
status: New → In Progress
description: updated

Hello Alex, or anyone else affected,

Accepted zfs-linux into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.8.1-1ubuntu14.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in zfs-linux (Ubuntu Eoan):
status: New → Fix Committed

Colin Ian King (colin-king) wrote :

I've exercised this fix with the new zfsutils-linux and zfs-dkms packages using the Ubuntu ZFS autotest regression tests:

ubuntu_zfs_smoke_test
ubuntu_zfs_fstest
ubuntu_zfs_xfs_generic
ubuntu_zfs_stress

No regressions found.

These tests also touch zfs rollbacks. I don't see any regressions, so this passed testing and I'm happy for this to be released.

tags: added: verification-done-eoan