crash in openzfs - 2.2.2 not supported on 6.8

Bug #2077926 reported by Erik Hortsch
This bug affects 4 people
Affects: zfs-linux (Ubuntu), status tracked in Plucky

Series    Status     Importance  Assigned to  Milestone
Oracular  New        Undecided   Unassigned
Plucky    Confirmed  Undecided   Unassigned

Bug Description

Crash report on openzfs: https://github.com/openzfs/zfs/issues/16482

> You should probably report this to Ubuntu or try it on 2.2.5 vanilla, 2.2.2 doesn't claim to work on 6.8

Distribution Version: 24.04
Kernel Version: 6.8.0-41-generic
Architecture: intel x86-64
OpenZFS Version: zfs-2.2.2-0ubuntu9, zfs-kmod-2.2.2-0ubuntu9
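
The mismatch quoted above (OpenZFS 2.2.2 against a 6.8 kernel) can be checked mechanically. The following is a minimal sketch, not an official tool: the helper names are hypothetical, and the maximum kernel series for 2.2.2 is an assumed value that should be confirmed against the `Linux-Maximum` field of that release's upstream META file. Ubuntu packages may carry extra compatibility patches, so this only reflects what upstream claims.

```python
import re
from typing import Tuple


def kernel_series(uname_release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.8.0-41-generic'."""
    match = re.match(r"(\d+)\.(\d+)", uname_release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {uname_release!r}")
    return int(match.group(1)), int(match.group(2))


def zfs_upstream_version(package_version: str) -> str:
    """Pull the upstream x.y.z part out of a string such as 'zfs-2.2.2-0ubuntu9'."""
    match = re.search(r"(\d+\.\d+\.\d+)", package_version)
    if match is None:
        raise ValueError(f"unrecognised ZFS version: {package_version!r}")
    return match.group(1)


def kernel_is_claimed(uname_release: str, linux_maximum: Tuple[int, int]) -> bool:
    """True when the running kernel series is not newer than the claimed maximum."""
    return kernel_series(uname_release) <= linux_maximum


# ASSUMPTION: upstream 2.2.2 claims support up to the 6.6 series.
# Verify against the META file of the release before relying on it.
ASSUMED_LINUX_MAXIMUM_FOR_2_2_2 = (6, 6)

if __name__ == "__main__":
    release = "6.8.0-41-generic"       # from this report
    package = "zfs-2.2.2-0ubuntu9"     # from this report
    print(zfs_upstream_version(package),
          kernel_is_claimed(release, ASSUMED_LINUX_MAXIMUM_FOR_2_2_2))
```

Passing the maximum in as a parameter keeps the comparison usable for other releases (for example 2.2.5 or 2.2.6) once their claimed maximum has been looked up.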

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: zfsutils-linux 2.2.2-0ubuntu9
ProcVersionSignature: Ubuntu 6.8.0-41.41-generic 6.8.12
Uname: Linux 6.8.0-41-generic x86_64
NonfreeKernelModules: zfs
ApportVersion: 2.28.1-0ubuntu3.1
Architecture: amd64
CasperMD5CheckResult: pass
Date: Mon Aug 26 21:30:11 2024
InstallationDate: Installed on 2024-08-16 (11 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Release amd64 (20240424)
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
SourcePackage: zfs-linux
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.sudoers.d.zfs: [inaccessible: [Errno 13] Permission denied: '/etc/sudoers.d/zfs']

Revision history for this message
Erik Hortsch (hortsche) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in zfs-linux (Ubuntu):
status: New → Confirmed
Simon Déziel (sdeziel)
summary: - crash in openzfs - 2.22 not supported on 6.8
+ crash in openzfs - 2.2.2 not supported on 6.8
Aleksandr Mikhalitsyn (mihalicyn) wrote :

I have some evidence that we may have an even more serious problem with ZFS on the Ubuntu Noble 6.8 kernel.

Some time ago the LXD team received a report about LXCFS process crashes:
https://github.com/lxc/lxcfs/issues/644

After spending a great deal of time checking everything on our side (in LXCFS), I began to suspect that the problem was in the kernel (memory corruption).

The reporter was using all standard Ubuntu Noble packages: ZFS from Noble and the kernel from Noble, plus the LXD v5.21.1 LTS snap.

I asked him to try an upstream version of the ZFS module with the same kernel (no luck there).
Then I asked him to try a recent upstream Linux kernel version (same, no luck).

Then the user downgraded to a 6.6.x series kernel with a recent upstream ZFS, and LXCFS stopped crashing; everything now works fine for him. That was without any changes on the LXD snap side (LXCFS, LXC, LXD were all the same).

Now we have another report:
https://github.com/canonical/lxd/issues/14178

I decided to share this information in this issue instead of filing a new one, so we can collect all the problems with ZFS on new kernels in one place.

To conclude: everything points to a bug in ZFS that leads to kernel/userspace memory corruption...

John Cabaj (john-cabaj) wrote :

Acknowledging that I'm looking into this alongside some ZFS bugs for other releases. Any concrete steps to reproduce (preferably via LXD) are welcome, though I'll search through the various issue links raised above to see if there's any reproducer there as well.

Aleksandr Mikhalitsyn (mihalicyn) wrote :

Hey John!

Of course, if I get any information about this one (such as a reproducer), I'll share it with you. But for now, we only have these two LXD issues (links are in comment #3) and that's it.

You may also want to look at this discussion:
https://github.com/openzfs/zfs/issues/16324

Also, I'm curious whether we have any plans to update Noble's ZFS version to 2.2.6, as it contains a lot of fixes, such as:
https://github.com/openzfs/zfs/commit/fa2480f5b3bc1d265304a950c1b5ab05497853cd
https://github.com/openzfs/zfs/commit/859f906a4b4dfc132115f10f5d372ef4281a6479
https://github.com/openzfs/zfs/commit/0f9457d1dd7a56ceccda962433281ef946132b5f
and many others.

Kind regards,
Alex

John Cabaj (john-cabaj) wrote :

Seems there are a couple of potential fixes upstream - https://github.com/openzfs/zfs/pull/16770/commits/96c82faf731594f4fadf1166c37749e8e05cd9b2 and https://github.com/openzfs/zfs/pull/16788/commits/1c6da1a60a37730fa635d02f45d266e7177cc6c6.

Will try to get a package together tonight and run through autopkgtests. But I don't have a reproducer at the moment.

Andre Toerien (athepeanut4) wrote :

I have been experiencing random process crashes and kernel panics ever since I upgraded to 24.04. The first was due to a btrfs bug (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2080039), but they've continued even after the fix for that was released, although much less frequently.

The crashes seem to generally occur when the system is under I/O pressure, on both ZFS and non-ZFS filesystems. Used memory is <6GB/16GB, but the remainder is basically all used for disk caching - so it's effectively under memory pressure as well.

I set up kdump after the first crash I got since the above bug was fixed, and figured out how to do basic dump analysis. From what I can tell, the kernel panics are either page fault related, or illegal operation exceptions (i.e. memory corruption).

The process crashes always hit the process causing the high disk usage (a Python downloader script), and are either SIGILLs or SIGSEGVs.
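
For anyone trying to reproduce this, it can help to record which signal killed the workload on each run. Below is a minimal sketch of such a wrapper; the function names are hypothetical and it is not the script used in this report. It relies on `subprocess` reporting a negative return code when the child is terminated by a signal.

```python
import signal
import subprocess
import sys
from typing import List


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into 'SIGSEGV', 'SIGILL', 'exit 0', etc."""
    if returncode < 0:
        # subprocess uses -N to mean "killed by signal N" on POSIX systems.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def run_and_report(argv: List[str]) -> str:
    """Run a command to completion and describe how it ended."""
    completed = subprocess.run(argv)
    return describe_returncode(completed.returncode)


if __name__ == "__main__":
    # Usage: python wrapper.py <command> [args...]
    if len(sys.argv) > 1:
        print(run_and_report(sys.argv[1:]))
```

Logging the result next to a timestamp makes it easier to line up userspace crashes with kernel messages in dmesg.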

From a comment above I've just found https://github.com/canonical/lxd/issues/14178#issuecomment-2402322735, and have now enabled both KFENCE and SLUB debugging, so we'll see if I get any dmesg error messages.
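
To confirm that the debugging options actually took effect after a reboot, the kernel command line can be inspected. This is a rough sketch that assumes the options were set through the `slub_debug` and `kfence.sample_interval` boot parameters; the linked comment may recommend different settings, and KFENCE can also be configured at build time or at runtime, which this check does not cover. The function name is hypothetical.

```python
from typing import Dict, Optional


def memory_debug_params(cmdline: str) -> Dict[str, Optional[str]]:
    """Return the slub_debug and kfence.sample_interval values from a kernel command line.

    A key maps to None when the parameter is absent, and to an empty
    string when it is given without a value (a bare 'slub_debug').
    """
    found: Dict[str, Optional[str]] = {
        "slub_debug": None,
        "kfence.sample_interval": None,
    }
    for token in cmdline.split():
        name, _, value = token.partition("=")
        if name in found:
            found[name] = value
    return found


# Typical use on a running system:
#   with open("/proc/cmdline") as handle:
#       print(memory_debug_params(handle.read()))
```

A `None` for either key after rebooting suggests the bootloader configuration was not regenerated or the parameter was misspelled.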

So that is all to say, an upgrade to 2.2.6 on noble would be very much appreciated. I'd also be happy to test anything and try to force a crash.

Thanks!

John Cabaj (john-cabaj) wrote :

Including 2.2.2-0ubuntu10.debdiff for updates.

tags: added: patch