Kernel linux-image-6.5.0-44-generic/6.8.0-40-generic appears to have issues with fscrypt running on CephFS backend.

Bug #2073679 reported by Aterfax
This bug affects 1 person
Affects                          Status         Importance   Assigned to       Milestone
linux                            Fix Released   Undecided    auto-ceph-devel
linux-signed-hwe-6.5 (Ubuntu)    Won't Fix      Undecided    Philip Cox
linux-signed-hwe-6.8 (Ubuntu)    New            Undecided    Philip Cox

Bug Description

Three Ubuntu VMs running the latest kernel all show the same issue when using an fscrypted folder on CephFS.

No problems occur while the fscrypted folder remains locked (not decrypted). After it is unlocked, however, dmesg shows what looks like a kernel oops, and the machines either partially lock up (no access to the Ceph mount or the fscrypted folder) or fully lock up, requiring a hard reset.

It is unclear to me whether this could also cause data corruption, so it probably needs triage before it ruins someone's day(s).

Booting the previous kernel (6.5.0-41-generic) avoids the problem, so so far it appears to be specific to 6.5.0-44-generic.

I have attached the dmesg output.

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-6.5.0-41-generic 6.5.0-41.41~22.04.2
ProcVersionSignature: Ubuntu 6.5.0-41.41~22.04.2-generic 6.5.13
Uname: Linux 6.5.0-41-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: unknown
Date: Sat Jul 20 22:03:31 2024
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-hwe-6.5
UpgradeStatus: No upgrade log present (probably fresh install)

Aterfax (aterfax) wrote:

I've made the <email address hidden> and <email address hidden> mailing lists aware of this bug report.

Philip Cox (philcox) wrote:

Thank you for taking the time to open this bug report.

I have looked at the Ceph-related changes between the two kernels, 6.5.0-41-generic and 6.5.0-44-generic.

There are three changes:

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic/commit/?id=451adb3c51e01d0e207822d613a192177e3c9810

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic/commit/?id=5890896daa148cab30828f7a81042132c7b3e094

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/mantic/commit/?id=11037a10f75996d0d3f221986cc2b7e9269cfc2d

The third commit is the most interesting one. It is a backport of upstream commit 8e46a2d068c92a905d01cbb018b00d66991585ab, and upstream we find this follow-up commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=321e3c3de53c7530cd518219d01f04e7e32a9d23

which fixes an issue introduced by that commit.

This patch is already in the noble kernel:

https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/commit/net/ceph?id=321e3c3de53c7530cd518219d01f04e7e32a9d23

As the 6.5 kernels are end of life and no longer being updated, we can't fix this in 6.5, and it appears to already be fixed in 6.8.

Changed in linux-signed-hwe-6.5 (Ubuntu):
assignee: nobody → Philip Cox (philcox)
Changed in linux-signed-hwe-6.8 (Ubuntu):
assignee: nobody → Philip Cox (philcox)
Changed in linux-signed-hwe-6.5 (Ubuntu):
status: New → Won't Fix
Changed in linux-signed-hwe-6.8 (Ubuntu):
status: New → Invalid
Changed in kernel:
status: New → Fix Released
Aterfax (aterfax) wrote:

Hi Philip,

Thanks for looking into this. I thought I had already booted the latest 6.8 HWE kernel and run into the same issue, with it still locking up (which is when I added 6.8 to this report).

I'll pick a node, check that the most recent kernel version is installed, give it a whirl, and report back shortly.

Aterfax (aterfax) wrote:

Sorry, I can confirm this bug is still present in kernel '6.8.0-40-generic'. Same symptoms as before:

"a kernel oops and the machines are partially locking up (no access to the Ceph area or fscrypted area) or fully locking up requiring a hard reset."

I'm uploading a new dmesg excerpt that shows this.

Changed in linux-signed-hwe-6.8 (Ubuntu):
status: Invalid → New
Aterfax (aterfax)
summary: - Kernel linux-image-6.5.0-44-generic appears to have issues with fscrypt
- running on CephFS backend.
+ Kernel linux-image-6.5.0-44-generic/6.8.0-40-generic appears to have
+ issues with fscrypt running on CephFS backend.

Remote bug watches

  • auto-ceph-devel