Brick SEGFAULTs in 11.1
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
glusterfs (Ubuntu) | Status tracked in Oracular | | | |
Noble | Fix Released | Undecided | Bryce Harrington | |
Oracular | Fix Released | Undecided | Bryce Harrington | |
Bug Description
[ Impact ]
* Users experience brick SEGFAULTs under scenarios that are not yet fully understood. Some reports involve workloads with a high percentage of small-file I/O. I encountered the issue roughly once per hour with Minio backed by GlusterFS on ZFS.
* This bug introduces an increased risk of data loss or corruption depending on the user's configuration and timing of brick crashes.
* Core dumps from multiple users revealed that the SEGFAULTs are caused by a stack overflow when namespaced inodes are destroyed.
* The patch removes the recursive call to inode_unref when a namespaced inode is destroyed.
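To illustrate the failure mode described above, here is a minimal sketch in Python rather than the GlusterFS C code. It assumes a hypothetical `Node` class standing in for a ref-counted inode that holds a reference on a namespace parent; `unref_recursive`, `unref_iterative`, and `build_chain` are names invented for this example and are not GlusterFS functions. It only shows how releasing a reference from inside a destroy path can nest one stack frame per object, and how a loop avoids that. It does not claim to reproduce the exact call chain in the core dumps or the exact shape of the patch.

```python
import sys


class Node:
    """Hypothetical ref-counted object that pins a namespace parent."""

    def __init__(self, ns_parent=None):
        self.refs = 1              # reference held by the creator
        self.ns_parent = ns_parent
        self.destroyed = False
        if ns_parent is not None:
            ns_parent.refs += 1    # child keeps its namespace parent alive


def unref_recursive(node):
    """Destroying a node unrefs its parent from inside the destroy path."""
    node.refs -= 1
    if node.refs == 0:
        node.destroyed = True
        if node.ns_parent is not None:
            unref_recursive(node.ns_parent)   # one extra stack frame per level


def unref_iterative(node):
    """Same bookkeeping, but walks the chain in a loop (constant stack depth)."""
    while node is not None:
        node.refs -= 1
        if node.refs > 0:
            break                  # someone else still holds this node
        node.destroyed = True
        node = node.ns_parent


def build_chain(depth):
    """Build a chain where each node is kept alive only by its child."""
    nodes, parent = [], None
    for _ in range(depth):
        node = Node(parent)
        if parent is not None:
            parent.refs -= 1       # creator drops its ref; the child's ref remains
        nodes.append(node)
        parent = node
    return nodes


def demo():
    depth = sys.getrecursionlimit() * 10   # deeper than the interpreter allows
    overflowed = False
    chain = build_chain(depth)
    try:
        unref_recursive(chain[-1])
    except RecursionError:                 # Python's analogue of a stack overflow
        overflowed = True
    chain = build_chain(depth)
    unref_iterative(chain[-1])
    return overflowed, all(n.destroyed for n in chain)
```

Calling `demo()` returns a pair: whether the recursive variant exhausted the stack, and whether the iterative variant destroyed every node in a fresh chain of the same depth. In C there is no `RecursionError`; the equivalent outcome is a SEGFAULT once the thread's stack is exhausted.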
[ Test Plan ]
* I experienced brick crashes on specific volumes about once per hour. On my system, the issue only affected a locally mounted volume backing a Minio instance (an S3 API compatible server) used by Restic clients (an incremental backup system that creates and deletes many small files). Other volumes, served via NFS Ganesha and handling primarily large-file random access, never triggered it.
* I attempted to replicate the workload by running various file system benchmarking tools within their own user namespace (i.e. lots of small file creations and deletions) but was not able to reproduce the crash.
* I've been running the proposed patch since 2024-05-06 and haven't experienced a single crash.
* The test plan is to run the packages from proposed for at least a day, under the same load as when the bug happened, and confirm that the crashes reported in this bug no longer happen.
[ Where problems could occur ]
* It's conceivable that this patch introduces undesired behavior when inodes are destroyed. However, I consider this unlikely, because __inode_destroy was not recursive before the change that introduced the bug.
[ Other Info ]
* PR which introduced the bug: https:/
* PR which added this patch: https:/
* Issue discussion: https:/
Related branches
- git-ubuntu bot: Approve
- Bryce Harrington (community): Approve
- Canonical Server Reporter: Pending requested
Diff: 125 lines (+84/-1)
4 files modified
debian/changelog (+35/-0)
debian/control (+2/-1)
debian/patches/20-fix-stack-overflow-in-inode-destroy.diff (+46/-0)
debian/patches/series (+1/-0)
description: | updated |
Changed in glusterfs (Ubuntu Noble): | |
assignee: | nobody → Nick O'Connor (nick-oconnor) |
status: | Triaged → In Progress |
summary: | Gluster 11.1 brick SEGFAULT → Brick SEGFAULTs in 11.1 |
Changed in glusterfs (Ubuntu Noble): | |
assignee: | Nick O'Connor (nick-oconnor) → nobody |
Changed in glusterfs (Ubuntu Oracular): | |
assignee: | nobody → Athos Ribeiro (athos-ribeiro) |
Changed in glusterfs (Ubuntu Noble): | |
assignee: | nobody → Athos Ribeiro (athos-ribeiro) |
Changed in glusterfs (Ubuntu Oracular): | |
assignee: | Athos Ribeiro (athos-ribeiro) → Bryce Harrington (bryce) |
Changed in glusterfs (Ubuntu Noble): | |
assignee: | Athos Ribeiro (athos-ribeiro) → Bryce Harrington (bryce) |
Changed in glusterfs (Ubuntu Noble): | |
status: | Fix Committed → Incomplete |
description: | updated |
I've recompiled glusterfs locally with the changes. I can confirm the fix linked above addresses the issue.