zfs PANIC: accessing past end of object in 0.8.3-1ubuntu12.4

Bug #1904589 reported by silverwind
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| zfs-linux (Ubuntu) | Fix Released | Medium | Colin Ian King | |
| Focal | Fix Released | High | Colin Ian King | |
| Groovy | Fix Released | Medium | Colin Ian King | |
| Hirsute | Fix Released | Medium | Colin Ian King | |

Bug Description

[Impact]

zfs_write() doesn't properly account for partial copies done by copy_from_user(), causing accesses past the end of objects and triggering kernel panics.
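
To illustrate the accounting problem described above, here is a minimal userspace sketch in Python. It is not the ZFS code: `Cursor`, `copy_with_fault`, and `model_write` are hypothetical names, and the only behaviour borrowed from the kernel is the copy_from_user() convention of returning the number of bytes that could not be copied. The idea is that the write cursor should advance by the bytes actually copied, so that the offset and the remaining length stay consistent when a copy stops early.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Cursor:
    """Hypothetical stand-in for a uio-style cursor."""
    offset: int  # next byte to copy from the source
    resid: int   # bytes still to be written


def copy_with_fault(src: bytes, start: int, n: int,
                    fault_at: Optional[int]) -> Tuple[bytes, int]:
    """Copy up to n bytes from src[start:].

    Returns (data, not_copied). not_copied mimics copy_from_user():
    0 on full success, otherwise the number of bytes left uncopied.
    fault_at is an assumed index at which the simulated copy stops.
    """
    end = start + n
    if fault_at is not None and start <= fault_at < end:
        end = fault_at  # stop just before the "faulting" byte
    data = src[start:end]
    return data, n - len(data)


def model_write(src: bytes, chunk_size: int,
                fault_at: Optional[int] = None) -> Tuple[Cursor, bytes]:
    """Write src in chunks, accounting for partial copies."""
    cur = Cursor(offset=0, resid=len(src))
    written = bytearray()
    while cur.resid > 0:
        want = min(chunk_size, cur.resid)
        data, not_copied = copy_with_fault(src, cur.offset, want, fault_at)
        copied = want - not_copied
        written += data
        # Advance by what was really copied, not by what was requested.
        cur.offset += copied
        cur.resid -= copied
        if not_copied:
            break  # a real caller would report the fault here
    return cur, bytes(written)


cur, out = model_write(bytes(range(100)), chunk_size=64, fault_at=80)
print(cur.offset, cur.resid, len(out))  # prints: 80 20 80
```

Advancing the cursor by `want` instead of `copied` in this model would leave `offset` at 100 with only 80 bytes written, which is the kind of mismatch between bookkeeping and data that the upstream fix is described as addressing.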

[Test case]

The problem appears to be workload-specific. There is no specific test case to reproduce it, but the bug is well identified by the upstream commit referenced below.

[Fix]

Apply upstream commit c9e3efdb3a6111b9795becc6594b3c52ba004522 ("Bugfix/fix uio partial copies").

[Regression potential]

The upstream commit fixes potential out-of-bounds accesses by properly checking for partial copies done by copy_from_user(), which prevents the kernel panics. Regression potential is minimal: the change is unlikely to break anything else.

[Original bug report]

Using the latest zfs 0.8.3-1ubuntu12.4 on the latest Ubuntu 20.04.1, I observe rare zfs panics that seem to be workload-specific and that render the server mostly unresponsive, although ssh still works. Attempting to reboot the server in this state makes the shutdown hang forever.

You may want to consider backporting the fix released in zfs 0.8.4 into 20.04: https://github.com/openzfs/zfs/pull/10148

Log sample of panic:
```
Nov 17 16:06:15 hostname kernel: [3385134.716024] PANIC: zfs: accessing past end of object c1c/2db52f (size=17408 access=7492+16428)
Nov 17 16:06:15 hostname kernel: [3385134.716072] Showing stack for process 3166846
Nov 17 16:06:15 hostname kernel: [3385134.716074] CPU: 25 PID: 3166846 Comm: node Tainted: P O 5.4.0-48-generic #52-Ubuntu
Nov 17 16:06:15 hostname kernel: [3385134.716075] Hardware name: <hardware>
Nov 17 16:06:15 hostname kernel: [3385134.716076] Call Trace:
Nov 17 16:06:15 hostname kernel: [3385134.716085] dump_stack+0x6d/0x9a
Nov 17 16:06:15 hostname kernel: [3385134.716097] spl_dumpstack+0x29/0x2b [spl]
Nov 17 16:06:15 hostname kernel: [3385134.716102] vcmn_err.cold+0x60/0x99 [spl]
Nov 17 16:06:15 hostname kernel: [3385134.716106] ? _cond_resched+0x19/0x30
Nov 17 16:06:15 hostname kernel: [3385134.716108] ? __kmalloc_node+0x20e/0x330
Nov 17 16:06:15 hostname kernel: [3385134.716113] ? spl_kmem_alloc_impl+0xa8/0x100 [spl]
Nov 17 16:06:15 hostname kernel: [3385134.716190] ? __list_add+0x17/0x40 [zfs]
Nov 17 16:06:15 hostname kernel: [3385134.716235] zfs_panic_recover+0x6f/0x90 [zfs]
Nov 17 16:06:15 hostname kernel: [3385134.716272] ? dsl_dir_tempreserve_impl.isra.0.constprop.0+0xed/0x330 [zfs]
Nov 17 16:06:15 hostname kernel: [3385134.716305] dmu_buf_hold_array_by_dnode+0x3a0/0x490 [zfs]
Nov 17 16:06:15 hostname kernel: [3385134.716338] dmu_write_uio_dnode+0x4c/0x140 [zfs]
Nov 17 16:06:15 hostname kernel: [3385134.716370] dmu_write_uio_dbuf+0x4f/0x70 [zfs]
Nov 17 16:06:15 hostname kernel: [3385134.716416] zfs_write+0xa1f/0xd40 [zfs]
Nov 17 16:06:15 hostname kernel: [3385134.716419] ? d_absolute_path+0x74/0xb0
Nov 17 16:06:15 hostname kernel: [3385134.716421] ? __switch_to_asm+0x34/0x70
Nov 17 16:06:15 hostname kernel: [3385134.716423] ? __switch_to_asm+0x40/0x70
Nov 17 16:06:15 hostname kernel: [3385134.716424] ? __switch_to_asm+0x40/0x70
Nov 17 16:06:15 hostname kernel: [3385134.716425] ? __switch_to_asm+0x34/0x70
Nov 17 16:06:15 hostname kernel: [3385134.716427] ? __switch_to_asm+0x34/0x70
Nov 17 16:06:15 hostname kernel: [3385134.716474] zpl_write_common_iovec+0xad/0x120 [zfs]
Nov 17 16:06:15 hostname kernel: [3385134.716567] zpl_iter_write+0x56/0x90 [zfs]
Nov 17 16:06:15 hostname kernel: [3385134.716570] do_iter_write+0x84/0x1a0
Nov 17 16:06:15 hostname kernel: [3385134.716574] ? futex_wake+0x8b/0x180
Nov 17 16:06:15 hostname kernel: [3385134.716577] do_writev+0x71/0x120
Nov 17 16:06:15 hostname kernel: [3385134.716581] do_syscall_64+0x57/0x190
Nov 17 16:06:15 hostname kernel: [3385134.716584] RIP: 0033:0x7fa366eee0cd
Nov 17 16:06:15 hostname kernel: [3385134.716587] RSP: 002b:00007fa35e7fbde0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
Nov 17 16:06:15 hostname kernel: [3385134.716590] RDX: 000000000000000c RSI: 000000000651c7b0 RDI: 000000000000001d
```
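
The numbers in the PANIC line can be checked by hand. The sketch below assumes that the `access=` field reads as offset+length and that `size=` is the object size in bytes; that reading is an interpretation of the message format, not something stated in this report. `overrun` is a hypothetical helper name.

```python
import re

# Matches the "(size=N access=OFFSET+LENGTH)" part of the PANIC message.
PANIC_RE = re.compile(r"size=(\d+) access=(\d+)\+(\d+)")


def overrun(line: str):
    """Return how many bytes the access ends past the object, or None
    if the line does not contain the expected fields."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    size, offset, length = (int(g) for g in m.groups())
    return offset + length - size


line = ("PANIC: zfs: accessing past end of object c1c/2db52f "
        "(size=17408 access=7492+16428)")
print(overrun(line))  # prints: 6512
```

Under that reading, the access would end at byte 23920 of a 17408-byte object, 6512 bytes past its end.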

silverwind (silverwind+)
summary: - zfs: accessing past end of object in 0.8.3-1ubuntu12.4
+ zfs PANIC: accessing past end of object in 0.8.3-1ubuntu12.4
Changed in zfs-linux (Ubuntu):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Changed in zfs-linux (Ubuntu Hirsute):
status: In Progress → Fix Released
Changed in zfs-linux (Ubuntu Groovy):
status: New → Fix Released
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Changed in zfs-linux (Ubuntu Focal):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

Upstream fix that has been backported is attached

Colin Ian King (colin-king) wrote :

I've uploaded the fixed package; it will be available in the -proposed pocket for SRU testing at some point in the near future.

Andrea Righi (arighi)
description: updated
Andrea Righi (arighi) wrote :

New debdiff for focal applied on top of the latest 0.8.3-1ubuntu12.6.

Snorre Selmer (snorre-selmer-k) wrote :

Any chance of a fix for Focal Fossa? I'm seeing this issue crash my server about 1-2 times per month...

Colin Ian King (colin-king) wrote :

I'll see why this has got stuck and sort this out for focal.

Colin Ian King (colin-king) wrote :

The focal package with this fix is now awaiting approval by the distro manager; then it will be available in the -proposed pocket for testing.

Snorre Selmer (snorre-selmer-k) wrote :

Thank you!

Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello silverwind, or anyone else affected,

Accepted zfs-linux into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/zfs-linux/0.8.3-1ubuntu12.10 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in zfs-linux (Ubuntu Focal):
status: In Progress → Fix Committed
Colin Ian King (colin-king) wrote :

@Snorre, can you verify that this fix resolves your ZFS issues? If you can do that then we can release the fix. Thank you!

Colin Ian King (colin-king) wrote :

I've run this through the full Ubuntu kernel team ZFS test suite and also given it a soak test for most of this afternoon with multiple concurrent stress-ng file system I/O stress tests, and I have not been able to trip this issue.

In my opinion ZFS is functioning correctly, and I've not seen this error occur. This is a hard-to-reproduce error, and the fix has not regressed ZFS according to our testing, so my opinion is that this has passed my level of sufficient SRU testing.

tags: added: verification-done-focal
tags: added: verification-done
Łukasz Zemczak (sil2100) wrote :

I think this was aging in -proposed long enough. Let's proceed with the release.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package zfs-linux - 0.8.3-1ubuntu12.10

---------------
zfs-linux (0.8.3-1ubuntu12.10) focal; urgency=medium

  * fix uio partial copies (LP: #1904589)

 -- Andrea Righi <email address hidden> Thu, 10 Jun 2021 14:49:23 +0100

Changed in zfs-linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for zfs-linux has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
