Kernel NULL pointer dereference while receiving zfs snapshots

Bug #1870559 reported by Andreas Hasenack
This bug affects 2 people
Affects          Status    Importance   Assigned to      Milestone
linux (Ubuntu)   Invalid   Medium       Colin Ian King

Bug Description

I was transferring my zfs snapshots to the backup server, when suddenly it stalled. I checked dmesg on the server and found a "kernel bug" entry.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-21-generic 5.4.0-21.25
ProcVersionSignature: Ubuntu 5.4.0-21.25-generic 5.4.27
Uname: Linux 5.4.0-21-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
AlsaVersion: Advanced Linux Sound Architecture Driver Version k5.4.0-21-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu22
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/pcmC0D4p', '/dev/snd/pcmC0D3p', '/dev/snd/pcmC0D2p', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
Date: Fri Apr 3 14:48:19 2020
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: System manufacturer System Product Name
ProcFB: 0 nouveaudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-21-generic root=UUID=7934268a-8e6f-11e8-828e-00133b1029de ro mitigations=off maybe-ubiquity
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-21-generic N/A
 linux-backports-modules-5.4.0-21-generic N/A
 linux-firmware 1.187
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2020-01-19 (75 days ago)
dmi.bios.date: 12/23/2008
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0214
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: P5LD2-X/1333
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev x.xx
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: ASUSTek Computer INC.
dmi.chassis.version: Rev 1.xx
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0214:bd12/23/2008:svnSystemmanufacturer:pnSystemProductName:pvrRev1.xx:rvnASUSTeKComputerINC.:rnP5LD2-X/1333:rvrRevx.xx:cvnASUSTekComputerINC.:ct3:cvrRev1.xx:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: Rev 1.xx
dmi.sys.vendor: System manufacturer

Andreas Hasenack (ahasenack) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Colin Ian King (colin-king) wrote :

For reference, the splat was:

[ 2465.077373] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2465.077397] #PF: supervisor read access in kernel mode
[ 2465.077403] #PF: error_code(0x0000) - not-present page
[ 2465.077409] PGD 0 P4D 0
[ 2465.077415] Oops: 0000 [#1] SMP NOPTI
[ 2465.077422] CPU: 0 PID: 23215 Comm: receive_writer Tainted: P O 5.4.0-21-generic #25-Ubuntu
[ 2465.077433] Hardware name: System manufacturer System Product Name/P5LD2-X/1333, BIOS 0214 12/23/2008
[ 2465.077619] RIP: 0010:abd_verify+0xa/0x40 [zfs]
[ 2465.077626] Code: ff 85 c0 74 12 48 c7 03 00 00 00 00 48 c7 43 08 00 00 00 00 5b 5d c3 e8 04 ff ff ff eb e7 c3 90 55 48 89 e5 41 54 53 48 89 fb <8b> 3f e8 0f ff ff ff 85 c0 75 22 44 8b 63 1c 48 8b 7b 20 4d 85 e4
[ 2465.077642] RSP: 0018:ffffb28b51f6f998 EFLAGS: 00010282
[ 2465.077649] RAX: 0000000000004000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2465.077656] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[ 2465.077662] RBP: ffffb28b51f6f9a8 R08: 00000000000003da R09: 000000000045e54c
[ 2465.077669] R10: 00000000000036bf R11: 0000000000000000 R12: 0000000000004000
[ 2465.077675] R13: ffff8ff6af5bd5f0 R14: 0000000000004000 R15: 0000000000000000
[ 2465.077683] FS: 0000000000000000(0000) GS:ffff8ff70c600000(0000) knlGS:0000000000000000
[ 2465.077691] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2465.077697] CR2: 0000000000000000 CR3: 0000000058e50000 CR4: 00000000000006f0
[ 2465.077704] Call Trace:
[ 2465.077754] abd_borrow_buf+0x19/0x60 [zfs]
[ 2465.077801] abd_borrow_buf_copy+0x1a/0x50 [zfs]
[ 2465.077873] zio_crypt_copy_dnode_bonus+0x30/0x130 [zfs]
[ 2465.077922] arc_buf_untransform_in_place.isra.0+0x2b/0x40 [zfs]
[ 2465.077971] arc_buf_fill+0x1f0/0x4a0 [zfs]
[ 2465.078021] arc_untransform+0x22/0x90 [zfs]
[ 2465.078070] dbuf_read_verify_dnode_crypt+0xed/0x160 [zfs]
[ 2465.078141] ? atomic_cmpxchg+0x16/0x30 [zfs]
[ 2465.078191] dbuf_read_impl+0x117/0x610 [zfs]
[ 2465.078240] ? atomic64_add_return+0x12/0x30 [zfs]
[ 2465.078291] dbuf_read+0xcb/0x5f0 [zfs]
[ 2465.078340] ? dbuf_hold_impl+0x2f/0x40 [zfs]
[ 2465.078395] dmu_tx_check_ioerr+0x70/0xd0 [zfs]
[ 2465.078450] dmu_tx_hold_free_impl+0x12c/0x240 [zfs]
[ 2465.078507] dmu_tx_hold_free+0x40/0x50 [zfs]
[ 2465.078559] dmu_free_long_range_impl+0x124/0x350 [zfs]
[ 2465.078612] dmu_free_long_range+0x74/0xc0 [zfs]
[ 2465.078665] dmu_free_long_object+0x27/0xc0 [zfs]
[ 2465.078720] receive_freeobjects.isra.0+0x7a/0x100 [zfs]
[ 2465.078777] receive_process_record+0x89/0x1c0 [zfs]
[ 2465.078833] receive_writer_thread+0x9a/0x150 [zfs]
[ 2465.078889] ? receive_process_record+0x1c0/0x1c0 [zfs]
[ 2465.078910] thread_generic_wrapper+0x83/0xa0 [spl]
[ 2465.078919] kthread+0x104/0x140
[ 2465.078929] ? clear_bit+0x20/0x20 [spl]
[ 2465.078934] ? kthread_park+0x90/0x90
[ 2465.078941] ret_from_fork+0x1f/0x40
[ 2465.078946] Modules linked in: nfnetlink xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bpfilte...

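For readers decoding the splat: the "#PF: error_code(0x0000)" line carries the x86 page-fault error code, and the <..> marker in the "Code:" line flags the first byte of the faulting instruction. The following is a minimal Python sketch of reading both. The helper names (decode_pf_error_code, faulting_bytes) are hypothetical and are not part of any kernel tooling.

```python
# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return human-readable properties for an x86 #PF error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


def faulting_bytes(code_line):
    """Return the bytes of a 'Code:' line from the <..>-marked byte onward."""
    _, _, rest = code_line.partition("Code:")
    tokens = rest.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return [tok.strip("<>")] + tokens[i + 1:]
    return []  # no marker found


if __name__ == "__main__":
    print(decode_pf_error_code(0x0000))
    # Shortened excerpt of the Code: line from the splat above.
    print(faulting_bytes("Code: 48 89 e5 41 54 53 48 89 fb <8b> 3f e8 0f ff ff ff"))
```

For this splat, error code 0 decodes to a supervisor-mode read of a not-present page, and the marked bytes 8b 3f encode a 32-bit load through RDI (mov (%rdi),%edi), which matches the faulting address of 0.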

Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
status: Confirmed → Triaged
Colin Ian King (colin-king) wrote :

Do you mind running the following command and posting the output to the bug?

dpkg -l | grep zfs

Andreas Hasenack (ahasenack) wrote :

Sorry, I've since reinstalled the server. The same pool is there, and it's still focal, but the OS disk was reinstalled. I used it to test the subiquity server installer.

Colin Ian King (colin-king) wrote :

Urm, this is going to be tricky to debug: the trace shows abd_verify, but I can't find that symbol in the object code when I dig into it. Any idea if zfs-dkms was being used?

Andreas Hasenack (ahasenack) wrote :

zfs-dkms was not being used; I had no need for it on this machine. I only have zfs-dkms on my Pi 4, since the Ubuntu kernel there doesn't include the zfs module.

Colin Ian King (colin-king) wrote :

I've debugged this back down the stack and there is a null pointer that should be caught by some assert checks but strangely isn't being reported, so a null pointer crash in the abd_verify function is most unexpected.
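
The register dump in the splat is one place where the null pointer is visible. A minimal Python sketch of pulling the general-purpose registers out of the dump follows; the helper name parse_registers is hypothetical, and the reading of RDI as the first function argument assumes the standard x86-64 System V calling convention used by the kernel.

```python
import re

# Matches "RAX: 0000000000004000" style pairs. Segment-prefixed values such
# as "RSP: 0018:ffff..." and control registers such as CR2 are not matched.
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}):\s+([0-9a-f]{16})\b")


def parse_registers(lines):
    """Map register name to integer value for each register found."""
    regs = {}
    for line in lines:
        for name, value in REG_RE.findall(line):
            regs[name] = int(value, 16)
    return regs


# Two register lines copied from the first splat in this report.
SAMPLE = [
    "[ 2465.077649] RAX: 0000000000004000 RBX: 0000000000000000 RCX: 0000000000000000",
    "[ 2465.077656] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000",
]

if __name__ == "__main__":
    regs = parse_registers(SAMPLE)
    zero = sorted(name for name, value in regs.items() if value == 0)
    print("zero registers:", zero)
```

RDI is zero at the fault, which is consistent with abd_verify having been called with a NULL first argument. RBX is zero as well, which fits the 48 89 fb bytes (mov %rdi,%rbx) that sit just before the marked instruction in the Code: line.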

I've built a zfs debug package that captures more information about internal pointer state, which may help debug this further:

https://launchpad.net/~colin-king/+archive/ubuntu/zfs-sru-1870559

sudo add-apt-repository ppa:colin-king/zfs-sru-1870559
sudo apt-get update
sudo apt-get install zfs-dkms

and then reboot.

If the crash happens again, the kernel log should hopefully capture more data.

Colin Ian King (colin-king) wrote :

Ping? Any chance of trying out my debug test debs?

Andreas Hasenack (ahasenack) wrote :

Sorry, I mistook this for a comment on another zfs bug that I was watching.

I haven't seen that crash again since the day I reported it. Since it's so rare (happened just once), do you think it's worth it to try the kernel from the ppa? I'd rather not stay away from the official focal kernel for long.

Colin Ian King (colin-king) wrote :

Considering it is a rare issue that does not seem to be reproducible, it may be a spurious corruption issue of some sort. I've not seen any similar bug reports against ZFS, so it is a curious one.

Perhaps we should close this bug and re-open it if you hit it again.

Andreas Hasenack (ahasenack) wrote :

+1, this is not like sarnold's crash-with-every-snapshot-send

Changed in linux (Ubuntu):
status: Triaged → Incomplete
Andreas Hasenack (ahasenack) wrote :

I marked it incomplete so it will expire on its own if there is no new information, but feel free to actually close it if you prefer ("invalid" is fine).

Colin Ian King (colin-king) wrote :

Marking it as invalid. Feel free to re-open this bug if it occurs again.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Srdjan Markovic (glueckself) wrote :

Hello!

I've just encountered the bug too. I was receiving a snapshot while I was installing packages in a chroot running on a dataset. Both hung.
I was running 20.10 from the live DVD (downloaded today) with the pre-installed ZFS version. I don't have access to the exact version numbers right now, because I'm running memtest (this is a brand new laptop, so I thought it might be bad RAM). When it's done, I'll check if the pool survived.

My other systems are running zfs-0.8.3-1ubuntu12.4 on a Kubuntu 20.04 and zfs-0.8.5-2~bpo10+1(/zfs-kmod-0.8.4-2~bpo10+1 - have to reload the module at some point) on a Debian Buster, which are fine for now.

This is my first Launchpad bug report, and I'm not sure whether I should post here, open a new bug, or post on the OpenZFS GitHub issues page. Please excuse me if posting here is wrong.

Srdjan Markovic (glueckself) wrote :

Sorry - I see right now that the bug is slightly different:

Mine has the NULL pointer dereference in sg_next, reached through abd_verify from abd_return_buf, while the trace posted above faults in abd_verify itself, called from abd_borrow_buf.

[ 7081.805511] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 7081.805517] #PF: supervisor read access in kernel mode
[ 7081.805519] #PF: error_code(0x0000) - not-present page
[ 7081.805520] PGD 0 P4D 0
[ 7081.805525] Oops: 0000 [#1] SMP NOPTI
[ 7081.805529] CPU: 5 PID: 312206 Comm: receive_writer Tainted: P O 5.8.0-25-generic #26-Ubuntu
[ 7081.805531] Hardware name: LENOVO 20T9S00K00/20T9S00K00, BIOS R1AET32W (1.08 ) 08/14/2020
[ 7081.805538] RIP: 0010:sg_next+0x0/0x20
[ 7081.805541] Code: cc cc cc cc cc cc cc cc cc cc c7 47 10 00 00 00 00 89 57 0c 48 89 37 89 4f 08 c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <f6> 07 02 75 17 48 8b 57 20 48 83 c7 20 48 89 d0 48 83 e0 fc 83 e2
[ 7081.805543] RSP: 0018:ffffaef209a379e0 EFLAGS: 00010293
[ 7081.805546] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
[ 7081.805548] RDX: 0000000000004000 RSI: ffff9c6ea4a90000 RDI: 0000000000000000
[ 7081.805549] RBP: ffffaef209a379f8 R08: ffff9c6ea4a93e00 R09: 0000000000000000
[ 7081.805551] R10: 0000000000000000 R11: 0000000000000000 R12: 000000001138482c
[ 7081.805553] R13: ffff9c6ea4a90000 R14: 0000000000004000 R15: ffff9c6d56f2daf0
[ 7081.805555] FS: 0000000000000000(0000) GS:ffff9c6fff940000(0000) knlGS:0000000000000000
[ 7081.805557] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7081.805559] CR2: 0000000000000000 CR3: 0000000111f1a000 CR4: 0000000000340ee0
[ 7081.805561] Call Trace:
[ 7081.805642] ? abd_verify+0x29/0x40 [zfs]
[ 7081.805712] abd_return_buf+0x1c/0x50 [zfs]
[ 7081.805815] zio_crypt_copy_dnode_bonus+0x106/0x130 [zfs]
[ 7081.805886] arc_buf_untransform_in_place.constprop.0+0x2b/0x40 [zfs]
[ 7081.805957] arc_buf_fill+0x219/0x4d0 [zfs]
[ 7081.806028] arc_untransform+0x22/0x90 [zfs]
[ 7081.806100] dbuf_read_verify_dnode_crypt+0xed/0x160 [zfs]
[ 7081.806185] dbuf_read_impl+0x107/0x5e0 [zfs]
[ 7081.806198] ? spl_kmem_free_impl+0x25/0x30 [spl]
[ 7081.806270] dbuf_read+0xc1/0x580 [zfs]
[ 7081.806280] ? spl_kmem_free+0xe/0x10 [spl]
[ 7081.806351] ? dbuf_hold_impl+0x2f/0x40 [zfs]
[ 7081.806430] dmu_tx_check_ioerr+0x70/0xd0 [zfs]
[ 7081.806505] dmu_tx_hold_free_impl+0x128/0x240 [zfs]
[ 7081.806578] dmu_tx_hold_free+0x40/0x50 [zfs]
[ 7081.806659] dmu_free_long_range_impl+0x11f/0x330 [zfs]
[ 7081.806735] dmu_free_long_range+0x74/0xc0 [zfs]
[ 7081.806808] dmu_free_long_object+0x27/0xc0 [zfs]
[ 7081.806888] receive_freeobjects+0x72/0x100 [zfs]
[ 7081.806967] receive_process_record+0x83/0x170 [zfs]
[ 7081.807044] receive_writer_thread+0x9a/0x150 [zfs]
[ 7081.807120] ? spl_fstrans_unmark.isra.0+0x20/0x20 [zfs]
[ 7081.807136] thread_generic_wrapper+0x79/0x90 [spl]
[ 7081.807141] kthread+0x12f/0x150
[ 7081.807151] ? __thread_exit+0x20/0x20 [spl]
[ 7081.807154] ? __kthread_bind_mask+0x70/0x70
[ 7081.807159] ret_from_fork+0x22/0x30
[ 7081.807162] Modules linked in: nfnetlink ufs qnx4 hfsplus hfs minix ntfs msdos btrfs blake2b_generic xor raid6_pq ccm rfcomm cmac algif_hash algif_skcipher af_alg bnep zfs(PO) zunico...

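To compare this trace with the earlier one, the call-trace lines can be reduced to function names and intersected. The following is a minimal Python sketch under stated assumptions: the helper names (frames, common_frames) are hypothetical, frames prefixed with "?" are treated as unreliable and skipped by default, and compiler suffixes such as .isra.0 and .constprop.0 are dropped so the same function matches across kernel builds.

```python
import re

# Matches "[ 2465.077754] abd_borrow_buf+0x19/0x60 [zfs]" and the
# "? "-prefixed form. RIP lines do not match because of the "0010:" prefix.
FRAME_RE = re.compile(r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(trace_lines, include_unreliable=False):
    """Return function names from call-trace lines, innermost frame first."""
    names = []
    for line in trace_lines:
        m = FRAME_RE.search(line)
        if not m:
            continue
        if m.group(1) and not include_unreliable:
            continue  # "?" frames are stack leftovers, not confirmed callers
        names.append(m.group(2).split(".")[0])  # strip .isra.0 and similar
    return names


def common_frames(a, b):
    """Frames of a that also appear in b, keeping a's order."""
    seen = set(b)
    return [f for f in a if f in seen]


# Top four frames of each trace, copied from this report.
FIRST = [
    "[ 2465.077754] abd_borrow_buf+0x19/0x60 [zfs]",
    "[ 2465.077801] abd_borrow_buf_copy+0x1a/0x50 [zfs]",
    "[ 2465.077873] zio_crypt_copy_dnode_bonus+0x30/0x130 [zfs]",
    "[ 2465.077922] arc_buf_untransform_in_place.isra.0+0x2b/0x40 [zfs]",
]
SECOND = [
    "[ 7081.805642] ? abd_verify+0x29/0x40 [zfs]",
    "[ 7081.805712] abd_return_buf+0x1c/0x50 [zfs]",
    "[ 7081.805815] zio_crypt_copy_dnode_bonus+0x106/0x130 [zfs]",
    "[ 7081.805886] arc_buf_untransform_in_place.constprop.0+0x2b/0x40 [zfs]",
]

if __name__ == "__main__":
    print(common_frames(frames(FIRST), frames(SECOND)))
```

On these excerpts the two traces converge at zio_crypt_copy_dnode_bonus and share every frame below it. The first trace faults near the start of that function (+0x30, on the borrow path) and the second near its end (+0x106, on the return path), which suggests that both involve the same buffer handling in the encrypted-dnode path rather than two unrelated bugs.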
