general protection fault in zfs module

Bug #1749715 reported by Simon Déziel
This bug affects 1 person
Affects: zfs-linux (Ubuntu)
Status: Won't Fix
Importance: Medium
Assigned to: Colin Ian King

Bug Description

Got this call trace during an rsync backup of a machine using ZFS:

general protection fault: 0000 [#1] SMP
Modules linked in: ip6table_filter ip6_tables xt_tcpudp xt_conntrack iptable_filter ip_tables x_tables zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) input_leds sch_fq_codel nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack virtio_scsi
CPU: 0 PID: 4238 Comm: rsync Tainted: P O 4.4.0-112-generic #135-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff880078a4f2c0 ti: ffff880047c28000 task.ti: ffff880047c28000
RIP: 0010:[<ffffffffc03135b3>] [<ffffffffc03135b3>] avl_insert+0x33/0xe0 [zavl]
RSP: 0018:ffff880047c2bc20 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff880043b46200 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 001f880043b46208 RDI: ffff88005aa0c9a8
RBP: ffff880047c2bc20 R08: 0000000000000000 R09: ffff88007d001700
R10: ffff880043b46200 R11: 0000000000000246 R12: ffff88005aa0c9a8
R13: ffff880043b46200 R14: 0000000000000000 R15: ffff88005aa0c9a8
FS: 00007f04124ec700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd25c1cb8c CR3: 0000000047cb0000 CR4: 0000000000000670
Stack:
 ffff880047c2bc68 ffffffffc0313721 0000000000000000 0020000000000008
 ffff880043b46200 ffff88005aa0c8c8 0000000000006b34 0000000000000000
 ffff88005aa0c9a8 ffff880047c2bcc8 ffffffffc04609ee 0000000000000000
Call Trace:
 [<ffffffffc0313721>] avl_add+0x71/0xa0 [zavl]
 [<ffffffffc04609ee>] zfs_range_lock+0x3ee/0x5e0 [zfs]
 [<ffffffffc0416e4c>] ? rrw_enter_read_impl+0xbc/0x160 [zfs]
 [<ffffffffc0465c90>] zfs_read+0xd0/0x3c0 [zfs]
 [<ffffffff9b39922d>] ? profile_path_perm.part.7+0x7d/0xa0
 [<ffffffffc04818f0>] zpl_read_common_iovec+0x80/0xd0 [zfs]
 [<ffffffffc0482430>] zpl_iter_read+0xa0/0xd0 [zfs]
 [<ffffffff9b211134>] new_sync_read+0x94/0xd0
 [<ffffffff9b211196>] __vfs_read+0x26/0x40
 [<ffffffff9b211756>] vfs_read+0x86/0x130
 [<ffffffff9b2124a5>] SyS_read+0x55/0xc0
 [<ffffffff9b8456c7>] ? entry_SYSCALL_64_after_swapgs+0xd1/0x18c
 [<ffffffff9b8457ad>] entry_SYSCALL_64_fastpath+0x2b/0xe7
Code: 83 e2 01 48 03 77 10 49 83 e0 fe 8d 04 95 00 00 00 00 55 4c 89 c1 48 83 47 18 01 83 e0 04 48 83 c9 01 48 89 e5 48 09 c8 4d 85 c0 <48> c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 48 89 46 10 0f 84
RIP [<ffffffffc03135b3>] avl_insert+0x33/0xe0 [zavl]
 RSP <ffff880047c2bc20>
---[ end trace c4ba4478b6002697 ]---

This is the first time it has happened, but I'll report any future occurrences here.

Additional info:

$ lsb_release -rd
Description: Ubuntu 16.04.3 LTS
Release: 16.04

$ apt-cache policy linux-image-4.4.0-112-generic zfsutils-linux
linux-image-4.4.0-112-generic:
  Installed: 4.4.0-112.135
  Candidate: 4.4.0-112.135
  Version table:
 *** 4.4.0-112.135 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
        100 /var/lib/dpkg/status
zfsutils-linux:
  Installed: 0.6.5.6-0ubuntu18
  Candidate: 0.6.5.6-0ubuntu18
  Version table:
 *** 0.6.5.6-0ubuntu18 500
        500 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     0.6.5.6-0ubuntu8 500
        500 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-112-generic 4.4.0-112.135
ProcVersionSignature: Ubuntu 4.4.0-112.135-generic 4.4.98
Uname: Linux 4.4.0-112-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Feb 14 16:19 seq
 crw-rw---- 1 root audio 116, 33 Feb 14 16:19 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: [Errno 2] No such file or directory: 'fuser'
CRDA: N/A
CurrentDmesg: Error: command ['dmesg'] failed with exit code 1: dmesg: read kernel buffer failed: Operation not permitted
Date: Thu Feb 15 08:45:07 2018
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci: Error: [Errno 2] No such file or directory: 'lspci'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb'
MachineType: QEMU Standard PC (i440FX + PIIX, 1996)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-112-generic root=UUID=db4864d4-cc2e-40c7-bc2b-a14bc0f09c9f ro console=ttyS0 net.ifnames=0 kaslr nmi_watchdog=0 possible_cpus=1 vsyscall=none pti=on
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-112-generic N/A
 linux-backports-modules-4.4.0-112-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/01/2014
dmi.bios.vendor: SeaBIOS
dmi.bios.version: Ubuntu-1.8.2-1ubuntu1
dmi.chassis.type: 1
dmi.chassis.vendor: QEMU
dmi.chassis.version: pc-i440fx-2.5
dmi.modalias: dmi:bvnSeaBIOS:bvrUbuntu-1.8.2-1ubuntu1:bd04/01/2014:svnQEMU:pnStandardPC(i440FX+PIIX,1996):pvrpc-i440fx-2.5:cvnQEMU:ct1:cvrpc-i440fx-2.5:
dmi.product.name: Standard PC (i440FX + PIIX, 1996)
dmi.product.version: pc-i440fx-2.5
dmi.sys.vendor: QEMU

Simon Déziel (sdeziel) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1749715

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
affects: linux (Ubuntu) → zfs-linux (Ubuntu)
Changed in zfs-linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
tags: added: kernel-da-key
Changed in zfs-linux (Ubuntu):
status: Triaged → In Progress
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

There is a similar upstream bug report, https://github.com/zfsonlinux/zfs/issues/6781. It is not identical, but it also involves AVL corruption and may share the same root cause.

Note: the RSI register value looks odd, which may be indicative of AVL corruption.
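
One way to make "odd" concrete is to test whether the register values from the trace are canonical x86-64 addresses. The sketch below assumes 48-bit virtual addressing (4-level paging, which is the usual case for a 4.4 kernel); the helper name is made up for illustration. A store through a non-canonical pointer raises a general protection fault rather than a page fault, which would be consistent with the report.

```python
def is_canonical_x86_64(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63..(va_bits - 1) of addr are all equal.

    Assumes 48-bit virtual addresses unless va_bits says otherwise.
    """
    top = addr >> (va_bits - 1)              # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


# Register values copied from the call trace in the bug description.
RSI = 0x001F880043B46208
RBX = 0xFFFF880043B46200

print(hex(RSI), "canonical" if is_canonical_x86_64(RSI) else "non-canonical")
print(hex(RBX), "canonical" if is_canonical_x86_64(RBX) else "non-canonical")
```

With these inputs RSI is reported as non-canonical while RBX, which sits only 8 bytes below RSI in its low 48 bits, is a normal kernel address.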

Colin Ian King (colin-king) wrote :

If this occurs again, please try:

echo 0 | sudo tee /sys/module/spl/parameters/spl_taskq_thread_dynamic

...and see if this stops the issue from recurring.
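
Before writing to the parameter, it can be worth checking its current value. A minimal sketch, assuming the standard /sys/module/<module>/parameters/<name> sysfs layout; the helper name and its sysfs_root argument are hypothetical and exist only to keep the example testable.

```python
from pathlib import Path


def read_module_param(module, param, sysfs_root="/sys/module"):
    """Return the current value of a kernel module parameter as a string,
    or None if it cannot be read (module not loaded, no such parameter)."""
    path = Path(sysfs_root) / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


value = read_module_param("spl", "spl_taskq_thread_dynamic")
if value is None:
    print("spl_taskq_thread_dynamic is not readable (is the spl module loaded?)")
else:
    print("spl_taskq_thread_dynamic =", value)
```

If this already prints 0, the echo command above changes nothing.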

Changed in zfs-linux (Ubuntu):
status: In Progress → Incomplete
Changed in zfs:
status: Unknown → New
Colin Ian King (colin-king) wrote :

@Simon, any feedback on the suggestion in comment #4?

Simon Déziel (sdeziel) wrote :

@Colin, sorry for the delay, I just checked and /sys/module/spl/parameters/spl_taskq_thread_dynamic defaults to 0 already.

That said, the issue happens only occasionally. Since I first reported it, there was only one other event during which it occurred several times in a row (between 2018-03-14 and 2018-03-15). I rebooted the VM and have yet to see it happen again. I still have the logs from those if that could be of any use to you.

Thanks for taking such good care of ZoL for Ubuntu!

Changed in zfs-linux (Ubuntu):
status: Incomplete → Confirmed
Colin Ian King (colin-king) wrote :

The logs would be useful; maybe I can figure something out from them. Thanks!

Simon Déziel (sdeziel) wrote :

The machine is named "smb" and the attached logs were extracted from syslog with the pattern '^Mar 1[45] [0-9:]+ smb kernel: '.
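
For anyone wanting to reproduce that extraction, here is a minimal sketch of the same filter. The pattern is the one quoted above; the sample lines are invented for illustration and are not taken from the attached log.

```python
import re

# Pattern quoted in the comment: kernel messages from host "smb"
# logged on Mar 14 or Mar 15.
KERNEL_LINE = re.compile(r"^Mar 1[45] [0-9:]+ smb kernel: ")


def kernel_lines(lines):
    """Keep only the syslog lines that match the pattern."""
    return [line for line in lines if KERNEL_LINE.search(line)]


# Hypothetical syslog lines, for illustration only.
sample = [
    "Mar 14 23:59:01 smb kernel: general protection fault: 0000 [#1] SMP",
    "Mar 15 00:00:02 smb CRON[123]: (root) CMD (true)",
    "Mar 16 08:00:00 smb kernel: unrelated later message",
    "Mar  4 08:00:00 smb kernel: single-digit day, padded with a space",
]

for line in kernel_lines(sample):
    print(line)
```

Only the first sample line passes: the second is not a kernel message, and the last two fall outside the two dates.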

Colin Ian King (colin-king) wrote :

@Simon, the log in comment #8 contains different crashes, so I'm going to focus on the original crash report in comment #1. Has this problem re-occurred with more recent kernels?

no longer affects: zfs
Colin Ian King (colin-king) wrote :

After debugging the object code, I can see that the error occurs because of corruption in the internal AVL tree; the bug occurs during an insertion into the AVL tree in avl_insert(), namely when nullifying node->avl_child[0]:

        node->avl_child[0] = NULL;
        node->avl_child[1] = NULL;

From what I gather, some internal memory corruption is probably causing this issue. Without a full kernel core I can't track this back much further, so my current hunch is that this may not be a software error after all. I've had an extensive hunt around and cannot find similar breakage patterns, so I suspect this was a one-off memory issue. I'm going to close this as Won't Fix, but if it happens again, please feel free to re-open the bug.
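
The register dump allows a rough sanity check of this reading. The sketch below assumes that avl_insert() derives the node pointer as new_data plus the tree's stored node offset (as the AVL_DATA2NODE macro does in the ZFS AVL code), that RSI holds the node pointer and RBX the data pointer at the time of the fault, and that the intended offset was 0x8. None of these assumptions is confirmed in this report.

```python
MASK64 = (1 << 64) - 1

# Values copied from the call trace in the bug description.
node = 0x001F880043B46208   # RSI: pointer being written through (assumed)
data = 0xFFFF880043B46200   # RBX / R10 / R13: object being inserted (assumed)

# Offset the kernel must have added to get from data to node, modulo 2**64.
offset = (node - data) & MASK64
print("implied offset:", hex(offset))

# Assumption: the intended offset was 0x8, matching the low bits.
EXPECTED_OFFSET = 0x8
diff = offset ^ EXPECTED_OFFSET
flipped_bits = [bit for bit in range(64) if (diff >> bit) & 1]
print("bits differing from the expected offset:", flipped_bits)
```

The implied offset is 0x20000000000008, a value that also appears in the stack dump, and it differs from 0x8 in a single bit (bit 53). A lone flipped bit is consistent with the one-off memory corruption suggested above, though it does not prove it.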

Changed in zfs-linux (Ubuntu):
status: Confirmed → Won't Fix
Simon Déziel (sdeziel) wrote : Re: [Bug 1749715] Re: general protection fault in zfs module

On 2018-04-24 11:29 AM, Colin Ian King wrote:
> Has this problem re-occurred with more recent kernels?

No, it has not occurred again. I'll let you know if it does. Thanks for
investigating!
