btrfs balance leads to kernel panic

Bug #1529146 reported by David Miller
This bug affects 6 people
Affects          Status    Importance  Assigned to  Milestone
Linux            Expired   Undecided
linux (Ubuntu)   Triaged   High        Unassigned
Wily             Triaged   High        Unassigned

Bug Description

Running a balance on a 3-drive btrfs volume with both data and metadata set to raid1 leads to a kernel panic. In my case everything is on btrfs, so the only way I was able to capture the attached backtrace was by sending syslog messages to another server.

ProblemType: Bug
DistroRelease: Ubuntu 15.10
Package: linux-image-4.2.0-22-generic 4.2.0-22.27
ProcVersionSignature: Ubuntu 4.2.0-22.27-generic 4.2.6
Uname: Linux 4.2.0-22-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Dec 24 10:52 seq
 crw-rw---- 1 root audio 116, 33 Dec 24 10:52 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.19.1-0ubuntu5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Thu Dec 24 13:47:46 2015
HibernationDevice: RESUME=UUID=9087f95c-b9c5-4b2e-8837-758c7f9e978c
InstallationDate: Installed on 2015-12-04 (19 days ago)
InstallationMedia: Ubuntu-Server 15.10 "Wily Werewolf" - Release amd64 (20151021)
MachineType: Supermicro H8SGL
PciMultimedia:

ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/@boot/vmlinuz-4.2.0-22-generic root=UUID=341b28d7-2939-4b10-afc8-b865fd46eec6 ro rootflags=subvol=@
RelatedPackageVersions:
 linux-restricted-modules-4.2.0-22-generic N/A
 linux-backports-modules-4.2.0-22-generic N/A
 linux-firmware 1.149.3
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/25/2013
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3.5
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: H8SGL
dmi.board.vendor: Supermicro
dmi.board.version: 1234567890
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 1234567890
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3.5:bd11/25/2013:svnSupermicro:pnH8SGL:pvr1234567890:rvnSupermicro:rnH8SGL:rvr1234567890:cvnSupermicro:ct17:cvr1234567890:
dmi.product.name: H8SGL
dmi.product.version: 1234567890
dmi.sys.vendor: Supermicro

Revision history for this message
In , Mikhail (mikhail-redhat-bugs) wrote :

Additional info:
reporter: libreport-2.6.3
kernel BUG at fs/btrfs/extent-tree.c:1833!
invalid opcode: 0000 [#1] SMP
Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nls_utf8 isofs rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge ebtable_filter ebtables ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_filter ip6_tables iptable_mangle iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security bnep snd_usb_audio snd_usbmidi_lib snd_rawmidi hid_logitech_hidpp btusb btrtl btbcm btintel bluetooth gspca_zc3xx gspca_main videodev uas media joydev usb_storage hid_logitech_dj rfkill btrfs xor intel_rapl iosf_mbi x86_pkg_temp_thermal snd_hda_codec_hdmi snd_hda_codec_realtek coretemp
 snd_hda_codec_generic snd_hda_codec_ca0132 iTCO_wdt iTCO_vendor_support kvm_intel vfat ppdev fat kvm snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq crct10dif_pclmul snd_seq_device crc32_pclmul crc32c_intel snd_pcm raid6_pq snd_timer snd mei_me mei soundcore shpchp i2c_i801 lpc_ich parport_pc tpm_infineon parport tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc i915 i2c_algo_bit drm_kms_helper drm 8021q garp stp llc mrp serio_raw r8169 mii video
CPU: 7 PID: 10775 Comm: kworker/u16:15 Not tainted 4.2.6-301.fc23.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS F11 08/12/2014
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
task: ffff8805c042bb00 ti: ffff88074d748000 task.ti: ffff88074d748000
RIP: 0010:[<ffffffffa05e9af7>] [<ffffffffa05e9af7>] insert_inline_extent_backref+0xe7/0xf0 [btrfs]
RSP: 0018:ffff88074d74baa8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff880000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff88074d74bb28 R08: 0000000000004000 R09: ffff88074d74b9a0
R10: 0000000000000000 R11: 0000000000000003 R12: ffff8807ec932800
R13: ffff88048185d090 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88081e3c0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000013dd12294020 CR3: 0000000001c0b000 CR4: 00000000001406e0
Stack:
 0000000000000000 0000000000000005 0000000000000000 0000000000000000
 0000000000000001 ffffffff81200206 ffff88074d74bb08 ffffffffa05dcd4a
 00000000000033e8 000000004e7e87f3 ffff8807ea905800 0000000000000000
Call Trace:
 [<ffffffff81200206>] ? kmem_cache_alloc+0x1d6/0x210
 [<ffffffffa05dcd4a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
 [<ffffffffa05e9f99>] __btrfs_inc_extent_ref.isra.52+0xa9/0x270 [btrfs]
 [<ffffffffa05ef6b4>] __btrfs_run_delayed_refs+0xc84/0x1080 [btrfs]
 [<ffffffff811a5347>] ? mempool_free_slab+0x17/0x20
 [<ffffffff812001c3>] ? kmem_cache_alloc+0x193/0x210
 [<ffffffffa05f2674>] btrfs_run_delayed_refs.part.73+0x74/0x270 [btrfs]
 [<ffffffffa05f290e>] delayed_ref_async_start+0x7e/0x90 [btrfs]
 [<ffffffffa06384b2>] btrfs_scrubparity_helper+0xc2/0x260 [btrfs]
 [<ffffffffa063868e>] btrfs_extent_refs_helper+0xe/0...
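
As a rough aid for comparing traces like the one above across reports, here is a minimal Python sketch that pulls the BUG location and the faulting symbol out of a captured log. The function name `summarize_oops` and both regular expressions are illustrative assumptions written against the two lines quoted in this report; they are not part of any kernel or distribution tooling and may not match other oops formats.

```python
import re

# Assumed patterns, written against the "kernel BUG at" and "RIP:" lines
# quoted in this report; other kernel versions may format these differently.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
RIP_RE = re.compile(
    r"RIP: .*?\]\s+(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def summarize_oops(log_text):
    """Return the BUG location and faulting symbol in log_text, or None."""
    bug = BUG_RE.search(log_text)
    if bug is None:
        return None
    rip = RIP_RE.search(log_text)
    return {
        "file": bug.group("file"),
        "line": int(bug.group("line")),
        "symbol": rip.group("symbol") if rip else None,
        "module": rip.group("module") if rip else None,
    }


# Three lines copied from the trace in this report.
SAMPLE = """\
kernel BUG at fs/btrfs/extent-tree.c:1833!
invalid opcode: 0000 [#1] SMP
RIP: 0010:[<ffffffffa05e9af7>] [<ffffffffa05e9af7>] insert_inline_extent_backref+0xe7/0xf0 [btrfs]
"""

print(summarize_oops(SAMPLE))
```

Two reports whose summaries share the same file, line and symbol are likely, though not guaranteed, to be hitting the same assertion.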


Revision history for this message
In , Mikhail (mikhail-redhat-bugs) wrote :

Created attachment 1101363
File: dmesg

Revision history for this message
In , Mikhail (mikhail-redhat-bugs) wrote :

Occurred after running btrfs balance start:
# btrfs balance start /home -v

Revision history for this message
In , Sinan (sinan-redhat-bugs) wrote :

I can confirm that btrfs balance causes hard lockups here, with the same error. The raid1 was recently modified (sdd is brand new), so there was a lot to balance; I didn't see the problem with the previous sdd that was replaced. Letting it balance while the box was idle (nighttime, no logged-in users) was OK, but trying to do other work at the same time locked it up at random.

This is my layout:

$ btrfs fi show /mnt/btrfs_pool/
Label: none uuid: 8aef485a-3e5e-4f45-96dd-44ee9d41fb09
        Total devices 2 FS bytes used 227.66GiB
        devid 1 size 931.51GiB used 229.03GiB path /dev/sdd1
        devid 2 size 931.51GiB used 229.03GiB path /dev/sdc1

$ btrfs fi df /mnt/btrfs_pool/
Data, RAID1: total=227.00GiB, used=226.36GiB
System, RAID1: total=32.00MiB, used=48.00KiB
Metadata, RAID1: total=2.00GiB, used=1.29GiB
GlobalReserve, single: total=448.00MiB, used=0.00B

$ btrfs fi show /mnt/backups/
Label: none uuid: 49963f07-64bf-427c-864e-c7a9b4bbcdf4
        Total devices 1 FS bytes used 227.33GiB
        devid 1 size 465.76GiB used 235.02GiB path /dev/sda1

$ uname -a
Linux cygn 4.2.6-301.fc23.x86_64 #1 SMP Fri Nov 20 22:22:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Revision history for this message
In , Dylan (dylan-redhat-bugs) wrote :

Created attachment 1104585
dmesg

Revision history for this message
David Miller (david3d) wrote :
summary: - btrfs rebalance leads to kernel panic
+ btrfs balance leads to kernel panic
Revision history for this message
David Miller (david3d) wrote :

One more note.

The 4.3.3 kernel from http://kernel.ubuntu.com/~kernel-ppa/mainline/ does not exhibit this behaviour and is what this system is now running.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Jeff Billimek (billimek) wrote :

This happens to me too, any time I try a balance against either a single-drive or a multi-drive btrfs partition. It usually happens about 20% of the way into the balance operation. I've been able to reproduce this on demand 6 times now.

Ubuntu 15.10
kernel 4.2.0-22-generic
btrfs-progs v4.0

tags: added: kernel-fs
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a kernel version where you were not having this particular problem? This will help determine if the problem you are seeing is the result of a regression, and when this regression was introduced. If this is a regression, we can perform a kernel bisect to identify the commit that introduced the problem.
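
For readers unfamiliar with the bisect mentioned above, the idea can be sketched in a few lines of Python. This is only an illustration of the search logic, under the assumptions that the candidates are ordered, the first is known good, the last is known bad, and everything after the first bad one is also bad. The commit labels and the `is_bad` callback are hypothetical; in practice `git bisect` drives this process, and each test means building and booting a kernel and running the balance.

```python
def first_bad(candidates, is_bad):
    """Binary-search an ordered list for the first entry where is_bad() is true.

    Assumes candidates[0] is known good, candidates[-1] is known bad, and
    every entry after the first bad one is also bad.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return candidates[hi]


# Hypothetical commit labels; pretend the regression arrived with "c5".
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
tested = []


def is_bad(commit):
    tested.append(commit)  # record which builds would need a manual test
    return commits.index(commit) >= 5


print(first_bad(commits, is_bad), tested)
```

The point of the approach is that the number of kernels to build and test grows with the logarithm of the number of candidate commits, not linearly.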

Changed in linux (Ubuntu):
importance: Critical → High
Changed in linux (Ubuntu Wily):
status: New → Triaged
importance: Undecided → High
tags: added: kernel-da-key
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Also, can you test the latest 4.2 upstream stable kernel? It can be downloaded from:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2.8-wily/

Revision history for this message
Lucy Llewellyn (lucyllewy) wrote :
Revision history for this message
Jeff Billimek (billimek) wrote :

@jsalisbury, I can verify that running kernel 4.2.8 fixes the issue that was present in 4.2.0-22.
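
To check whether a given machine is still on a kernel older than the one reported good here, a small Python sketch like the following could compare upstream versions. It assumes the input looks like the ProcVersionSignature line in the report ("Ubuntu 4.2.0-22.27-generic 4.2.6"), where the last field is the upstream stable version. The helper names are made up for illustration, and the 4.2.8 threshold reflects only what was tested in this thread, not an identified fix commit.

```python
import re

# Upstream version reported as working in this thread (an observation,
# not a statement about which commit fixed the problem).
REPORTED_GOOD = (4, 2, 8)


def upstream_version(signature):
    """Extract the trailing upstream x.y.z version from a signature string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\s*$", signature)
    if match is None:
        raise ValueError(f"no upstream version in {signature!r}")
    return tuple(int(part) for part in match.groups())


def at_least_reported_good(signature):
    """True if the upstream version is 4.2.8 or newer."""
    return upstream_version(signature) >= REPORTED_GOOD


# Signature string copied from the bug description.
affected = "Ubuntu 4.2.0-22.27-generic 4.2.6"
print(upstream_version(affected), at_least_reported_good(affected))
```

Comparing tuples of integers avoids the string-comparison trap where "4.10.0" would sort before "4.2.8".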

Revision history for this message
In , Laura (laura-redhat-bugs) wrote :

*********** MASS BUG UPDATE **************

We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 23 kernel bugs.

Fedora 23 has now been rebased to 4.7.4-100.fc23. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 24 or 25, and are still experiencing this issue, please change the version to Fedora 24 or 25.

If you experience different issues, please open a new bug report for those.

Revision history for this message
In , Laura (laura-redhat-bugs) wrote :

*********** MASS BUG UPDATE **************
This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 4 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Changed in linux:
importance: Unknown → Undecided
status: Unknown → Expired