btrfs segmentation fault when balancing after adding 4th disk to 3 disk raid5

Bug #1212679 reported by Robert Heel
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone:

Bug Description

[ 1897.180398] ------------[ cut here ]------------
[ 1897.181058] Kernel BUG at facfde70 [verbose debug info unavailable]
[ 1897.181652] invalid opcode: 0000 [#1] SMP
[ 1897.182227] Modules linked in: hidp dm_crypt(F) joydev(F) cx22702 snd_hda_codec_realtek isl6421 snd_soc_wm8776 cx24116 snd_soc_core cx88_dvb cx88_vp3054_i2c videobuf_dvb snd_compress(F) wm8775 dvb_core ir_lirc_codec lirc_dev ir_mce_kbd_decoder hid_generic ir_sanyo_decoder ir_jvc_decoder ir_sony_decoder ir_rc6_decoder ir_rc5_decoder ir_nec_decoder rc_hauppauge usbhid hid btusb tuner_simple tuner_types tda9887 snd_hda_intel tda8290 cx88_alsa tuner snd_hda_codec snd_hwdep(F) cx8800 snd_pcm(F) cx8802 snd_page_alloc(F) cx88xx snd_seq_midi(F) snd_seq_midi_event(F) btcx_risc tveeprom snd_rawmidi(F) videobuf_dma_sg bnep rfcomm parport_pc(F) psmouse(F) rc_core ppdev(F) bluetooth serio_raw(F) snd_seq(F) v4l2_common nfsd(F) auth_rpcgss(F) nfs_acl(F) nfs(F) lockd(F) sunrpc(F) fscache(F) videobuf_core snd_seq_device(F) snd_timer(F) videodev snd(F) mac_hid k8temp ohci_pci soundcore(F) i2c_nforce2 lp(F) parport(F) usblp btrfs(F) xor(F) zlib_deflate(F) raid6_pq(F) libcrc32c(F) nouveau mxm_wmi wmi video(F) i2c_algo_bit ttm pata_acpi drm_kms_helper drm pata_amd sata_nv forcedeth ahci(F) libahci(F)
[ 1897.184274] CPU: 0 PID: 6899 Comm: btrfs-balance Tainted: GF 3.11.0-2-generic #5-Ubuntu
[ 1897.184274] Hardware name: Gigabyte Technology Co., Ltd. M61P-S3/M61P-S3, BIOS F7f 06/18/2009
[ 1897.184274] task: f5884d40 ti: ea904000 task.ti: ea904000
[ 1897.184274] EIP: 0060:[<facfde70>] EFLAGS: 00010246 CPU: 0
[ 1897.184274] EIP is at build_backref_tree+0x210/0xf40 [btrfs]
[ 1897.184274] EAX: ef31f0c0 EBX: ef31f0c8 ECX: ea905c78 EDX: 00000000
[ 1897.184274] ESI: db829b40 EDI: 00000000 EBP: ea905cb8 ESP: ea905c28
[ 1897.184274] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 1897.184274] CR0: 8005003b CR2: aefa7000 CR3: 2faae000 CR4: 000007f0
[ 1897.184274] Stack:
[ 1897.184274] db829120 00000000 00000000 02905c54 f00f3894 db829660 00000004 ed28c000
[ 1897.184274] db82968c efc610e0 ea326094 eafbcb40 db829b40 db829660 ea326000 00000000
[ 1897.184274] ea326010 efc61150 00000000 00000000 ea905c78 ea905c78 ea905c80 ea905c80
[ 1897.184274] Call Trace:
[ 1897.184274] [<facfed79>] relocate_tree_blocks+0x1d9/0x610 [btrfs]
[ 1897.184274] [<facfd729>] ? add_data_references+0x239/0x250 [btrfs]
[ 1897.184274] [<facffdae>] relocate_block_group+0x25e/0x650 [btrfs]
[ 1897.184274] [<fad00352>] btrfs_relocate_block_group+0x1b2/0x310 [btrfs]
[ 1897.184274] [<facd6af9>] btrfs_relocate_chunk.isra.28+0x59/0x720 [btrfs]
[ 1897.184274] [<facd40ba>] ? read_extent_buffer+0xaa/0xf0 [btrfs]
[ 1897.184274] [<c163617d>] ? _raw_spin_lock+0xd/0x10
[ 1897.184274] [<faccc6d8>] ? release_extent_buffer+0x58/0xb0 [btrfs]
[ 1897.184274] [<facd317d>] ? free_extent_buffer+0x4d/0xa0 [btrfs]
[ 1897.184274] [<facdb851>] __btrfs_balance+0x481/0x8c0 [btrfs]
[ 1897.184274] [<facdc19a>] btrfs_balance+0x50a/0x730 [btrfs]
[ 1897.184274] [<facdc427>] balance_kthread+0x67/0x70 [btrfs]
[ 1897.184274] [<facdc3c0>] ? btrfs_balance+0x730/0x730 [btrfs]
[ 1897.184274] [<c1070074>] kthread+0x94/0xa0
[ 1897.184274] [<c1070000>] ? kthread+0x20/0xa0
[ 1897.184274] [<c163d837>] ret_from_kernel_thread+0x1b/0x28
[ 1897.184274] [<c106ffe0>] ? kthread_create_on_node+0xc0/0xc0
[ 1897.184274] Code: e8 01 89 47 20 8b 45 a4 f6 40 45 10 0f 85 be 09 00 00 8b 7d a4 8b 47 2c 8d 77 2c 89 75 90 39 c6 74 0b 3b 47 30 0f 84 4a 08 00 00 <0f> 0b c7 45 ac 00 00 00 00 8d b4 26 00 00 00 00 e8 0b 6d 93 c6
[ 1897.184274] EIP: [<facfde70>] build_backref_tree+0x210/0xf40 [btrfs] SS:ESP 0068:ea905c28
[ 1897.214978] ---[ end trace c3a3e666541660a2 ]---
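The oops above can be summarized mechanically. The following is a minimal sketch, not an official tool: the function name `parse_oops` and the regular expressions are assumptions made for illustration, and the sample is a shortened excerpt of the trace above. It extracts the faulting function from the `EIP is at` line, the marked bytes from the `Code:` line, and the call chain, separating frames prefixed with `?` (addresses found on the stack that the unwinder could not confirm) from the others. On x86 the byte pair `0f 0b` is the `ud2` instruction that `BUG()` emits, which matches the "invalid opcode" and "Kernel BUG at" lines.

```python
import re

# Shortened excerpt of the oops above (the Code: line is truncated on purpose).
OOPS = """\
[ 1897.184274] EIP is at build_backref_tree+0x210/0xf40 [btrfs]
[ 1897.184274] Call Trace:
[ 1897.184274] [<facfed79>] relocate_tree_blocks+0x1d9/0x610 [btrfs]
[ 1897.184274] [<facfd729>] ? add_data_references+0x239/0x250 [btrfs]
[ 1897.184274] [<facffdae>] relocate_block_group+0x25e/0x650 [btrfs]
[ 1897.184274] Code: 3b 47 30 0f 84 4a 08 00 00 <0f> 0b c7 45 ac 00 00 00 00
"""

# symbol+0xOFFSET/0xSIZE, as printed on the "EIP is at" line
EIP_RE = re.compile(r"EIP is at ([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
# [<address>] [?] symbol+0xOFFSET/0xSIZE, as printed in the call trace
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# the byte in <..> is where the CPU faulted; capture it and the next byte
CODE_RE = re.compile(r"Code:.*<([0-9a-f]{2})>\s+([0-9a-f]{2})")


def parse_oops(text):
    """Return the faulting symbol, the faulting bytes and the call frames."""
    result = {"fault": None, "fault_bytes": None, "frames": []}
    for line in text.splitlines():
        m = EIP_RE.search(line)
        if m:
            result["fault"] = (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
            continue
        m = CODE_RE.search(line)
        if m:
            result["fault_bytes"] = m.group(1) + " " + m.group(2)
            continue
        m = FRAME_RE.search(line)
        if m:
            # second element is True when the frame has no "?" marker
            result["frames"].append((m.group(2), m.group(1) is None))
    return result


if __name__ == "__main__":
    info = parse_oops(OOPS)
    print("fault:", info["fault"])
    print("bytes:", info["fault_bytes"])
    for name, reliable in info["frames"]:
        print(("  " if reliable else "? ") + name)
```

Read this way, the confirmed chain in the full trace runs from `balance_kthread` through `btrfs_balance`, `__btrfs_balance`, `btrfs_relocate_chunk`, `btrfs_relocate_block_group`, `relocate_block_group` and `relocate_tree_blocks` into `build_backref_tree`, where the BUG fired.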

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.11.0-2-generic 3.11.0-2.5
ProcVersionSignature: Ubuntu 3.11.0-2.5-generic 3.11.0-rc5
Uname: Linux 3.11.0-2-generic i686
ApportVersion: 2.9.2-0ubuntu8.1
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: robert 4167 F.... pulseaudio
 /dev/snd/controlC1: robert 4167 F.... pulseaudio
Date: Thu Aug 15 15:26:08 2013
IwConfig:
 tap0 no wireless extensions.

 lo no wireless extensions.

 eth0 no wireless extensions.
Lsusb:
 Bus 002 Device 002: ID 0a12:0001 Cambridge Silicon Radio, Ltd Bluetooth Dongle (HCI mode)
 Bus 002 Device 003: ID 0e6a:6001 Megawin Technology Co., Ltd GEMBIRD Flexible keyboard KB-109F-B-DE
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: Gigabyte Technology Co., Ltd. M61P-S3
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.11.0-2-generic root=UUID=7a3c13e0-efa7-4551-b899-ff2f9b401d3a ro quiet splash max_loop=64 vt.handoff=7
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.11.0-2-generic N/A
 linux-backports-modules-3.11.0-2-generic N/A
 linux-firmware 1.106
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: Upgraded to raring on 2013-04-23 (113 days ago)
dmi.bios.date: 06/18/2009
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F7f
dmi.board.name: M61P-S3
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF7f:bd06/18/2009:svnGigabyteTechnologyCo.,Ltd.:pnM61P-S3:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnM61P-S3:rvr:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: M61P-S3
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Robert Heel (2-launchpad-net-bobosch-de) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Robert Heel (2-launchpad-net-bobosch-de) wrote :

The kernel is tainted here:
[ 1.634999] libahci: module verification failed: signature and/or required key missing - tainting kernel
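The `Tainted: GF` string in the oops header can be decoded letter by letter. The sketch below is an illustration only: `TAINT_FLAGS` is a partial table written from memory of the kernel's taint documentation, and `decode_taint` is a hypothetical helper name. The note on `F` reflects what this report shows for a 3.11 kernel: the signature failure above coincides with the `(F)` markers in the module list.

```python
# Partial table of taint letters (an assumption, not a complete list).
TAINT_FLAGS = {
    "G": "no proprietary module loaded",
    "P": "proprietary module loaded",
    "F": "module load was forced (here also set for unsigned modules)",
    "D": "kernel has already oopsed",
    "W": "kernel has issued a warning",
    "O": "out-of-tree module loaded",
}


def decode_taint(flags):
    """Map each letter of a 'Tainted:' string to a short description."""
    return [
        (c, TAINT_FLAGS.get(c, "unknown flag (not in this partial table)"))
        for c in flags
        if not c.isspace()
    ]


if __name__ == "__main__":
    for letter, meaning in decode_taint("GF"):
        print(letter, "-", meaning)
```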

The error also occurs after a reboot (the balance resumes automatically).

The btrfs filesystem is readable.

The btrfs filesystem was converted several times:
1 disk -> 2 disk raid 0
2 disk raid 0 -> 3 disk raid 5
3 disk raid 5 -> 4 disk raid 5
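The report does not say which commands were used for these steps. As a rough sketch, assuming the usual btrfs-progs subcommands (`btrfs device add` and `btrfs balance start` with the `-dconvert`/`-mconvert` filters), the steps could look like the argument lists built below. The device name `/dev/sdd` and the mount point `/mnt/data` are hypothetical, and the helpers only build the commands without running them.

```python
def add_device_commands(new_device, mount_point):
    """Commands to add one disk and then rebalance across all disks."""
    return [
        ["btrfs", "device", "add", new_device, mount_point],
        ["btrfs", "balance", "start", mount_point],
    ]


def convert_commands(profile, mount_point):
    """Command to convert data (-d) and metadata (-m) to a RAID profile."""
    return [
        ["btrfs", "balance", "start",
         f"-dconvert={profile}", f"-mconvert={profile}", mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical names: /dev/sdd and /mnt/data are not from the report.
    steps = convert_commands("raid5", "/mnt/data")
    steps += add_device_commands("/dev/sdd", "/mnt/data")
    for argv in steps:
        print(" ".join(argv))
```

The last step (device add followed by a balance) is the one during which the BUG in `build_backref_tree` was hit.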

Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a kernel version where you were not having this particular problem? This will help determine if the problem you are seeing is the result of the introduction of a regression, and when this regression was introduced. If this is a regression, we can perform a kernel bisect to identify the commit that introduced the problem.
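The bisect mentioned above is a binary search between a known-good and a known-bad kernel. The sketch below shows only the search logic under stated assumptions: `first_bad` and `is_bad` are hypothetical names, the candidate list is made up for illustration, and the search assumes that once a version is bad every later one is bad too. In practice the same idea is what `git bisect` automates over commits.

```python
def first_bad(candidates, is_bad):
    """Return the first bad entry of an ordered list.

    Assumes candidates[0] is good, candidates[-1] is bad, and that
    every entry after the first bad one is bad as well.
    """
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return candidates[hi]


if __name__ == "__main__":
    # Hypothetical build list and test result, for illustration only.
    builds = ["build-%d" % i for i in range(16)]
    broken = set(builds[11:])
    print(first_bad(builds, lambda b: b in broken))
```

Each test halves the remaining range, so 16 candidate builds need about four boot-and-balance attempts.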

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Robert Heel (2-launchpad-net-bobosch-de) wrote :

The last complete, error-free balance was with kernel 3.9, after converting from raid 0 to raid 5.
I have done several kernel updates since then. I noticed the issue during the balance after adding the new disk.

I just checked some files and directories: some are readable, some are not :-(
Reading an unreadable one hangs for some time...

I am not 100% sure that the error did not exist before.

According to the SMART information, the disks are healthy.

Robert Heel (2-launchpad-net-bobosch-de) wrote :

This was with kernel 3.11.0-2

Robert Heel (2-launchpad-net-bobosch-de) wrote :

I also found earlier btrfs call traces in kern.log, e.g. with kernel 3.10.0-6:

Aug 3 21:33:37 server kernel: [44521.116044] INFO: task btrfs-transacti:1603 blocked for more than 120 seconds.
Aug 3 21:33:37 server kernel: [44521.116055] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 3 21:33:37 server kernel: [44521.116060] btrfs-transacti D 00000000 0 1603 2 0x00000000
Aug 3 21:33:37 server kernel: [44521.116071] efedbe6c 00000046 00018000 00000000 d24c73e0 f593b070 efedbe48 c1a2c580
Aug 3 21:33:37 server kernel: [44521.116085] c18c91a0 00000000 c1a2c580 f7bbc580 f15ca6a0 fb1b53f8 f50dd140 00000000
Aug 3 21:33:37 server kernel: [44521.116098] f5162400 0014cbf3 00000000 f50dd000 00018000 d24c73e0 efedbe48 fb1ea1fe
Aug 3 21:33:37 server kernel: [44521.116111] Call Trace:
Aug 3 21:33:37 server kernel: [44521.116202] [<fb1b53f8>] ? release_extent_buffer+0x58/0xb0 [btrfs]
Aug 3 21:33:37 server kernel: [44521.116256] [<fb1ea1fe>] ? btrfs_release_delayed_inode.part.16+0x2e/0x40 [btrfs]
Aug 3 21:33:37 server kernel: [44521.116293] [<fb172ceb>] ? btrfs_put_tree_mod_seq+0x11b/0x150 [btrfs]
Aug 3 21:33:37 server kernel: [44521.116305] [<c1622203>] schedule+0x23/0x60
Aug 3 21:33:37 server kernel: [44521.116313] [<c1620a85>] schedule_timeout+0x1a5/0x250
Aug 3 21:33:37 server kernel: [44521.116321] [<c16237cd>] ? _raw_spin_lock+0xd/0x10
Aug 3 21:33:37 server kernel: [44521.116368] [<fb1b340f>] ? btrfs_run_ordered_operations+0x19f/0x220 [btrfs]
Aug 3 21:33:37 server kernel: [44521.116380] [<c1039b28>] ? default_spin_lock_flags+0x8/0x10
Aug 3 21:33:37 server kernel: [44521.116388] [<c1068d28>] ? prepare_to_wait+0x48/0x70
Aug 3 21:33:37 server kernel: [44521.116431] [<fb1987fc>] btrfs_commit_transaction+0x1dc/0xc80 [btrfs]
Aug 3 21:33:37 server kernel: [44521.116440] [<c1068ee0>] ? wake_up_bit+0x20/0x20
Aug 3 21:33:37 server kernel: [44521.116481] [<fb192c5d>] transaction_kthread+0x16d/0x200 [btrfs]
Aug 3 21:33:37 server kernel: [44521.116522] [<fb192af0>] ? btrfs_cleanup_transaction+0x4b0/0x4b0 [btrfs]
Aug 3 21:33:37 server kernel: [44521.116529] [<c1068454>] kthread+0x94/0xa0
Aug 3 21:33:37 server kernel: [44521.116537] [<c1070000>] ? smpboot_register_percpu_thread+0x50/0xc0
Aug 3 21:33:37 server kernel: [44521.116546] [<c162aa77>] ret_from_kernel_thread+0x1b/0x28
Aug 3 21:33:37 server kernel: [44521.116553] [<c10683c0>] ? kthread_create_on_node+0xc0/0xc0
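To see how often such hung-task reports appear in a log, the "blocked for more than N seconds" lines can be picked out with a small script. This is a sketch under assumptions: `find_hung_tasks` is a hypothetical helper, and the pattern is written against the message format shown above. The task name `btrfs-transacti` looks cut off because the kernel stores task names in a 16-byte field including the terminator.

```python
import re

# Matches e.g. "INFO: task btrfs-transacti:1603 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (task name, pid, timeout in seconds) for each hung-task report."""
    hits = []
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            hits.append((m["comm"], int(m["pid"]), int(m["secs"])))
    return hits


SAMPLE = [
    "Aug 3 21:33:37 server kernel: [44521.116044] INFO: task "
    "btrfs-transacti:1603 blocked for more than 120 seconds.",
    "Aug 3 21:33:37 server kernel: [44521.116111] Call Trace:",
]

if __name__ == "__main__":
    for comm, pid, secs in find_hung_tasks(SAMPLE):
        print(f"{comm} (pid {pid}) blocked for more than {secs}s")
```

A hung-task report is a warning about a stalled thread, not a crash, so it is a different symptom from the BUG in the original description.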

Robert Heel (2-launchpad-net-bobosch-de) wrote :

btrfs check does not crash, but it found errors. Is it safe to try a repair on a partially balanced raid 5?

Checking filesystem on /dev/sda
UUID: 783bcdc8-9912-4b5e-84a1-b59afe80564b
checking extents
checking fs roots
root 5 inode 1077162 errors 400
root 5 inode 1077284 errors 400
root 5 inode 1077286 errors 400
root 5 inode 1077299 errors 400
root 5 inode 1077307 errors 400
root 5 inode 1077319 errors 400
root 5 inode 1077331 errors 400
root 5 inode 1077337 errors 400
root 5 inode 1077340 errors 400
root 5 inode 1077347 errors 400
root 5 inode 1077351 errors 400
root 5 inode 1077359 errors 400
root 5 inode 1077362 errors 400
root 5 inode 1077365 errors 400
root 5 inode 1077371 errors 400
root 5 inode 1077374 errors 400
root 5 inode 1077377 errors 400
root 5 inode 1077383 errors 400
root 5 inode 1077386 errors 400
root 5 inode 1077389 errors 400
root 5 inode 1077392 errors 400
root 5 inode 1093499 errors 400
root 5 inode 1093501 errors 400
root 5 inode 1093505 errors 400
root 5 inode 1130116 errors 400
root 5 inode 1249902 errors 400
root 5 inode 1249990 errors 400
root 5 inode 1249992 errors 400
root 5 inode 1250001 errors 400
root 5 inode 1250005 errors 400
root 5 inode 1250009 errors 400
root 5 inode 1250013 errors 400
root 5 inode 1250016 errors 400
root 5 inode 1250019 errors 400
root 5 inode 1250022 errors 400
root 5 inode 1250025 errors 400
root 5 inode 1250028 errors 400
root 5 inode 1250031 errors 400
root 5 inode 1250034 errors 400
root 5 inode 1250037 errors 400
found 4356563644416 bytes used err is 1
total csum bytes: 4247418896
total tree bytes: 6601850880
total fs tree bytes: 1267642368
btree space waste bytes: 796516436
file data blocks allocated: 4365092147200
 referenced 4339408449536
Btrfs v0.20-rc1
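The per-inode lines in the check output above are easier to read when grouped. The sketch below is a minimal example, assuming the `root N inode M errors X` line format shown above; `summarize_check` is a hypothetical helper, and the error value is kept as an opaque string rather than decoded.

```python
import re
from collections import Counter

# One line per damaged inode, e.g. "root 5 inode 1077162 errors 400"
INODE_ERR_RE = re.compile(r"^root (\d+) inode (\d+) errors (\w+)$")


def summarize_check(output):
    """Count affected inodes per (root, error value) and list the inodes."""
    per_code = Counter()
    inodes = []
    for line in output.splitlines():
        m = INODE_ERR_RE.match(line.strip())
        if m:
            root, inode, code = int(m.group(1)), int(m.group(2)), m.group(3)
            per_code[(root, code)] += 1
            inodes.append(inode)
    return per_code, inodes


# Three of the lines from the output above, plus surrounding context.
SAMPLE = """\
checking fs roots
root 5 inode 1077162 errors 400
root 5 inode 1077284 errors 400
root 5 inode 1250037 errors 400
found 4356563644416 bytes used err is 1
"""

if __name__ == "__main__":
    counts, inodes = summarize_check(SAMPLE)
    for (root, code), n in counts.items():
        print(f"root {root}: {n} inodes with errors {code}")
    print("inode range:", min(inodes), "to", max(inodes))
```

On a mounted filesystem, the collected inode numbers could then be mapped to paths with `btrfs inspect-internal inode-resolve`, which would show which files are affected.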

Robert Heel (2-launchpad-net-bobosch-de) wrote :

Or do you need a copy of filesystem metadata to reproduce the bug?

Robert Heel (2-launchpad-net-bobosch-de) wrote :

Same error with linux-image-3.11.0-3-generic

Robert Heel (2-launchpad-net-bobosch-de) wrote :

Same error with linux-image-3.11.0-4-generic

btrfs check --repair also crashes. The crash file /var/crash/_sbin_btrfs.0.crash is 659 MB.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired