soft lockup during btrfs balance

Bug #1559723 reported by Thomas Mayer
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Expired
Medium
Unassigned

Bug Description

I performed a btrfs balance for a disk which ran out of space. When I did this some months ago, I got an error message, but this time, I got a soft lockup and the file system got mounted read-only:

# cat /proc/version_signature
Ubuntu 4.2.0-27.32~14.04.1-generic 4.2.8-ckt1

I see multiple errors in dmesg output, like

[3099814.088753] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [btrfs:27516]
[3099814.089403] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c intel_rapl w83627ehf x86_pkg_temp_thermal intel_powerclamp hwmon_vid coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda
_codec_generic lp joydev input_leds ppdev gpio_ich 8250_fintek parport_pc parport nuvoton_cir rc_core mac_hid snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd lpc_ich mei_me soundcore mei s
hpchp serio_raw btrfs drbg ansi_cprng dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear raid1 crct10dif_pclmul crc32_pclmul hid_generic i915 aes
ni_intel ahci aes_x86_64 lrw usbhid video gf128mul i2c_algo_bit glue_helper e1000e ablk_helper drm_kms_helper libahci ptp cryptd hid psmouse pps_core drm pata_acpi
[3099814.089436] CPU: 3 PID: 27516 Comm: btrfs Tainted: G U W L 4.2.0-27-generic #32~14.04.1-Ubuntu
[3099814.089437] Hardware name: /DQ67OW, BIOS SWQ6710H.86A.0067.2014.0313.1347 03/13/2014
[3099814.089438] task: ffff8800aca0a940 ti: ffff880110c5c000 task.ti: ffff880110c5c000
[3099814.089438] RIP: 0010:[<ffffffffc031f9d1>] [<ffffffffc031f9d1>] __del_reloc_root+0xd1/0x100 [btrfs]
[3099814.089451] RSP: 0018:ffff880110c5fa60 EFLAGS: 00000246
[3099814.089452] RAX: ffff880142cd36f8 RBX: ffffffffc02a912a RCX: 00000000ffffffe2
[3099814.089453] RDX: 000007ba9c902000 RSI: ffffffffc034e280 RDI: ffff88002a9e8d70
[3099814.089454] RBP: ffff880110c5fa78 R08: 0000000000000000 R09: 00000000ffffffe2
[3099814.089454] R10: ffffffffc02a912a R11: ffffea00029bc9c0 R12: 00000000ffffffe2
[3099814.089455] R13: ffff880110c5fa78 R14: 000000002b4bf801 R15: ffff88042b4bf800
[3099814.089456] FS: 00007f93651f0880(0000) GS:ffff88043e2c0000(0000) knlGS:0000000000000000
[3099814.089457] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3099814.089458] CR2: 0000559c1ffff7d0 CR3: 00000002fca04000 CR4: 00000000000406e0
[3099814.089459] Stack:
[3099814.089460] ffff88002a9e8800 ffff88042b52b000 ffff880110c5fab8 ffff880110c5fa98
[3099814.089461] ffffffffc031fa25 ffff88042b4bfbc0 ffff88042b4bfbc0 ffff880110c5faf8
[3099814.089462] ffffffffc0325f2a ffff88002a9e8d78 000000002a9e8d78 ffff88042b4bdbc0
[3099814.089464] Call Trace:
[3099814.089471] [<ffffffffc031fa25>] free_reloc_roots+0x25/0x40 [btrfs]
[3099814.089477] [<ffffffffc0325f2a>] merge_reloc_roots+0x16a/0x230 [btrfs]
[3099814.089483] [<ffffffffc0326246>] relocate_block_group+0x256/0x600 [btrfs]
[3099814.089489] [<ffffffffc03267b3>] btrfs_relocate_block_group+0x1c3/0x2d0 [btrfs]
[3099814.089496] [<ffffffffc02f9dee>] btrfs_relocate_chunk.isra.39+0x3e/0xc0 [btrfs]
[3099814.089502] [<ffffffffc02fb26f>] __btrfs_balance+0x48f/0x8c0 [btrfs]
[3099814.089509] [<ffffffffc02fba1d>] btrfs_balance+0x37d/0x650 [btrfs]
[3099814.089516] [<ffffffffc03085e2>] ? btrfs_ioctl_balance+0x282/0x530 [btrfs]
[3099814.089522] [<ffffffffc0308756>] btrfs_ioctl_balance+0x3f6/0x530 [btrfs]
[3099814.089528] [<ffffffffc030a17f>] btrfs_ioctl+0x56f/0x2470 [btrfs]
[3099814.089532] [<ffffffff8118559b>] ? lru_cache_add_active_or_unevictable+0x2b/0xa0
[3099814.089534] [<ffffffff811a5e55>] ? handle_mm_fault+0xba5/0x1840
[3099814.089536] [<ffffffff811fdfbd>] do_vfs_ioctl+0x2cd/0x4b0
[3099814.089539] [<ffffffff810642a6>] ? __do_page_fault+0x1b6/0x430
[3099814.089541] [<ffffffff811fe219>] SyS_ioctl+0x79/0x90
[3099814.089543] [<ffffffff817bc3b2>] entry_SYSCALL_64_fastpath+0x16/0x75
[3099814.089543] Code: f0 01 00 00 49 89 84 24 c0 03 00 00 49 89 84 24 c8 03 00 00 48 81 c7 38 09 00 00 c6 07 00 66 66 66 90 48 89 df e8 b0 d3 ea c0 5b <41> 5c 41 5d 41 5e 5d c3 76 09 48 8b 5b 08 e9 5a ff ff ff 49 8d

I also can't access some of the directories now (even in read-only mode):

cd somedir
-su: cd: somedir: Eingabe-/Ausgabefehler

# lsb_release -rd
Description: Ubuntu 14.04.4 LTS
Release: 14.04

Tags: trusty
Revision history for this message
Thomas Mayer (thomas303) wrote :
Revision history for this message
Thomas Mayer (thomas303) wrote :
Revision history for this message
Thomas Mayer (thomas303) wrote :
Revision history for this message
Thomas Mayer (thomas303) wrote :

related: #1349711

Revision history for this message
Thomas Mayer (thomas303) wrote :

Note that some months ago (when I got the error message), I still was able to cleanly free up some diskspace and perform a clean btrfs balance (which freed a lot of space).

My hope was that I'd get an error message and exit cleanly if disk space is missing. Instead, I got the soft lockup.

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: trusty
Revision history for this message
Thomas Mayer (thomas303) wrote :

For my setup, btrfs file system runs on top of a LUKS encrypted partition which itself is on top of a mdraid (mirroring).

The Layout is:

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 2,7T 0 disk
├─sda1 8:1 0 7M 0 part
├─sda2 8:2 0 286M 0 part /boot
└─sda3 8:3 0 2,7T 0 part
  └─md0 9:0 0 2,7T 0 raid1
    └─md0_crypt (dm-0) 252:0 0 2,7T 0 crypt
sdb 8:16 0 2,7T 0 disk
├─sdb1 8:17 0 7M 0 part
├─sdb2 8:18 0 286M 0 part /boot2
└─sdb3 8:19 0 2,7T 0 part
  └─md0 9:0 0 2,7T 0 raid1
    └─md0_crypt (dm-0) 252:0 0 2,7T 0 crypt

The btrfs filesystem does not use btrfs-raid functionality.

Scenario: 3 hosts are backed up using rsync and btrfs-snapshots on this server. I have 3 groups of subvolumes with about 50 snapshots in each group.

Revision history for this message
Thomas Mayer (thomas303) wrote :
Revision history for this message
Thomas Mayer (thomas303) wrote :
Revision history for this message
Thomas Mayer (thomas303) wrote :

I have # btrfs --version
Btrfs v3.12

Kernel is from Hardware enablement stack 14.04.4, Btrfs-tools from ubuntu's package respectively.

Revision history for this message
Thomas Mayer (thomas303) wrote :

# apt-cache policy btrfs-tools
btrfs-tools:
  Installiert: 3.12-1ubuntu0.1

Revision history for this message
Thomas Mayer (thomas303) wrote :

related: https://<email address hidden>/msg48609.html

Revision history for this message
Thomas Mayer (thomas303) wrote :

Some relevant lines from /var/log/syslog:

Mar 19 19:15:10 server smartd[2251]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 107 to 106

Mar 19 07:59:21 server console-kit-daemon[2041]: GLib-CRITICAL: Source ID 376 was not found when attempting to remove it
Mar 19 19:23:17 server console-kit-daemon[2041]: GLib-CRITICAL: Source ID 332 was not found when attempting to remove it
Mar 19 19:23:17 server console-kit-daemon[2041]: GLib-CRITICAL: Source ID 332 was not found when attempting to remove it
Mar 19 19:23:41 server console-kit-daemon[2041]: WARNING: Record was not written to disk (No space left on device)

Mar 19 19:24:39 server console-kit-daemon[2041]: message repeated 3 times: [ WARNING: Record was not written to disk (No space left on device)]
Mar 19 19:45:07 server console-kit-daemon[2041]: GLib-CRITICAL: Source ID 384 was not found when attempting to remove it
Mar 19 19:45:11 server smartd[2251]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 106 to 107

So besides full disk and maybe broken file system, the temperature of one drive also was too high, presumably during the balance operation. After >10 hours of idling, temperature is normal again (46 degrees celsius):

smartctl -a /dev/sdb
194 Temperature_Celsius 0x0022 104 101 000 Old_age Always - 46

Revision history for this message
Thomas Mayer (thomas303) wrote :

The underlying md-raid (mirroring) is fine. So according to md-raid status, there's nothing wrong with the mirrored partitions:

cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda3[0] sdb3[1]
      2929833792 blocks super 1.2 [2/2] [UU]

Basically that means, if there is no issue with LUKS (and I don't see any), then the btrfs-balance on a full disk led to the ro-mounted, presumably corrupted file system.

I'll try to check the raid integrity, which might take a day or two.

Revision history for this message
Thomas Mayer (thomas303) wrote :

Currently, I'm checking the md-raid (raid 1) with

echo check > /sys/block/md0/md/sync_action

I have 2 x 3.5TB hard drives, so this will take some while.

Revision history for this message
Thomas Mayer (thomas303) wrote :

Something noticable: The command btrfs is consuming 100% of the cpu for a long time:

# top
[...]
PID BENUTZER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27516 root 20 0 15476 132 0 R 100,0 0,0 1348:33 btrfs

I assume that this process originates from my "btrfs balance ...", some days ago. At least can I say that I did not start some other btrfs command in the meantime.

Revision history for this message
Thomas Mayer (thomas303) wrote :

Error messages in dmesg |tail are still ongoing, presumably originating from the btrfs command which still is eating 100% of the cpu,

I did not reboot yet nor kill the btrfs process (pid 27516).

Here's two lines from two of the latest error messages in dmesg:
[3111643.576434] [<ffffffff817bc3b2>] entry_SYSCALL_64_fastpath+0x16/0x75
[3111671.589435] [<ffffffff817bc3b2>] entry_SYSCALL_64_fastpath+0x16/0x75

so the time interval beween two messages is about 28 seconds.

Revision history for this message
Thomas Mayer (thomas303) wrote :

I found a few entries in kern.log, nothing special during btrfs balance:

Mar 19 03:47:03 server kernel: [2970569.341605] BTRFS info (device dm-0): relocating block group 9553211031552 flags 36
Mar 19 04:01:08 server kernel: [2971415.588391] BTRFS info (device dm-0): relocating block group 9552674160640 flags 36
Mar 19 04:11:30 server kernel: [2972037.616074] BTRFS info (device dm-0): relocating block group 9552137289728 flags 36
Mar 19 04:24:01 server kernel: [2972788.648812] BTRFS info (device dm-0): relocating block group 9488249651200 flags 36
Mar 19 04:58:24 server kernel: [2974852.613046] BTRFS info (device dm-0): relocating block group 9440468140032 flags 36
Mar 19 05:39:25 server kernel: [2977314.792513] BTRFS info (device dm-0): relocating block group 9424898883584 flags 36
Mar 19 06:29:42 server kernel: [2980333.819035] BTRFS info (device dm-0): relocating block group 9266521964544 flags 36
Mar 19 11:01:36 server kernel: [2996654.661281] BTRFS info (device dm-0): relocating block group 9215519227904 flags 36

Revision history for this message
Thomas Mayer (thomas303) wrote :

cat /proc/mounts
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=8157548k,nr_inodes=2039387,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=1634416k,mode=755 0 0
/dev/mapper/md0_crypt / btrfs ro,relatime,space_cache,subvolid=256,subvol=/@ 0 0
none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /tmp tmpfs rw,relatime,size=6291456k 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
none /sys/fs/pstore pstore rw,relatime 0 0
none /var/tmp/mysql tmpfs rw,relatime,size=6291456k 0 0
/dev/mapper/md0_crypt /home btrfs ro,relatime,space_cache,subvolid=258,subvol=/@home 0 0
/dev/mapper/md0_crypt /backup btrfs ro,relatime,space_cache,subvolid=263,subvol=/@backup 0 0
/dev/sda2 /boot ext4 rw,relatime,stripe=4,data=ordered 0 0
/dev/sdb2 /boot2 ext4 rw,relatime,stripe=4,data=ordered 0 0
systemd /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,name=systemd 0 0
/dev/mapper/md0_crypt /mnt/x btrfs ro,relatime,space_cache,subvolid=5,subvol=/ 0 0

All the btrfs mount points were mounted read-only (they all originate from one partition /dev/mapper/md0_crypt). I started the btrfs balance online (some days ago), meaning I mounted /dev/mapper/md0_crypt to mount point /mnt/x for the btrfs balance operation.

Revision history for this message
Thomas Mayer (thomas303) wrote :

Now, this one is funny after ~2 days:

# btrfs balance status x
Balance on 'x' is running
0 out of about 2797 chunks balanced (533 considered), 100% left

# btrfs balance cancel x
ERROR: balance cancel on 'x' failed - Read-only file system

So the balance is still running, consuming 100% of one cpu, but now that the file system is read-only, nothing can be written to disk. The interval of ~30s between error messages in dmesg tells me that this might be interval of periodic comit according to https://btrfs.wiki.kernel.org/index.php/Mount_options

Revision history for this message
Thomas Mayer (thomas303) wrote :

Checking the underlying mdraid with

echo check > /sys/block/md0/md/sync_action

now has finished and it is just fine:

cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda3[0] sdb3[1]
      2929833792 blocks super 1.2 [2/2] [UU]

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.5 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Revision history for this message
Thomas Mayer (thomas303) wrote :

@jsalisbury

What exactly should I test on a broken file system with a newer Kernel? How could this help you? If btrfs balance corrupted my file system, how should a newer kernel fix that?

If you can tell me test a scenario which will help you to see if the bug is fixed upstream, I'd be happy to hear about it.

Revision history for this message
Thomas Mayer (thomas303) wrote :
Download full text (23.4 KiB)

I rebooted the system in the meantime. It now had Kernel 4.2.0-30.36 from previous updates:

cat /proc/version_signature
Ubuntu 4.2.0-30.36~14.04.1-generic 4.2.8-ckt3

During the reboot, the system performed some disk I/O for several hours. Before services like sshd became available. I guess this disk operation happened before the file system was mounted in read-write mode.

What I can say is that the system tried to continue the failed balance operation during the boot (from syslog) without intervention from my side:

BTRFS info (device dm-0): continuing balance
BTRFS info (device dm-0): relocating block group 9553211031552 flags 36
found 106150 extents

The btrfs balance operation did not succeed again, but in a manner which I was used to before writing this bug report. This is MUCH better than going into an unpredictable state (ro mode with Input/Output errors). I guess the OS totally got out of sync.

BTRFS info (device dm-0): 2 enospc errors during balance

The btrfs file system was mounted in READ-WRITE-mode this time (not in read-only mode as before!). And this time, I can read the files and navigate through directories without any Input/Output errors (before the reboot, I had lots of them!).

Finally, everything looks fine for a btrfs file system with no space left on the disk.

Now that the fs is online and accessible, I can try to add another disk to the fs to have some empty space again. After that, I hopefully can delete some data and hopefully redo the btrfs balance operation.

/var/log/syslog gives me:

Mar 22 12:42:37 server kernel: [ 3481.725273] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 22 12:42:37 server kernel: [ 3481.726122] mount D ffff88043e296640 0 587 455 0x00000000
Mar 22 12:42:37 server kernel: [ 3481.726125] ffff88042798f6a8 0000000000000086 ffff88042b98b700 ffff880427946e00
Mar 22 12:42:37 server kernel: [ 3481.726127] ffff88042798f788 ffff880427990000 0000000000000000 7fffffffffffffff
Mar 22 12:42:37 server kernel: [ 3481.726129] ffff88043e5aeee8 ffffffff817b9240 ffff88042798f6c8 ffffffff817b8ac7
Mar 22 12:42:37 server kernel: [ 3481.726130] Call Trace:
Mar 22 12:42:37 server kernel: [ 3481.726136] [<ffffffff817b9240>] ? bit_wait+0x60/0x60
Mar 22 12:42:37 server kernel: [ 3481.726138] [<ffffffff817b8ac7>] schedule+0x37/0x80
Mar 22 12:42:37 server kernel: [ 3481.726140] [<ffffffff817bb4a1>] schedule_timeout+0x201/0x2a0
Mar 22 12:42:37 server kernel: [ 3481.726143] [<ffffffff8101d749>] ? read_tsc+0x9/0x10
Mar 22 12:42:37 server kernel: [ 3481.726146] [<ffffffff810e6e5e>] ? ktime_get+0x3e/0xa0
Mar 22 12:42:37 server kernel: [ 3481.726147] [<ffffffff817b9240>] ? bit_wait+0x60/0x60
Mar 22 12:42:37 server kernel: [ 3481.726149] [<ffffffff817b8136>] io_schedule_timeout+0xa6/0x110
Mar 22 12:42:37 server kernel: [ 3481.726151] [<ffffffff817b925f>] bit_wait_io+0x1f/0x70
Mar 22 12:42:37 server kernel: [ 3481.726152] [<ffffffff817b8e90>] __wait_on_bit+0x60/0x90
Mar 22 12:42:37 server kernel: [ 3481.726155] [<ffffffff81176708>] ? find_get_pages_tag+0xc8/0x170
Mar 22 12:42:37 server kernel: [ 3481.726157] [<ffffffff81175980>] wait_on_page_bit+0xc0/0xd0
Mar 22 ...

Revision history for this message
Thomas Mayer (thomas303) wrote :

As described in my previous comment, the fs was available in read-write mode again.

I added another disk to the btrfs and with the additional space I was able to rerun and successfully finish the btrfs balance operation (as described in https://www.slicewise.net/debian/balancierung-eines-vollen-btrfs-dateisystems/ ):

btrfs balance start x
Done, had to relocate 2797 out of 2797 chunks

So my file system is completely rescued now.

To sum it up:

There is no way I could reproduce this error again with a recent kernel.

All I can say is that Kernel 4.2.0-27.32~14.04.1-generic 4.2.8-ckt1 ran into unexpected behaviour when doing a btrfs balance:

Expected Result: Error "enospc errors during balance" (at least)

Actual Result: Kernel bug in dmesg, Filesystem goes read-only, Input/Output errors occur, many directories are not readable even in read-only mode, even ls does not work. It seems as if the system totally screwed it up.

Suggested Fix: All of these problems should be avoidable because the later reboot "healed" them (Error "enospc errors during balance" reported, fs mounted read-write, but btrfs balance still fails, which is totally fine at this point)

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.