soft lockup during btrfs balance

Bug #1559723 reported by Thomas Mayer on 2016-03-20
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

I performed a btrfs balance on a disk that had run out of space. When I did this some months ago, I got an error message; this time, I got a soft lockup and the file system was remounted read-only:

# cat /proc/version_signature
Ubuntu 4.2.0-27.32~14.04.1-generic 4.2.8-ckt1
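For reference, the balance was presumably started with something along these lines (the /mnt/x mount point only appears in a later comment; this is a sketch, not the exact invocation):

# btrfs balance start /mnt/x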

I see multiple errors in dmesg output, like

[3099814.088753] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [btrfs:27516]
[3099814.089403] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs libcrc32c intel_rapl w83627ehf x86_pkg_temp_thermal intel_powerclamp hwmon_vid coretemp kvm_intel kvm snd_hda_codec_realtek snd_hda_codec_generic lp joydev input_leds ppdev gpio_ich 8250_fintek parport_pc parport nuvoton_cir rc_core mac_hid snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd lpc_ich mei_me soundcore mei shpchp serio_raw btrfs drbg ansi_cprng dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear raid1 crct10dif_pclmul crc32_pclmul hid_generic i915 aesni_intel ahci aes_x86_64 lrw usbhid video gf128mul i2c_algo_bit glue_helper e1000e ablk_helper drm_kms_helper libahci ptp cryptd hid psmouse pps_core drm pata_acpi
[3099814.089436] CPU: 3 PID: 27516 Comm: btrfs Tainted: G U W L 4.2.0-27-generic #32~14.04.1-Ubuntu
[3099814.089437] Hardware name: /DQ67OW, BIOS SWQ6710H.86A.0067.2014.0313.1347 03/13/2014
[3099814.089438] task: ffff8800aca0a940 ti: ffff880110c5c000 task.ti: ffff880110c5c000
[3099814.089438] RIP: 0010:[<ffffffffc031f9d1>] [<ffffffffc031f9d1>] __del_reloc_root+0xd1/0x100 [btrfs]
[3099814.089451] RSP: 0018:ffff880110c5fa60 EFLAGS: 00000246
[3099814.089452] RAX: ffff880142cd36f8 RBX: ffffffffc02a912a RCX: 00000000ffffffe2
[3099814.089453] RDX: 000007ba9c902000 RSI: ffffffffc034e280 RDI: ffff88002a9e8d70
[3099814.089454] RBP: ffff880110c5fa78 R08: 0000000000000000 R09: 00000000ffffffe2
[3099814.089454] R10: ffffffffc02a912a R11: ffffea00029bc9c0 R12: 00000000ffffffe2
[3099814.089455] R13: ffff880110c5fa78 R14: 000000002b4bf801 R15: ffff88042b4bf800
[3099814.089456] FS: 00007f93651f0880(0000) GS:ffff88043e2c0000(0000) knlGS:0000000000000000
[3099814.089457] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[3099814.089458] CR2: 0000559c1ffff7d0 CR3: 00000002fca04000 CR4: 00000000000406e0
[3099814.089459] Stack:
[3099814.089460] ffff88002a9e8800 ffff88042b52b000 ffff880110c5fab8 ffff880110c5fa98
[3099814.089461] ffffffffc031fa25 ffff88042b4bfbc0 ffff88042b4bfbc0 ffff880110c5faf8
[3099814.089462] ffffffffc0325f2a ffff88002a9e8d78 000000002a9e8d78 ffff88042b4bdbc0
[3099814.089464] Call Trace:
[3099814.089471] [<ffffffffc031fa25>] free_reloc_roots+0x25/0x40 [btrfs]
[3099814.089477] [<ffffffffc0325f2a>] merge_reloc_roots+0x16a/0x230 [btrfs]
[3099814.089483] [<ffffffffc0326246>] relocate_block_group+0x256/0x600 [btrfs]
[3099814.089489] [<ffffffffc03267b3>] btrfs_relocate_block_group+0x1c3/0x2d0 [btrfs]
[3099814.089496] [<ffffffffc02f9dee>] btrfs_relocate_chunk.isra.39+0x3e/0xc0 [btrfs]
[3099814.089502] [<ffffffffc02fb26f>] __btrfs_balance+0x48f/0x8c0 [btrfs]
[3099814.089509] [<ffffffffc02fba1d>] btrfs_balance+0x37d/0x650 [btrfs]
[3099814.089516] [<ffffffffc03085e2>] ? btrfs_ioctl_balance+0x282/0x530 [btrfs]
[3099814.089522] [<ffffffffc0308756>] btrfs_ioctl_balance+0x3f6/0x530 [btrfs]
[3099814.089528] [<ffffffffc030a17f>] btrfs_ioctl+0x56f/0x2470 [btrfs]
[3099814.089532] [<ffffffff8118559b>] ? lru_cache_add_active_or_unevictable+0x2b/0xa0
[3099814.089534] [<ffffffff811a5e55>] ? handle_mm_fault+0xba5/0x1840
[3099814.089536] [<ffffffff811fdfbd>] do_vfs_ioctl+0x2cd/0x4b0
[3099814.089539] [<ffffffff810642a6>] ? __do_page_fault+0x1b6/0x430
[3099814.089541] [<ffffffff811fe219>] SyS_ioctl+0x79/0x90
[3099814.089543] [<ffffffff817bc3b2>] entry_SYSCALL_64_fastpath+0x16/0x75
[3099814.089543] Code: f0 01 00 00 49 89 84 24 c0 03 00 00 49 89 84 24 c8 03 00 00 48 81 c7 38 09 00 00 c6 07 00 66 66 66 90 48 89 df e8 b0 d3 ea c0 5b <41> 5c 41 5d 41 5e 5d c3 76 09 48 8b 5b 08 e9 5a ff ff ff 49 8d

I also can't access some of the directories now (even in read-only mode):

cd somedir
-su: cd: somedir: Eingabe-/Ausgabefehler   (Input/output error)

# lsb_release -rd
Description: Ubuntu 14.04.4 LTS
Release: 14.04

Thomas Mayer (thomas303) wrote :

related: #1349711

Thomas Mayer (thomas303) wrote :

Note that some months ago (when I got the error message), I was still able to cleanly free up some disk space and perform a clean btrfs balance (which freed a lot of space).

My hope was that, if disk space runs out, I'd get an error message and the operation would exit cleanly. Instead, I got the soft lockup.
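For context (not a command from this report): a filtered balance is a common way to reclaim space on a nearly full btrfs file system, since it only rewrites mostly-empty data chunks. A minimal sketch, assuming a mount point of /mnt/x and an example usage threshold:

# btrfs balance start -dusage=10 /mnt/x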

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: trusty
Thomas Mayer (thomas303) wrote :

In my setup, the btrfs file system runs on top of a LUKS-encrypted partition, which itself sits on top of an mdraid mirror.

The layout is:

# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 2,7T 0 disk
├─sda1 8:1 0 7M 0 part
├─sda2 8:2 0 286M 0 part /boot
└─sda3 8:3 0 2,7T 0 part
  └─md0 9:0 0 2,7T 0 raid1
    └─md0_crypt (dm-0) 252:0 0 2,7T 0 crypt
sdb 8:16 0 2,7T 0 disk
├─sdb1 8:17 0 7M 0 part
├─sdb2 8:18 0 286M 0 part /boot2
└─sdb3 8:19 0 2,7T 0 part
  └─md0 9:0 0 2,7T 0 raid1
    └─md0_crypt (dm-0) 252:0 0 2,7T 0 crypt

The btrfs filesystem does not use btrfs-raid functionality.

Scenario: 3 hosts are backed up using rsync and btrfs-snapshots on this server. I have 3 groups of subvolumes with about 50 snapshots in each group.
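A sketch of how the snapshots in these groups could be enumerated (assuming the top-level subvolume is mounted at /mnt/x; this is not output from this system):

# btrfs subvolume list -s /mnt/x | wc -l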

Thomas Mayer (thomas303) wrote :

# btrfs --version
Btrfs v3.12

The kernel is from the 14.04.4 hardware enablement stack; btrfs-tools is the corresponding package from Ubuntu's archive.

Thomas Mayer (thomas303) wrote :

# apt-cache policy btrfs-tools
btrfs-tools:
  Installiert: 3.12-1ubuntu0.1

Thomas Mayer (thomas303) wrote :

related: https://<email address hidden>/msg48609.html

Thomas Mayer (thomas303) wrote :

Some relevant lines from /var/log/syslog:

Mar 19 19:15:10 server smartd[2251]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 107 to 106

Mar 19 07:59:21 server console-kit-daemon[2041]: GLib-CRITICAL: Source ID 376 was not found when attempting to remove it
Mar 19 19:23:17 server console-kit-daemon[2041]: GLib-CRITICAL: Source ID 332 was not found when attempting to remove it
Mar 19 19:23:17 server console-kit-daemon[2041]: GLib-CRITICAL: Source ID 332 was not found when attempting to remove it
Mar 19 19:23:41 server console-kit-daemon[2041]: WARNING: Record was not written to disk (No space left on device)

Mar 19 19:24:39 server console-kit-daemon[2041]: message repeated 3 times: [ WARNING: Record was not written to disk (No space left on device)]
Mar 19 19:45:07 server console-kit-daemon[2041]: GLib-CRITICAL: Source ID 384 was not found when attempting to remove it
Mar 19 19:45:11 server smartd[2251]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 106 to 107

So besides the full disk and a possibly broken file system, the temperature of one drive was also elevated, presumably during the balance operation. After more than 10 hours of idling, the temperature is back to normal (46 degrees Celsius):

smartctl -a /dev/sdb
194 Temperature_Celsius 0x0022 104 101 000 Old_age Always - 46

Thomas Mayer (thomas303) wrote :

The underlying md-raid (mirroring) looks fine. According to the md-raid status, there is nothing wrong with the mirrored partitions:

cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda3[0] sdb3[1]
      2929833792 blocks super 1.2 [2/2] [UU]

Basically, that means that if there is no issue with LUKS (and I don't see any), the btrfs balance on a full disk led to the read-only-mounted, presumably corrupted file system.

I'll try to check the raid integrity, which might take a day or two.

Thomas Mayer (thomas303) wrote :

Currently, I'm checking the md-raid (raid 1) with

echo check > /sys/block/md0/md/sync_action

I have 2 x 3.5TB hard drives, so this will take a while.
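Progress and result of the check can be watched while it runs, e.g. (the interval is an example value; paths as in the command above):

# watch -n 60 cat /proc/mdstat
# cat /sys/block/md0/md/mismatch_cnt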

Thomas Mayer (thomas303) wrote :

Something noticeable: the btrfs command has been consuming 100% of one CPU for a long time:

# top
[...]
PID BENUTZER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27516 root 20 0 15476 132 0 R 100,0 0,0 1348:33 btrfs

I assume that this process originates from my "btrfs balance ..." from some days ago. At least I can say that I did not start any other btrfs command in the meantime.
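One way to confirm where PID 27516 came from would be to look at its start time and command line, e.g. (a sketch, not something that was run here):

# ps -o lstart,args -p 27516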

Thomas Mayer (thomas303) wrote :

Error messages in dmesg | tail are still ongoing, presumably originating from the btrfs command, which is still consuming 100% of one CPU.

I have not rebooted yet, nor killed the btrfs process (PID 27516).

Here are two lines from two of the latest error messages in dmesg:
[3111643.576434] [<ffffffff817bc3b2>] entry_SYSCALL_64_fastpath+0x16/0x75
[3111671.589435] [<ffffffff817bc3b2>] entry_SYSCALL_64_fastpath+0x16/0x75

So the time interval between two messages is about 28 seconds (3111671.59 - 3111643.58 ≈ 28 s).

Thomas Mayer (thomas303) wrote :

I found a few entries in kern.log; nothing special during the btrfs balance:

Mar 19 03:47:03 server kernel: [2970569.341605] BTRFS info (device dm-0): relocating block group 9553211031552 flags 36
Mar 19 04:01:08 server kernel: [2971415.588391] BTRFS info (device dm-0): relocating block group 9552674160640 flags 36
Mar 19 04:11:30 server kernel: [2972037.616074] BTRFS info (device dm-0): relocating block group 9552137289728 flags 36
Mar 19 04:24:01 server kernel: [2972788.648812] BTRFS info (device dm-0): relocating block group 9488249651200 flags 36
Mar 19 04:58:24 server kernel: [2974852.613046] BTRFS info (device dm-0): relocating block group 9440468140032 flags 36
Mar 19 05:39:25 server kernel: [2977314.792513] BTRFS info (device dm-0): relocating block group 9424898883584 flags 36
Mar 19 06:29:42 server kernel: [2980333.819035] BTRFS info (device dm-0): relocating block group 9266521964544 flags 36
Mar 19 11:01:36 server kernel: [2996654.661281] BTRFS info (device dm-0): relocating block group 9215519227904 flags 36

Thomas Mayer (thomas303) wrote :

cat /proc/mounts
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=8157548k,nr_inodes=2039387,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=1634416k,mode=755 0 0
/dev/mapper/md0_crypt / btrfs ro,relatime,space_cache,subvolid=256,subvol=/@ 0 0
none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /tmp tmpfs rw,relatime,size=6291456k 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
none /sys/fs/pstore pstore rw,relatime 0 0
none /var/tmp/mysql tmpfs rw,relatime,size=6291456k 0 0
/dev/mapper/md0_crypt /home btrfs ro,relatime,space_cache,subvolid=258,subvol=/@home 0 0
/dev/mapper/md0_crypt /backup btrfs ro,relatime,space_cache,subvolid=263,subvol=/@backup 0 0
/dev/sda2 /boot ext4 rw,relatime,stripe=4,data=ordered 0 0
/dev/sdb2 /boot2 ext4 rw,relatime,stripe=4,data=ordered 0 0
systemd /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,name=systemd 0 0
/dev/mapper/md0_crypt /mnt/x btrfs ro,relatime,space_cache,subvolid=5,subvol=/ 0 0

All the btrfs mount points are mounted read-only (they all originate from one partition, /dev/mapper/md0_crypt). I started the btrfs balance online (some days ago), meaning I had mounted /dev/mapper/md0_crypt at /mnt/x and ran the balance operation on the live file system.
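That mount corresponds to the last /proc/mounts line above and would have been created with something along these lines (a sketch, not the exact command used):

# mount -o subvolid=5 /dev/mapper/md0_crypt /mnt/x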

Thomas Mayer (thomas303) wrote :

Now, this one is funny after ~2 days:

# btrfs balance status x
Balance on 'x' is running
0 out of about 2797 chunks balanced (533 considered), 100% left

# btrfs balance cancel x
ERROR: balance cancel on 'x' failed - Read-only file system

So the balance is still running, consuming 100% of one CPU, but now that the file system is read-only, nothing can be written to disk. The interval of ~30 s between error messages in dmesg suggests that this corresponds to the periodic commit interval described at https://btrfs.wiki.kernel.org/index.php/Mount_options
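For reference, btrfs commits every 30 seconds by default; a different interval would be set via the commit= mount option, e.g. (illustrative only, and not applicable to a read-only file system):

# mount -o remount,commit=30 /mnt/x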

Thomas Mayer (thomas303) wrote :

Checking the underlying mdraid with

echo check > /sys/block/md0/md/sync_action

has now finished, and the result is fine:

cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda3[0] sdb3[1]
      2929833792 blocks super 1.2 [2/2] [UU]

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.5 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/
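For an amd64 system, installing a mainline test build typically amounts to downloading the generic .deb packages from [0] and installing them with dpkg; a sketch with placeholder file names (use the actual names listed in the directory):

# wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/<linux-headers-...-all.deb>
# wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/<linux-headers-...-generic-amd64.deb>
# wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.5-wily/<linux-image-...-generic-amd64.deb>
# dpkg -i linux-*.deb
# reboot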

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Thomas Mayer (thomas303) wrote :

@jsalisbury

What exactly should I test on a broken file system with a newer kernel? How would this help you? If btrfs balance corrupted my file system, how would a newer kernel fix that?

If you can tell me a test scenario that would help you see whether the bug is fixed upstream, I'd be happy to hear about it.

Thomas Mayer (thomas303) wrote :

I rebooted the system in the meantime. It is now running kernel 4.2.0-30.36 from earlier updates:

cat /proc/version_signature
Ubuntu 4.2.0-30.36~14.04.1-generic 4.2.8-ckt3

During the reboot, the system performed disk I/O for several hours before services like sshd became available. I guess this disk activity happened before the file system was mounted read-write.

What I can say (from syslog) is that the system tried to continue the failed balance operation during boot, without any intervention on my side:

BTRFS info (device dm-0): continuing balance
BTRFS info (device dm-0): relocating block group 9553211031552 flags 36
found 106150 extents
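An interrupted balance resumes automatically at the next read-write mount; as an aside (not something used here), that resume could be suppressed with the skip_balance mount option, a sketch:

# mount -o skip_balance /dev/mapper/md0_crypt /mnt/x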

The btrfs balance operation failed again, but this time in the manner I was used to from before filing this bug report. This is MUCH better than ending up in an unpredictable state (read-only mode with Input/Output errors). I guess the OS had gotten totally out of sync before.

BTRFS info (device dm-0): 2 enospc errors during balance

The btrfs file system was mounted in READ-WRITE-mode this time (not in read-only mode as before!). And this time, I can read the files and navigate through directories without any Input/Output errors (before the reboot, I had lots of them!).

Finally, everything looks fine for a btrfs file system with no space left on the disk.

Now that the fs is online and accessible, I can try to add another disk to it to get some empty space again. After that, I can hopefully delete some data and redo the btrfs balance operation.
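Roughly, the plan as a sketch (the device name of the new disk below is a placeholder):

# btrfs device add /dev/sdX1 /mnt/x
# btrfs balance start /mnt/x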

/var/log/syslog gives me:

Mar 22 12:42:37 server kernel: [ 3481.725273] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 22 12:42:37 server kernel: [ 3481.726122] mount D ffff88043e296640 0 587 455 0x00000000
Mar 22 12:42:37 server kernel: [ 3481.726125] ffff88042798f6a8 0000000000000086 ffff88042b98b700 ffff880427946e00
Mar 22 12:42:37 server kernel: [ 3481.726127] ffff88042798f788 ffff880427990000 0000000000000000 7fffffffffffffff
Mar 22 12:42:37 server kernel: [ 3481.726129] ffff88043e5aeee8 ffffffff817b9240 ffff88042798f6c8 ffffffff817b8ac7
Mar 22 12:42:37 server kernel: [ 3481.726130] Call Trace:
Mar 22 12:42:37 server kernel: [ 3481.726136] [<ffffffff817b9240>] ? bit_wait+0x60/0x60
Mar 22 12:42:37 server kernel: [ 3481.726138] [<ffffffff817b8ac7>] schedule+0x37/0x80
Mar 22 12:42:37 server kernel: [ 3481.726140] [<ffffffff817bb4a1>] schedule_timeout+0x201/0x2a0
Mar 22 12:42:37 server kernel: [ 3481.726143] [<ffffffff8101d749>] ? read_tsc+0x9/0x10
Mar 22 12:42:37 server kernel: [ 3481.726146] [<ffffffff810e6e5e>] ? ktime_get+0x3e/0xa0
Mar 22 12:42:37 server kernel: [ 3481.726147] [<ffffffff817b9240>] ? bit_wait+0x60/0x60
Mar 22 12:42:37 server kernel: [ 3481.726149] [<ffffffff817b8136>] io_schedule_timeout+0xa6/0x110
Mar 22 12:42:37 server kernel: [ 3481.726151] [<ffffffff817b925f>] bit_wait_io+0x1f/0x70
Mar 22 12:42:37 server kernel: [ 3481.726152] [<ffffffff817b8e90>] __wait_on_bit+0x60/0x90
Mar 22 12:42:37 server kernel: [ 3481.726155] [<ffffffff81176708>] ? find_get_pages_tag+0xc8/0x170
Mar 22 12:42:37 server kernel: [ 3481.726157] [<ffffffff81175980>] wait_on_page_bit+0xc0/0xd0
Mar 22 ...

Thomas Mayer (thomas303) wrote :

As described in my previous comment, the fs was available in read-write mode again.

I added another disk to the btrfs file system, and with the additional space I was able to rerun and successfully finish the btrfs balance operation (as described in https://www.slicewise.net/debian/balancierung-eines-vollen-btrfs-dateisystems/ ):

btrfs balance start x
Done, had to relocate 2797 out of 2797 chunks

So my file system is completely rescued now.

To sum it up:

There is no way for me to reproduce this error again with a recent kernel.

All I can say is that Kernel 4.2.0-27.32~14.04.1-generic 4.2.8-ckt1 ran into unexpected behaviour when doing a btrfs balance:

Expected Result: Error "enospc errors during balance" (at least)

Actual Result: kernel bug in dmesg, the file system goes read-only, Input/Output errors occur, many directories are not readable even in read-only mode, and even ls does not work. It seems as if the system got totally messed up.

Suggested Fix: all of these problems should be avoidable, because the later reboot "healed" them (the error "enospc errors during balance" was reported, the fs was mounted read-write, and the btrfs balance still fails, which is totally fine at this point).

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired