The production machine hasn't had a lockup since moving to 3.15.7-031507-generic (it's been up for 4 days), even though we could reproduce the lockup on a new machine with that kernel using a snapshot of the old volume.
Another twist is that on the production machine I'm now reliably seeing "No space left on device", even though in principle there appears to be 63GB remaining:
$ btrfs fi df /path/to/volume
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=0.00
$ sudo btrfs fi show /path/to/volume
Label: none uuid: 3ffd71ab-6c3d-4486-a6b0-5c1eeb9be6b3
Total devices 1 FS bytes used 432.25GiB
devid 1 size 500.00GiB used 500.00GiB path /dev/dm-0
The ENOSPC is happening for mkdir and rename syscalls in particular.
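To make the arithmetic behind those figures explicit, here is a minimal Python sketch that splits the free space by chunk type. It assumes the `btrfs fi df` line format shown above and takes the device size and allocation from the `btrfs fi show` output. The helper names `to_gib` and `parse_df` are hypothetical and are not part of any btrfs tooling.

```python
import re

# Output of `btrfs fi df /path/to/volume`, copied from above.
DF_OUTPUT = """\
Data, single: total=489.97GiB, used=427.75GiB
System, DUP: total=8.00MiB, used=60.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=5.00GiB, used=4.50GiB
Metadata, single: total=8.00MiB, used=0.00
unknown, single: total=512.00MiB, used=0.00
"""

# Conversion factors to GiB; a bare number is treated as bytes (assumption).
UNITS = {"": 1 / 2**30, "KiB": 1 / 2**20, "MiB": 1 / 2**10, "GiB": 1.0}
SIZE = re.compile(r"([0-9.]+)(KiB|MiB|GiB)?")
LINE = re.compile(r"(\w+), (\w+): total=([^,\s]+), used=([^,\s]+)")


def to_gib(text):
    """Convert a size string such as '8.00MiB' to a float in GiB."""
    match = SIZE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2) or ""]


def parse_df(output):
    """Map (type, profile) to (total_gib, used_gib) for each df line."""
    rows = {}
    for line in output.splitlines():
        match = LINE.match(line)
        if match:
            kind, profile, total, used = match.groups()
            rows[(kind, profile)] = (to_gib(total), to_gib(used))
    return rows


rows = parse_df(DF_OUTPUT)
data_total, data_used = rows[("Data", "single")]
meta_total, meta_used = rows[("Metadata", "DUP")]

# From `btrfs fi show`: devid 1 size 500.00GiB used 500.00GiB
device_size, device_allocated = 500.00, 500.00

data_free = data_total - data_used
meta_free = meta_total - meta_used
unallocated = device_size - device_allocated

print(f"free inside data chunks:     {data_free:.2f} GiB")
print(f"free inside metadata chunks: {meta_free:.2f} GiB")
print(f"unallocated on device:       {unallocated:.2f} GiB")
```

On these numbers, almost all of the apparent free space sits inside data chunks, the metadata chunks have roughly 0.5 GiB left, and the device has nothing unallocated from which a new chunk could be created. That would be consistent with ENOSPC on metadata-heavy calls such as mkdir and rename, although this is a reading of the output rather than a confirmed diagnosis.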
I've posted a mail to the BTRFS list about this: http://thread.gmane.org/gmane.comp.file-systems.btrfs/37415
I did a rebalance with `btrfs balance start -dusage=10` (then increasing the value from 10) to try to gain more space for metadata, but this didn't fix the situation. I did, however, get the stack trace below in dmesg.
In the end, I had to enlarge the volume before it became usable again.
[375794.106653] ------------[ cut here ]------------
[375794.106676] WARNING: CPU: 1 PID: 24706 at /home/apw/COD/linux/fs/btrfs/extent-tree.c:6946 use_block_rsv+0xfd/0x1a0 [btrfs]()
[375794.106678] BTRFS: block rsv returned -28
[375794.106679] Modules linked in: softdog tcp_diag inet_diag dm_crypt ppdev xen_fbfront fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_piix4 serio_raw parport_pc parport mac_hid isofs xt_tcpudp iptable_filter xt_owner ip_tables x_tables btrfs xor raid6_pq crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd floppy psmouse
[375794.106702] CPU: 1 PID: 24706 Comm: twsearch.py Not tainted 3.15.7-031507-generic #201407281235
[375794.106703] Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/23/2014
[375794.106705] 0000000000001b22 ffff88016db437c8 ffffffff8176f115 0000000000000007
[375794.106707] ffff88016db43818 ffff88016db43808 ffffffff8106ceac ffff8801e4890000
[375794.106709] ffff8800a71ab9c0 ffff8801aedcd800 0000000000001000 ffff88001c987000
[375794.106711] Call Trace:
[375794.106718] [<ffffffff8176f115>] dump_stack+0x46/0x58
[375794.106721] [<ffffffff8106ceac>] warn_slowpath_common+0x8c/0xc0
[375794.106723] [<ffffffff8106cf96>] warn_slowpath_fmt+0x46/0x50
[375794.106731] [<ffffffffa00d9d1d>] use_block_rsv+0xfd/0x1a0 [btrfs]
[375794.106739] [<ffffffffa00de687>] btrfs_alloc_free_block+0x57/0x220 [btrfs]
[375794.106746] [<ffffffffa00c8a3c>] btrfs_copy_root+0xfc/0x2b0 [btrfs]
[375794.106757] [<ffffffffa013a583>] ? create_reloc_root+0x33/0x2c0 [btrfs]
[375794.106767] [<ffffffffa013a743>] create_reloc_root+0x1f3/0x2c0 [btrfs]
[375794.106776] [<ffffffffa0140eb8>] btrfs_init_reloc_root+0xb8/0xd0 [btrfs]
[375794.106784] [<ffffffffa00ee967>] record_root_in_trans.part.30+0x97/0x100 [btrfs]
[375794.106792] [<ffffffffa00ee9f4>] record_root_in_trans+0x24/0x30 [btrfs]
[375794.106800] [<ffffffffa00efeb1>] btrfs_record_root_in_trans+0x51/0x80 [btrfs]
[375794.106808] [<ffffffffa00f13d6>] start_transaction.part.35+0x86/0x560 [btrfs]
[375794.106815] [<ffffffffa00d1ee0>] ? btrfs_reduce_alloc_profile.isra.48+0x80/0x160 [btrfs]
[375794.106818] [<ffffffff8109be78>] ? finish_task_switch+0x128/0x180
[375794.106826] [<ffffffffa00f18d9>] start_transaction+0x29/0x30 [btrfs]
[375794.106834] [<ffffffffa00f19a7>] btrfs_join_transaction+0x17/0x20 [btrfs]
[375794.106841] [<ffffffffa00d9764>] flush_space+0xf4/0x160 [btrfs]
[375794.106848] [<ffffffffa00d998a>] reserve_metadata_bytes+0x1ba/0x450 [btrfs]
[375794.106851] [<ffffffff811dd073>] ? generic_permission+0xf3/0x120
[375794.106854] [<ffffffff812f010c>] ? security_inode_permission+0x1c/0x30
[375794.106857] [<ffffffff810b5450>] ? __wake_up_sync+0x20/0x20
[375794.106864] [<ffffffffa00daf3a>] btrfs_delalloc_reserve_metadata+0x16a/0x4a0 [btrfs]
[375794.106873] [<ffffffffa0102b3d>] __btrfs_buffered_write+0x15d/0x5c0 [btrfs]
[375794.106877] [<ffffffff8118bd9c>] ? handle_pte_fault+0x18c/0x1b0
[375794.106886] [<ffffffffa010319f>] btrfs_file_aio_write+0x1ff/0x3b0 [btrfs]
[375794.106889] [<ffffffff811d268a>] do_sync_write+0x5a/0x90
[375794.106892] [<ffffffff811d32db>] vfs_write+0xcb/0x1f0
[375794.106894] [<ffffffff811d37df>] SyS_write+0x4f/0xb0
[375794.106897] [<ffffffff817858bf>] tracesys+0xe1/0xe6
[375794.106898] ---[ end trace 1853311c87a5cd93 ]---