btrfs in 2.6.38 extremely slow (when near full?)

Bug #791992 reported by Mikael Nordfeldth
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

I don't know yet if this is related to the disk being full, but the following reasons are why I suspect 2.6.38 to be the culprit:

1. I have a 2TB disk with btrfs filesystem on an encrypted (cryptsetup) LUKS partition
2. I set it up and installed using Ubuntu 10.04 Lucid, and recently upgraded all the way to 11.04 Natty
3. Having had no serious performance trouble with Lucid (2.6.32 kernel), I immediatly noticed a severe degradation in performance when running the 2.6.38-8 kernel that comes with Natty
4. Downgrading to 2.6.35-28 solves my performance issues.

Also, there's no difference between -o noatime or -o noatime,nodatacow etc.

I will try a newer kernel, 2.6.39 I guess, to see if it solves my performance issues.

(The disk is currently very full, I have not tried btrfs on a 2TiB disk with <90% used yet)

By degraded performance, I mean taking several minutes to 'rm' a small file (3-4MiB). File operations quickly queue up too, so there's no reasonable chance to be using that partition under 2.6.38-8 for now.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.38-8-generic-pae 2.6.38-8.42
ProcVersionSignature: Ubuntu 2.6.35-28.50-generic-pae 2.6.35.11
Uname: Linux 2.6.35-28-generic-pae i686
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: i386
ArecordDevices: Error: [Errno 2] No such file or directory
CRDA: Error: [Errno 2] No such file or directory
Date: Thu Jun 2 20:31:39 2011
HibernationDevice: RESUME=UUID=d04a0c29-d5b6-4f1c-b93e-c503e651b3c1
InstallationMedia: Ubuntu-Server 10.04 "Lucid Lynx" - Release Candidate i386 (20100419.1)
Lsusb:
 Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Supermicro P4SCE
PciMultimedia:

ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.35-28-generic-pae root=UUID=8dad604e-5e67-4a8f-a67b-69e3c42f83a8 ro quiet
RelatedPackageVersions:
 linux-restricted-modules-2.6.35-28-generic-pae N/A
 linux-backports-modules-2.6.35-28-generic-pae N/A
 linux-firmware 1.52
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to natty on 2011-06-02 (0 days ago)
dmi.bios.date: 10/30/2003
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: P4SCE
dmi.board.vendor: Supermicro
dmi.board.version: PCB Version
dmi.chassis.type: 3
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 1234567890
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd10/30/2003:svnSupermicro:pnP4SCE:pvr1234567890:rvnSupermicro:rnP4SCE:rvrPCBVersion:cvnSupermicro:ct3:cvr1234567890:
dmi.product.name: P4SCE
dmi.product.version: 1234567890
dmi.sys.vendor: Supermicro

Revision history for this message
Mikael Nordfeldth (mmn) wrote :
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Mikael Nordfeldth (mmn) wrote :
Download full text (5.1 KiB)

I noticed 'rsyslogd' was doing an awful lot of work, and when checking /var/log/syslog I found lots of btrfs output when rsyncing (for backup):

------------[ cut here ]------------
WARNING: at /build/buildd/linux-2.6.35/fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x1c3/0x220 [btrfs]()
Hardware name: P4SCE
Modules linked in: aes_i586 aes_generic ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT iptable_mangle xt_tcpudp iptable_filter sha256_generic ip_tables x_tables blowfish bridge stp dm_crypt ppdev parport_pc lp shpchp parport raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear btrfs zlib_deflate crc32c libcrc32c intel_agp e1000 agpgart floppy
Pid: 1731, comm: rsync Not tainted 2.6.35-28-generic-pae #50-Ubuntu
Call Trace:
 [<c0152b92>] warn_slowpath_common+0x72/0xa0
 [<f888b8e3>] ? btrfs_block_rsv_check+0x1c3/0x220 [btrfs]
 [<f888b8e3>] ? btrfs_block_rsv_check+0x1c3/0x220 [btrfs]
 [<c0152be2>] warn_slowpath_null+0x22/0x30
 [<f888b8e3>] btrfs_block_rsv_check+0x1c3/0x220 [btrfs]
 [<f889f66c>] __btrfs_end_transaction+0x17c/0x1f0 [btrfs]
 [<f889f70f>] btrfs_end_transaction+0xf/0x20 [btrfs]
 [<f889f926>] start_transaction+0x206/0x210 [btrfs]
 [<f889f98c>] btrfs_start_transaction+0x1c/0x20 [btrfs]
 [<f88a07e8>] __unlink_start_trans+0x38/0x410 [btrfs]
 [<c023b1af>] ? mntput_no_expire+0x1f/0xd0
 [<f88a67f3>] btrfs_unlink+0x33/0x110 [btrfs]
 [<c022c5ee>] ? inode_permission+0x9e/0xb0
 [<c022d757>] vfs_unlink+0x87/0xd0
 [<c022f2e7>] do_unlinkat+0x157/0x180
 [<c02237b5>] ? vfs_read+0x125/0x190
 [<c0223862>] ? sys_read+0x42/0x70
 [<c022f325>] sys_unlink+0x15/0x20
 [<c05f1f04>] syscall_call+0x7/0xb
---[ end trace d4686b8f4d931797 ]---
block_rsv size 788529152 reserved 321339392 freed 0 0
------------[ cut here ]------------
WARNING: at /build/buildd/linux-2.6.35/fs/btrfs/extent-tree.c:3441 btrfs_block_rsv_check+0x1c3/0x220 [btrfs]()
Hardware name: P4SCE
Modules linked in: aes_i586 aes_generic ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT iptable_mangle xt_tcpudp iptable_filter sha256_generic ip_tables x_tables blowfish bridge stp dm_crypt ppdev parport_pc lp shpchp parport raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear btrfs zlib_deflate crc32c libcrc32c intel_agp e1000 agpgart floppy
Pid: 1731, comm: rsync Tainted: G W 2.6.35-28-generic-pae #50-Ubuntu
Call Trace:
 [<c0152b92>] warn_slowpath_common+0x72/0xa0
 [<f888b8e3>] ? btrfs_block_rsv_check+0x1c3/0x220 [btrfs]
 [<f888b8e3>] ? btrfs_block_rsv_check+0x1c3/0x220 [btrfs]
 [<c0152be2>] warn_slowpath_null+0x22/0x30
 [<f888b8e3>] btrfs_block_rsv_check+0x1c3/0x220 [btrfs]
 [<f887f6d0>] ? btrfs_free_path+0x20/0x30 [btrfs]
 [<f88d0359>] ? btrfs_insert_orphan_item+0x79/0x90 [btrfs]
 [<f889f66c>] __btrfs_end_transaction+0x17c/0x1f0 [btrfs]
 [<f889f6f2>] btrfs_end_transaction_throttle+0x12/0x20 [btrfs]
 [<f88a6879>] btrfs_unlink+0xb9/0x110 [btrfs]
 [<c022d757>] vfs_unlink+0x87/0xd0
 [<c022f2e7>] do_unlinkat+0x...

Read more...

Revision history for this message
Mikael Nordfeldth (mmn) wrote :

...I'm trying to hunt this down now to see if this was perhaps not related to the latest kernel.

There were a lot of problems with "no space left" (ENOSPC) and by looking up the 'rsv_check' stuff for btrfs, I came upon this thread:
http://www.spinics.net/lists/linux-btrfs/msg05310.html

"To avoid ENOSPC oops, btrfs in 2.6.35 reserves more metadata space for
system use than older btrfs. If the FS has already ran out of metadata space,
using btrfs in 2.6.35 doesn't help."

Since I created the filesystem in pre-2.6.35, am I right to assume that no "metadata space increase function" (or whatever) has been implemented in post-2.6.35 either?

Revision history for this message
Csimbi (turbotalicska) wrote :

Hi all,
my RAID array is filling up, so this bug started to effect me as well.
My NAS is running from a 40GB SSD drive.

Linux CSNAS 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=11.04
DISTRIB_CODENAME=natty
DISTRIB_DESCRIPTION="Ubuntu 11.04"

I have a RAID6 setup, 16x2TB disks using an Adaptec 51645 card. It is a single EXT4 volume:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 11518469088 10420881676 1097587412 91% /mnt/raid6

Mount params:
# <file system> <mount point> <type> <options> <dump> <pass>
# RAID6 - backups
/dev/sda1 /mnt/raid6 ext4 suid,dev,exec,nodelalloc 0 0

I was able to transfer data with a sustained transfer speed of 96MB/sec from my PC to it via LAN (which is nearly the full Gigabit bandwidth) using Samba (client: Windows 7 x64 SP1).
Now that I passed the 90% mark, the write speeds have dropped to a very low rate and I get random speeds in the range of 1MB/sec to 14MB/sec.
At quite a few occasions Windows reports 'disk full' and the transfer is aborted; this is most annoying with large files (size>10GB) because I simply cannot transfer these anymore (they time out eventually).

Dunno if it's related, but since this is not a system disk, I have reduced the reserved space to 0 (meaning that I should be able to utilize the remaining space just the same).

Read speeds (no write):
cs@CSNAS:~$ sudo hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 1208 MB in 3.00 seconds = 402.63 MB/sec

Read speeds (write):
cs@CSNAS:~$ sudo hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads: 314 MB in 3.29 seconds = 95.51 MB/sec

Revision history for this message
Ketil Malde (ketil-ii) wrote :
Download full text (3.5 KiB)

I see this on 3.2.0-38-generic, I have 3x3TB striped volume with LZO, and it suddenly turned dog slow. It is about 90% full atm.

In dmesg I find a bunch of warnings:

[2737995.991648] btrfs: block rsv returned -28
[2737995.991649] ------------[ cut here ]------------
[2737995.991655] WARNING: at /build/buildd/linux-3.2.0/fs/btrfs/extent-tree.c:5985 use_block_rsv+0x17c/0x190 [btrfs]()
[2737995.991656] Hardware name: Precision WorkStation T7500
[2737995.991657] Modules linked in: ses enclosure usb_storage ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ext2 dm_crypt nfsd nfs lockd fscache auth_rpcgss binfmt_misc nfs_acl sunrpc lp snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep ppdev snd_pcm dell_wmi parport_pc snd_timer parport i7core_edac sparse_keymap snd shpchp soundcore dcdbas edac_core psmouse serio_raw mac_hid snd_page_alloc btrfs zlib_deflate libcrc32c usbhid hid mxm_wmi radeon mptsas ttm mptscsih firewire_ohci drm_kms_helper mptbase firewire_core tg3 crc_itu_t scsi_transport_sas drm wmi i2c_algo_bit
[2737995.991679] Pid: 45809, comm: btrfs-endio-wri Tainted: G W 3.2.0-38-generic #59-Ubuntu
[2737995.991680] Call Trace:
[2737995.991682] [<ffffffff81067e0f>] warn_slowpath_common+0x7f/0xc0
[2737995.991684] [<ffffffff81067e6a>] warn_slowpath_null+0x1a/0x20
[2737995.991690] [<ffffffffa023162c>] use_block_rsv+0x17c/0x190 [btrfs]
[2737995.991697] [<ffffffffa02343fd>] btrfs_alloc_free_block+0x3d/0x200 [btrfs]
[2737995.991699] [<ffffffff81122ecb>] ? __set_page_dirty_nobuffers+0x1b/0x20
[2737995.991708] [<ffffffffa0263b8a>] ? read_extent_buffer+0xea/0x180 [btrfs]
[2737995.991711] [<ffffffff8165d21e>] ? _raw_spin_lock+0xe/0x20
[2737995.991717] [<ffffffffa0223122>] __btrfs_cow_block+0x122/0x430 [btrfs]
[2737995.991722] [<ffffffffa0223526>] btrfs_cow_block+0xf6/0x220 [btrfs]
[2737995.991730] [<ffffffffa0272063>] ? btrfs_set_lock_blocking_rw+0x43/0xb0 [btrfs]
[2737995.991736] [<ffffffffa022723a>] btrfs_search_slot+0x34a/0x710 [btrfs]
[2737995.991743] [<ffffffffa0238988>] btrfs_lookup_file_extent+0x38/0x40 [btrfs]
[2737995.991751] [<ffffffffa02534bc>] btrfs_drop_extents+0xec/0x930 [btrfs]
[2737995.991753] [<ffffffff8116588c>] ? kmem_cache_alloc+0x10c/0x140
[2737995.991759] [<ffffffffa022251a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[2737995.991767] [<ffffffffa0246b63>] insert_reserved_file_extent.constprop.52+0x73/0x270 [btrfs]
[2737995.991775] [<ffffffffa0243d4d>] ? start_transaction.part.25+0x7d/0x2a0 [btrfs]
[2737995.991783] [<ffffffffa024a72a>] btrfs_finish_ordered_io+0x2ca/0x320 [btrfs]
[2737995.991791] [<ffffffffa024a7cd>] btrfs_writepage_end_io_hook+0x4d/0xc0 [btrfs]
[2737995.991799] [<ffffffffa025edc9>] end_bio_extent_writepage+0x129/0x180 [btrfs]
[2737995.991801] [<ffffffff811adf3d>] bio_endio+0x1d/0x40
[2737995.991809] [<ffffffffa023d2ad>] end_workqueue_fn+0x5d/0xa0 [btrfs]
[2737995.991817] [<ffffffffa026c4f0>] worker_loop+0xa0/0x2b0 [btrfs]
[2737995.991818] [<ffffffff8165acbc>] ? __schedule+0x3cc/0x6f0
[2737995.991827] [<ffffffffa026c450>] ? check_pending_worker_creates.isra.1+0xf0/0xf0 [btrfs]
[2737995.991829] [<ffffffff8108b19c>] kthread+0x8c/0xa0
[2737995.991831] [<ffffffff...

Read more...

Revision history for this message
Ketil Malde (ketil-ii) wrote :

Forgot to add a link to what might be a similar problem:

http://<email address hidden>/msg20547.html

Revision history for this message
Ketil Malde (ketil-ii) wrote :

I think this is caused by poorly balanced disks. I.e:

nmdbio:~ % sudo btrfs fi show /dev/sdb
Label: 'scratch' uuid: 184706ea-89f5-438a-a9f5-b5e91b3ce267
        Total devices 3 FS bytes used 7.57TB
        devid 3 size 2.73TB used 2.45TB path /dev/sdc
        devid 2 size 2.73TB used 2.73TB path /dev/sdb
        devid 1 size 2.73TB used 2.73TB path /dev/sda

Btrfs v0.20-rc1

Note that two disks are entirely filled up, and only one has free space. If I were to venture a guess, btrfs struggles to shuffle data about in order to write multiple copies of metadata to each spindle.

Revision history for this message
Csimbi (turbotalicska) wrote :

@Ketil
I have lightning-fast hardware raid and the problem still occurs.
My guess is that it has something to do with fragmentation prevention: the "search for a continuous free block" takes too much time. I am not technically adept to check that however.

Revision history for this message
Ketil Malde (ketil-ii) wrote :

Upgraded to 12.10 and 3.5.0-26, the filesystem is still hosed. I'm rerunning a 'balance', but I notice with 'iostat' that disk bandwidth is very poorly utilized (getting a couple of MB/s from each disk, and mostly one at a time). Still, there is now quite a bit of space on all spindles, and it didn't seem to help much.

Revision history for this message
Csimbi (turbotalicska) wrote :

Yes, even the newest releases are prone to this problem.

Revision history for this message
Ketil Malde (ketil-ii) wrote :

After some time, bandwidth increased to reasonable values, and after some hours, the balance operation completed. After that, performance seems to be okay. I've also added autodefrag as a mount option in fstab, just in case. It worries me that the filesystem degrades like this though.

Previous mount options: btrfs defaults,noatime,compress=lzo

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.