very sub-optimal default readahead settings on device and unused readahead setting in LVM

Bug #129488 reported by James Troup on 2007-07-31
This bug affects 7 people
Affects: lvm2 (Ubuntu)
Importance: Medium
Assigned to: Unassigned
Nominated for Gutsy by Lowell Alleman
Nominated for Hardy by Lowell Alleman
Nominated for Intrepid by Lowell Alleman
Nominated for Jaunty by Shaya Potter
Nominated for Karmic by Stefano Maioli

Bug Description

Binary package hint: lvm2

When you create an LV, the device ends up with a readahead of 256 512-byte sectors, whereas a normal (non-LVM) device's readahead appears to default to 8192 512-byte sectors. This makes filesystems on LVM benchmark (and perform) very badly (our read rate dropped from 320 MB/s to 90 MB/s).

To make matters worse, LVM has an internal 'read ahead sectors' variable which is apparently unused, but is still displayed by e.g. lvdisplay, which just adds to the confusion.

This is apparently a known issue upstream:

http://linux.msede.com/lvm_mlist/archive/2004/06/0108.html
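
(For reference, the two readahead values can be compared directly with blockdev; a minimal check, assuming a volume group named vg with a logical volume lv sitting on /dev/sda:)

  # readahead of the underlying disk, in 512-byte sectors
  blockdev --getra /dev/sda
  # readahead of the LVM logical volume's device node
  blockdev --getra /dev/vg/lv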

Kees Cook (kees) wrote :

From irc, work-around after boot is:

  blockdev --setra 8192 /dev/vg/lv

We need to find the right place to fix this by default. Kernel patch needed?

Changed in lvm2:
importance: Undecided → Medium
status: New → Confirmed
Kees Cook (kees) wrote :

And here's a udev hack...
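
(The attachment is not reproduced here; an illustrative sketch of such a rule, not necessarily the one attached, might look like the following, placed in a file such as /etc/udev/rules.d/85-lvm-readahead.rules:)

  # raise readahead to 8192 sectors (4 MiB) on every device-mapper node as it appears
  ACTION=="add|change", KERNEL=="dm-*", RUN+="/sbin/blockdev --setra 8192 /dev/%k"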

James Troup (elmo) on 2007-07-31
description: updated

On Tue, Jul 31, 2007 at 09:06:14PM -0000, James Troup wrote:
> This is apparently a known issue upstream, but not considered a
> priority:

Actually patches for better readahead support are being worked on
by Zdenek Kabelac and are nearly ready.

Alasdair
--
<email address hidden>

At this time, the bug still persists in Gutsy. The difference on my setup: 70 MB/s vs. 250 MB/s. Is it possible to give an update on the expected fix date?

James Troup (elmo) on 2007-12-14
description: updated
Lowell Alleman (lowell-alleman) wrote :

This issue appears to still be present on a fresh install of Hardy (Ubuntu 8.04) as well.

I'm in the process of setting up a new server, and I applied the "gross hack" posted by Kees Cook. I had applied this to my desktop system a while back, but was hoping that it would have been fixed by now.

Just to be sure this hasn't been fixed in some other way, I ran some quick read performance tests with "hdparm -t /dev/vg/lv". I tried values from 256 (default) up to 16384. I confirmed that 8192 seems to give the best performance (57 MB/sec), and 256 gives the lowest performance (29 MB/sec).
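
(A sweep like the one just described can be scripted; a rough sketch, assuming the volume is /dev/vg/lv and the commands are run as root:)

  # try a range of readahead values (in 512-byte sectors) and time reads for each
  for ra in 256 512 1024 2048 4096 8192 16384; do
      blockdev --setra $ra /dev/vg/lv
      echo "readahead = $ra sectors"
      hdparm -t /dev/vg/lv
  done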

Anybody have an update on when this will get fixed officially?

Shaya Potter (spotter) wrote :

This bug still exists in Intrepid, and the fix mentioned above helps.

Ulrik Mikaelsson (rawler) wrote :

Seems to exist in Jaunty as well

Jack Wasey (jackwasey) wrote :

N.B. readahead is only half of the problem.

My 4-disk RAID10 array is 2/3 of the speed of my two-disk RAID0 array for big sequential reads. This is complete madness, as there are twice as many disks to read from in the RAID10 case. Anyway, not strictly related to this bug, but the interaction of RAID, LVM and the overlying filesystem has clearly never been thought through for desktop users (who often have lots of disks nowadays).

My eyes were watering after 15 minutes of looking at the lvm, mdadm and ext3 man pages: stride, chunks, superblock offsets, mayhem.

Jack Wasey (jackwasey) wrote :

lvchange -r 8192 your-lv

seems to be safe and permanent, but does the OS ignore this value (as suggested by the original poster)?

I'm not so sure it works in Ubuntu already, but it's been working fine in Debian Testing for a while now.

Jack Wasey wrote:
> lvchange -r 8192 your-lv
>
> seems to be safe and permanent, but does OS ignore this value (as
> suggested by the original poster)?
>
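
(One way to check whether the kernel actually honours the value set with lvchange -r is to compare LVM's stored setting with what the block layer reports; a minimal check, assuming a volume group vg and logical volume lv:)

  lvchange -r 8192 vg/lv
  # the value stored in LVM metadata, as shown by lvdisplay
  lvdisplay /dev/vg/lv | grep -i 'read ahead'
  # the value the kernel is actually using, in 512-byte sectors
  blockdev --getra /dev/vg/lv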

Jack Wasey (jackwasey) wrote :

sudo blockdev --report

sudo blockdev --setra 16384 /dev/sd[abcd]

I've tried a few read-ahead amounts, and they all seem to show similar performance. E.g. a RAID 1 rebuild jumped from ~46 to ~52 MiB/s after the command.

Jack Wasey (jackwasey) wrote :

(I meant when increasing the readahead from 256 to a large number.)

Phillip Susi (psusi) wrote :

Normal disks also have a default readahead of 128 KB (256 sectors of 512 bytes), not 4 MB (8192 sectors), and that seems to be quite sufficient. I don't see why you would want the dm device layered on top of the physical disks doing MORE readahead, let alone 4 MB worth. If the setting given to lvchange is not respected, though, that is a bug.

Thomas Danhorn (tdanhorn) wrote :

I suggest marking this as "fixed". I did a few simple tests with hdparm -t /dev/xxx, and while I am not convinced that this is necessarily the best, let alone the only relevant, measure of performance, I found no limitation in Karmic on a current 250 GB 7200 rpm laptop hard drive with the default setting of 256 sectors (as Phillip Susi indicated, this is the default for all block devices, both LVM and non-LVM). This agrees with Jack Wasey's observations.

I get ~90+ MB/s for readaheads from 64 to 8192 sectors, and I have no reason to believe that there is a setting within that range that would be significantly different from the others (on my system, with the hdparm test; there is some variability in repeated measurements even with the same settings). At 32 sectors there is a slight drop to ~85 MB/s, and settings of 2 and 8 sectors give ~27 and ~40 MB/s, respectively, which shows that the readahead settings do have an effect.

To put to rest the rumor that the lvchange command has no effect (which was true in 2004 and perhaps later): I tried both lvchange -r and blockdev --setra, and they have the same effect - i.e. basically no change in performance between 64 and 8192 (the highest I tried), and a drop below 32. (Using blockdev --setra on the physical device a logical volume resides on, e.g. /dev/sdxy, rather than on the logical volume itself has no effect, by the way.)

Long story short, in my experience the default settings are fine, and the lvchange command works as expected if you feel the need to change them. With SSDs this may have even less impact.

Phillip Susi (psusi) on 2010-05-13
Changed in lvm2 (Ubuntu):
status: Confirmed → Invalid