dmraid(fakeRAID) raid1 driver doesn't loadbalance reads

Bug #361733 reported by Timothy Miller
This bug affects 4 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Undecided   Unassigned   (none)

Bug Description

This is a kernel bug. Actually, it's intended behavior, but it shouldn't be.

In theory, fake RAID and software RAID should be the same thing; the only practical difference is where they get their metadata from, one from the vendor format set up by the system BIOS, the other from its own on-disk format. However, it turns out that they take two completely different code paths: the fakeraid (dmraid) driver reads only from the primary disk, while the software RAID (mdraid) driver implements read load balancing for RAID1. This difference isn't well documented, so people sometimes end up choosing the wrong one.

(1) There's no good reason why there should be two separate software RAID1 drivers. They should be merged.
(2) In the meantime, people should be splashed with loud warnings that if they choose fakeraid, they won't get the improved read performance they expect.
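
For illustration only, here is a minimal user-space C sketch of the difference in read selection, assuming two mirrors and a random read workload. It is not the kernel code, just a model of the two policies: dm-raid1's historical "always read the default mirror" behavior versus an md-style "nearest last position wins" choice.

/* Hedged user-space illustration only -- NOT the kernel code paths.
 * It contrasts the two read-selection policies described above:
 * "always read the default mirror" (what this bug is about) versus
 * "pick the mirror whose last position is nearest to the request".
 * Sector numbers and the workload are made up. */
#include <stdio.h>
#include <stdlib.h>

#define NMIRRORS 2

/* Policy 1: every read hits mirror 0, the other disk stays idle. */
static int pick_primary(long sector, const long last_pos[NMIRRORS])
{
    (void)sector;
    (void)last_pos;
    return 0;
}

/* Policy 2: choose the mirror whose last serviced sector is closest. */
static int pick_nearest(long sector, const long last_pos[NMIRRORS])
{
    int best = 0;
    for (int i = 1; i < NMIRRORS; i++)
        if (labs(sector - last_pos[i]) < labs(sector - last_pos[best]))
            best = i;
    return best;
}

int main(void)
{
    long last_pos[NMIRRORS] = { 0, 0 };
    long reads_primary[NMIRRORS] = { 0 };
    long reads_nearest[NMIRRORS] = { 0 };

    /* A random read workload, as in ordinary desktop use. */
    for (int i = 0; i < 100000; i++) {
        long sector = (random() % 1000000) * 8;
        reads_primary[pick_primary(sector, last_pos)]++;
        int m = pick_nearest(sector, last_pos);
        reads_nearest[m]++;
        last_pos[m] = sector;
    }

    printf("always-primary:   mirror0=%ld mirror1=%ld\n",
           reads_primary[0], reads_primary[1]);
    printf("nearest-position: mirror0=%ld mirror1=%ld\n",
           reads_nearest[0], reads_nearest[1]);
    return 0;
}

Compiled and run, the first policy sends all 100000 reads to mirror 0 while the second spreads them roughly evenly across both mirrors, which is the behavioral gap this report is about.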

Here's the LKML discussion where I asked about this:
http://kerneltrap.org/index.php?q=mailarchive/linux-kernel/2008/7/7/2370344/thread

As far as I know, they haven't addressed this problem since. I'm reporting this here, because I see Ubuntu as the distro that seems to take responsibility for fixing a lot of the silliness that others won't.

affects: ubuntu → ubuntu-meta (Ubuntu)
Colin Watson (cjwatson)
affects: ubuntu-meta (Ubuntu) → linux (Ubuntu)
Revision history for this message
Tormod Volden (tormodvolden) wrote :

Thanks for your report. Development of the dmraid driver is ongoing, but it happens "upstream" rather than in Ubuntu; in practice it is mostly Red Hat developers. Maybe you should change the bug title to "dmraid does not do load balancing" or something more specific.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Hi Timothy,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/lucid.

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 361733

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Timothy Miller (theosib) wrote :

Hi,

I stopped using Ubuntu some time ago, since it's still not possible to boot from a software RAID volume. (The initrd does not load mdadm support at boot time, so I have to use other distros if I want my root volume on software RAID. I reported this bug on Launchpad, but it has never been addressed.)

However, looking at the source code to 2.6.33, it appears that the same block drivers are still there. There are separate RAID1 implementations for "dmraid" and "mdraid". Although this is not an Ubuntu bug per se, it is definitely a Linux problem that affects Ubuntu and Ubuntu users.

The consequence of this is that people trying to set up software RAID won't know which type to choose. If they choose dmraid, they'll get poor read performance, since it doesn't do load balancing.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Timothy,
     Thanks for the follow-up. I'll bring this up with the team. In the meantime, is this something that has been discussed with the upstream kernel maintainers? I'd really like to take a look at the discussion surrounding this problem.

Thanks!

~JFo

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Never mind, I just reread your opening comments. I'll take a look at the information you have already provided.

~JFo

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired
Changed in linux (Ubuntu):
status: Expired → New
Revision history for this message
Meyer S. Jacobs (meyermagic) wrote :

This bug is still relevant to me.

I'm currently using Ubuntu 10.10 with RAID1 across two disks via dmraid's isw (Intel Software RAID) format, since the machine dual-boots Windows 7 and Ubuntu.

I'll probably try reconfiguring my drive layout/raid setup soon and may post any workaround I find.

-Meyer S. Jacobs

Revision history for this message
Robert Collins (lifeless) wrote :

I've retitled this bug. While it's true that having two implementations is an issue, the actual problem folks are running into is the lack of read load balancing; how that is solved is a separate question (merge the drivers, implement the same balancing algorithm, ...).

summary: - fakeRAID and software RAID are two different things
+ dmraid(fakeRAID) raid1 driver doesn't loadbalance reads
Revision history for this message
Robert Collins (lifeless) wrote :

Here is an implementation of load balancing. It chooses the closest drive for each read IO, which results in sequential IO being served from one drive. This works a lot better than round-robin per IO: dm mapping happens before IO request merging, so doing round-robin results in no request merging and terrible performance.

Random IO (e.g. kernel builds, regular use) shows up on both disks in my mirror set, often in similar amounts.

Future iterations can build on this to do more advanced algorithms.
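
To make the approach concrete, here is a hedged user-space C sketch, not the actual patch (which lives in the device-mapper code), of a nearest-position read policy compared against naive per-IO round-robin on a single sequential stream. The nearest-position choice sticks to one mirror, so its requests stay contiguous and can still be merged, whereas round-robin hands each mirror a gapped sequence that cannot merge.

/* Hedged sketch of the read policy described in this comment; it is a
 * user-space model, not the kernel patch itself. */
#include <stdio.h>
#include <stdlib.h>

#define NMIRRORS 2

static long last_pos[NMIRRORS];

/* Nearest-position policy: pick the mirror whose last serviced
 * sector is closest to the start of this read. */
static int pick_nearest(long sector)
{
    int best = 0;
    for (int i = 1; i < NMIRRORS; i++)
        if (labs(sector - last_pos[i]) < labs(sector - last_pos[best]))
            best = i;
    return best;
}

int main(void)
{
    /* One sequential read stream: 8-sector requests, back to back. */
    printf("nearest-position policy:\n");
    for (long sector = 0; sector < 64; sector += 8) {
        int m = pick_nearest(sector);
        last_pos[m] = sector + 8;   /* the head ends up just past the request */
        printf("  sector %2ld -> mirror %d\n", sector, m);
    }

    /* The same stream dispatched round-robin per IO: each mirror sees
     * every other request, so nothing is contiguous and nothing merges. */
    printf("round-robin policy:\n");
    for (long sector = 0; sector < 64; sector += 8)
        printf("  sector %2ld -> mirror %ld\n", sector, (sector / 8) % 2);

    return 0;
}

Since dm maps the bio before the block layer gets a chance to merge requests, keeping a stream's reads on one mirror, as the patch described here does, is what preserves the merging; that is why per-IO round-robin performs so badly.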

Revision history for this message
Timothy Miller (theosib) wrote :

Nice patch! It should be pushed upstream.

Also, is this design now superior to the mdraid implementation? Should similar changes be made over there?

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 361733] Re: dmraid(fakeRAID) raid1 driver doesn't loadbalance reads

Thanks. Just followed my nose ;).

So this implementation handles two sequential IO streams happening at once as well as or better than the md implementation. The md implementation tracks sequential IO at the array level and at the result-of-an-IO level; due to tagged IOs the latter can be incorrect, and the former fails if two different processes are both doing sequential reads. The implementation I've put together will either end up sticking to one disk for the IOs if both streams are close together, or will do sequential IO on both disks if the starting position of each is closer to the relevant sectors.

tl;dr - both implementations still have tradeoffs.

Merging of IOs happens separately. I think dm has space for a merge routine in the layer, and we would probably benefit from that: merging the IOs sent to the raid1 driver would let us get bigger IOs that we can sensibly dispatch round-robin style, basically keeping both disks reading with their heads moving roughly in sync.

I don't know if I'll get time to fiddle with that, but who knows ;)

-Rob

Revision history for this message
Timothy Miller (theosib) wrote :

A while back I changed my doctoral specialization to computer architecture, but prior to that I was studying AI, and as I think about this, the problem screams for a reinforcement learning approach. For N drives there are 2N+1 actions: a read on each drive, a write on each drive, and possibly do-nothing (even if there is work to do). If we can map past actions, knowledge of the consequences of those actions, and the time since the last action onto some reasonable number of "environment states," then it would be straightforward to implement a completely adaptive online optimization algorithm. It could also be tuned to favor throughput, latency, or even energy to different degrees.

The general idea is to use measurements of actual performance to judge how effective past actions were at achieving those goals and to fold that into future decisions. Done right, the algorithm would automatically adapt to differences in drive performance characteristics. Mind you, this might only yield incremental improvements over the more naive approaches, but I recall reading a paper about a DRAM controller scheduler that achieved a 22% throughput improvement over the best static method available at the time. (And as I continue to think about this, perhaps the best place to implement it first would be reordering accesses to a single drive.)

Sorry about the rant. :)
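
Purely to illustrate the idea sketched in the comment above (nothing like this exists in dm or md), here is a toy C model of an adaptive selector: a two-armed epsilon-greedy learner that treats "which mirror services the next read" as the action and the measured latency as the cost, with simulated latencies standing in for real measurements.

/* Toy sketch of the reinforcement-learning idea above: an epsilon-greedy
 * bandit that usually picks the mirror with the best running latency
 * estimate and occasionally explores. Latencies are simulated; nothing
 * here reflects real kernel code. */
#include <stdio.h>
#include <stdlib.h>

#define NMIRRORS 2

static double avg_latency[NMIRRORS];   /* running cost estimate per mirror */
static long   samples[NMIRRORS];

static int choose_mirror(void)
{
    if ((double)rand() / RAND_MAX < 0.1)       /* explore 10% of the time */
        return rand() % NMIRRORS;
    return avg_latency[0] <= avg_latency[1] ? 0 : 1;   /* otherwise exploit */
}

static void record(int mirror, double latency_ms)
{
    samples[mirror]++;
    /* incremental running mean */
    avg_latency[mirror] += (latency_ms - avg_latency[mirror]) / samples[mirror];
}

int main(void)
{
    long reads[NMIRRORS] = { 0 };

    for (int i = 0; i < 100000; i++) {
        int m = choose_mirror();
        /* pretend mirror 1 is a slightly slower disk */
        double latency = (m == 0 ? 8.0 : 10.0) + (rand() % 100) / 25.0;
        record(m, latency);
        reads[m]++;
    }

    printf("mirror0: %ld reads (avg %.2f ms), mirror1: %ld reads (avg %.2f ms)\n",
           reads[0], avg_latency[0], reads[1], avg_latency[1]);
    return 0;
}

With the simulated latencies, the learner gravitates toward the faster disk on its own; a real policy would need richer state (queue depth, sequentiality, writes) as the comment suggests, but the feedback loop is the same.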

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window, please run:

apport-collect 361733

and then change the status of the bug back to 'New'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Timothy Miller (theosib) wrote :

I personally cannot run this command because I do not have an Ubuntu system in this configuration at this time. Also, this bug is easily confirmed by simple inspection of the Linux kernel source. It behaves exactly in accordance with how the code is written, although how it's written is decidedly suboptimal.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Brad Figg (brad-figg) wrote :

@robert,

If you can get that patch accepted upstream, we can pick it up through a stable update. However, it isn't SRU material and we're not going to carry it on our own.

tags: added: kt-worked
tags: added: patch
Revision history for this message
Robert Collins (lifeless) wrote :

I've forwarded it upstream (http://www.redhat.com/archives/dm-devel/2012-June/msg00013.html) but have had no reply so far. Any advice on how to get upstream to look at it would be wonderful.

Revision history for this message
Or Idgar (oridgar) wrote :

Three years and this issue hasn't been solved (at least not in an official release, only in unmerged patches).
What do we need to do to make it happen?
I'm dual-booting Windows 8 and Ubuntu 12.10 on my computer, using Intel RST (Rapid Storage Technology), and unfortunately I see that reads go only to the first device (using iostat to monitor).

I will be glad to hear if there is any news.

Revision history for this message
dbkaplun (dbkaplun) wrote :

It would be great if @lifeless could try contacting dm-devel again. This is a very important patch that would provide a large benefit across a variety of systems.

Revision history for this message
Jan Klamta (5-jan-c) wrote :

Hi,
I am just getting to the bottom of some performance issues; thanks, guys, for this bug description!
It looks like the problem remains unsolved:
$ iostat -cd
Linux 4.4.0-63-generic (ruprecht)  21.2.2017  _x86_64_  (8 CPU)

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           1,52   0,01     0,41    13,03    0,00  85,04

Device:   tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda       142,33    1832,54     541,55  2174234   642525
sdb        18,86       0,08     541,55       92   642525
...
dm-0      158,76    1832,46     541,55  2174142   642525
