CONFIG_SCSI_MQ_DEFAULT default changed preventing use of IO schedulers

Bug #1397061 reported by Doug Smythies
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Andy Whitcroft

Bug Description

In kernel 3.18RC1 this kernel config parameter is: # CONFIG_SCSI_MQ_DEFAULT is not set
In kernel 3.18RC2 and beyond, the kernel config parameter is: CONFIG_SCSI_MQ_DEFAULT=y

This results in loss of the ability to set the IO scheduler via /sys/block/sda/queue/scheduler.
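A quick way to see which default a given kernel was built with is to look for the option in its build config. The following is only a sketch and not part of the original report: the helper name `scsi_mq_default` is made up for illustration, and the `/boot/config-<release>` location is an assumption about where the distribution installs the config file.

```python
import platform


def scsi_mq_default(config_text):
    """Return True if CONFIG_SCSI_MQ_DEFAULT=y, False if it is explicitly
    marked "is not set", and None if the option does not appear at all."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == "CONFIG_SCSI_MQ_DEFAULT=y":
            return True
        if line == "# CONFIG_SCSI_MQ_DEFAULT is not set":
            return False
    return None


if __name__ == "__main__":
    # Assumed location of the running kernel's build config.
    path = "/boot/config-" + platform.release()
    try:
        with open(path) as f:
            print(path, "->", scsi_mq_default(f.read()))
    except OSError as exc:
        print("could not read", path, "-", exc)
```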

Now we get:

doug@s15:~$ cat /sys/block/sda/queue/scheduler
none

Where we are used to getting:

doug@s15:~/temp2$ cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
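The two outputs above differ in a way that is easy to check from a script: the legacy path lists several schedulers and marks the active one with square brackets, while the blk-mq path shows only "none". A minimal sketch of reading that format follows; the function name `parse_scheduler` is hypothetical, and the device name `sda` is simply the one used in this report.

```python
def parse_scheduler(text):
    """Split the contents of a queue/scheduler file into
    (available schedulers, active scheduler or None)."""
    names = text.split()
    available = [name.strip("[]") for name in names]
    # The active scheduler, if any, is the entry wrapped in brackets.
    active = next(
        (name[1:-1] for name in names
         if name.startswith("[") and name.endswith("]")),
        None,
    )
    return available, active


if __name__ == "__main__":
    path = "/sys/block/sda/queue/scheduler"
    try:
        with open(path) as f:
            print(parse_scheduler(f.read()))
    except OSError as exc:
        print("could not read", path, "-", exc)
```

On the legacy path, writing one of the listed names back to the same file (as root) selects that scheduler; with only "none" listed there is nothing to select.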

From the commit message of the "add a CONFIG_SCSI_MQ_DEFAULT option" commit:

> Add a Kconfig option to enable the blk-mq path for SCSI by default
> to ease testing and deployment in setups that know they benefit
> from blk-mq.

How do we know that all systems benefit from blk-mq?

It seems complicated to have to re-compile the kernel to get the other IO scheduler options back.
Why isn't this option exposed in the same way as the others? I.e.:

doug@s15:~/temp2$ cat /sys/block/sda/queue/scheduler
noop [deadline] cfq blk-mq

(and I realize that is actually an upstream question.)

By the way, my system does seem to benefit from blk-mq. I just didn't understand why I could no longer observe and change the IO scheduler, and so isolated the change.

Experimental data:

Random seeks in a large file:
blk-mq: 104 seeks per second average
deadline: 74 seeks per second average
cfq: 74 seeks per second average
noop: 74 seeks per second average

Kernel compile:
deadline: 23 minutes 37.4 seconds
blk-mq: 23 minutes 35.4 seconds
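To put the figures above on a common footing, the relative differences can be worked out directly from the reported numbers. This is only arithmetic on the values quoted in this report (no new measurements), with deadline taken as the baseline.

```python
def to_seconds(minutes, seconds):
    """Convert a "M minutes S seconds" duration to seconds."""
    return minutes * 60 + seconds


# Random seeks per second, as reported above.
seeks = {"blk-mq": 104, "deadline": 74}
seek_gain_pct = (seeks["blk-mq"] - seeks["deadline"]) / seeks["deadline"] * 100

# Kernel compile times, as reported above.
compile_deadline = to_seconds(23, 37.4)
compile_blk_mq = to_seconds(23, 35.4)
compile_gain_pct = (compile_deadline - compile_blk_mq) / compile_deadline * 100

print(f"random seeks: blk-mq rate is {seek_gain_pct:.1f}% higher")
print(f"kernel compile: blk-mq time is {compile_gain_pct:.2f}% shorter")
```

So the random-seek rate is roughly 40% higher with blk-mq, while the kernel compile time differs by only about 0.1%.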

Note 1: Please do not ask for all of my apport stuff, it is not needed for this bug report.
Note 2: On IRC, "apw" asked me to enter this bug report.

Andy Whitcroft (apw) wrote :

The underlying issue is that blk-mq currently does _not_ support IO schedulers, so when this path is selected we can no longer use the IO schedulers at all. It is not ready to be the default without IO scheduler support, so we will flip the default back to off in the next upload. It can of course be re-enabled via the kernel command line (at least) for those who want to experiment with it.
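The comment above does not name the command-line switch. Assuming it is the upstream `scsi_mod.use_blk_mq` module parameter (an assumption, not something stated in this report), a sketch like the following could check whether the running kernel was booted with it. The function name `blk_mq_cmdline_setting` is made up for illustration.

```python
def blk_mq_cmdline_setting(cmdline, param="scsi_mod.use_blk_mq"):
    """Return the value given for `param` on a kernel command line,
    or None if the parameter is not present."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == param and sep:
            value = val  # keep the last occurrence seen
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(blk_mq_cmdline_setting(f.read()))
    except OSError as exc:
        print("could not read /proc/cmdline -", exc)
```

A result of None only means the parameter was not given at boot, in which case the compiled-in default applies.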

affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
assignee: nobody → Andy Whitcroft (apw)
status: New → Triaged
Andy Whitcroft (apw)
summary: - Kernel Config setting: CONFIG_SCSI_MQ_DEFAULT
+ CONFIG_SCSI_MQ_DEFAULT default changed preventing use of IO schedulers
Andy Whitcroft (apw)
Changed in linux (Ubuntu):
importance: Undecided → High
status: Triaged → Fix Committed
Doug Smythies (dsmythies) wrote :

Thanks for your rapid attention on this one.

Just for completeness, I want to expand on this statement from the description: "my system does seem to benefit from blk-mq"
That is, unless the IO load is high. With very high disk IO load, things fall apart. For example, I had a single disk seek time of 73 seconds, and a simple "ls -l" command that had still not finished after more than half an hour.

With blk-mq disabled, while the system still gets sluggish under extreme disk IO load, no single task experiences such horrendous neglect.

I even had several occurrences of this:

Nov 28 08:12:28 s15 kernel: [42812.544892] INFO: task master:2225 blocked for more than 120 seconds.
Nov 28 08:12:28 s15 kernel: [42812.544934] Not tainted 3.18.0-rc6-250 #173
Nov 28 08:12:28 s15 kernel: [42812.544961] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

With blk-mq disabled, and under the same extreme disk IO load, the worst disk seek time is about 1 second.
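When comparing runs like these, it can help to count the hung-task warnings quoted above rather than spot them by eye. The sketch below pulls the task name, PID and timeout out of such a line; the pattern is derived only from the message format shown in this comment, and the names `HUNG_TASK_RE` and `parse_hung_task` are hypothetical.

```python
import re

# Matches the "INFO: task <name>:<pid> blocked for more than <N> seconds" message.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<seconds>\d+) seconds"
)


def parse_hung_task(line):
    """Return (task name, pid, timeout in seconds) for a hung-task
    warning line, or None if the line is not such a warning."""
    match = HUNG_TASK_RE.search(line)
    if match is None:
        return None
    return match.group("comm"), int(match.group("pid")), int(match.group("seconds"))


if __name__ == "__main__":
    sample = ("Nov 28 08:12:28 s15 kernel: [42812.544892] INFO: task "
              "master:2225 blocked for more than 120 seconds.")
    print(parse_hung_task(sample))
```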

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.18.0-8.9

---------------
linux (3.18.0-8.9) vivid; urgency=low

  [ Leann Ogasawara ]

  * Release Tracking Bug
    - LP: #1407692
  * rebase to v3.18.1
  * ubuntu: AUFS -- Resolve build failure union has no member named
    'd_child'

  [ Upstream Kernel Changes ]

  * arm64: optimized copy_to_user and copy_from_user assembly code
    - LP: #1400349
  * x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
    - LP: #1400314
    - CVE-2014-8134
  * rebase to v3.18.1
 -- Leann Ogasawara <email address hidden> Mon, 05 Jan 2015 09:12:32 -0800

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released