Ubuntu 14 KVM Guest I/O Elevator Non-configurable

Bug #1346687 reported by Dusan Baljevic
This bug affects 4 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

Ubuntu 14.04 guest runs a pretty much standard configuration: KVM hypervisor, ACPI and APIC enabled, virtio disk bus, raw storage format...

While running this project, I have also been testing SSD drives across different Linux distributions and their support for the discard option (TRIM).

To cut the story short, the Ubuntu-based KVM guest did not set any scheduler on the boot disk (vda):

# lsblk -io KNAME,TYPE,SCHED,ROTA,DISC-GRAN,DISC-MAX
KNAME TYPE SCHED    ROTA DISC-GRAN DISC-MAX
sr0   rom  deadline 1    0B        0B
vda   disk          1    0B        0B
vda1  part          1    0B        0B
vda2  part          1    0B        0B
vda5  part          1    0B        0B
dm-0  lvm           1    0B        0B
dm-1  lvm           1    0B        0B

That is in stark contrast with other distributions running an almost identical KVM guest configuration (with the exception of the root file system using BTRFS instead of EXT4 on SUSE).

Oracle Linux 6.5:

# lsblk -io KNAME,TYPE,SCHED,ROTA,DISC-GRAN,DISC-MAX
KNAME TYPE SCHED    ROTA DISC-GRAN DISC-MAX
sr0   rom  deadline 1    0B        0B
vda   disk deadline 1    0B        0B
vda1  part deadline 1    0B        0B
vda2  part deadline 1    0B        0B
dm-0  lvm           1    0B        0B
dm-1  lvm           1    0B        0B

OpenSUSE 13.1:

# lsblk -io KNAME,TYPE,SCHED,ROTA,DISC-GRAN,DISC-MAX
KNAME TYPE SCHED ROTA DISC-GRAN DISC-MAX
sr0   rom  cfq   1    0B        0B
vda   disk cfq   1    0B        0B
vda1  part cfq   1    0B        0B
vda2  part cfq   1    0B        0B
dm-0  lvm        1    0B        0B
dm-1  lvm        1    0B        0B

Indeed, checking the elevator options for the boot disk on the Ubuntu KVM guest showed:

# cat /sys/block/vda/queue/scheduler
none

Other Linux distributions show more options:

# cat /sys/block/vda/queue/scheduler
noop deadline [cfq]

Likewise, attempts to change the elevator on the Ubuntu guest fail silently. For example:

# echo noop > /sys/block/vda/queue/scheduler

# echo $?
0

# cat /sys/block/vda/queue/scheduler
none

Setting it globally in /etc/default/grub and updating it via update-grub2 fails too.
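For reference, the grub route mentioned above normally looks like the sketch below. It is shown here only as an outline; on this 3.13 guest the virtio disk ignores the elevator= parameter, and the deadline value is just an example:

```shell
# Sketch: the classic global-elevator route via grub (assumes a kernel
# whose block devices still honour a selectable elevator).
#
#   /etc/default/grub:
#     GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=deadline"
#
#   then:
#     sudo update-grub2                     # regenerate /boot/grub/grub.cfg
#     sudo reboot
#     cat /sys/block/vda/queue/scheduler    # hoped for: noop [deadline] cfq
```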

In the meantime, to automate checking of elevators (schedulers) and discard support for SSD drives and thin-provisioned volumes on Linux, I wrote a simple Perl script:

http://www.circlingcycle.com.au/Unix-sources/Linux-check-IO-scheduler-and-discard-support.pl.txt

Part of the results on a RHEL 6.5 server looks like:

INFO: I/O elevator (scheduler) and discard support summary
INFO: Hard Disk sdb configured with I/O scheduler "cfq"
INFO: SSD sda configured with I/O scheduler "deadline" supports discard operation
INFO: Hard Disk sdc configured with I/O scheduler "cfq"
INFO: Hard Disk sdi configured with I/O scheduler "cfq"
INFO: Hard Disk sdh configured with I/O scheduler "cfq"
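The same kind of check can be sketched in a few lines of awk over lsblk output. This is a hypothetical helper, not the Perl script above; `check_sched` is an assumed name:

```shell
# Feed `lsblk -io KNAME,TYPE,SCHED` output to this and it flags disks
# with no I/O scheduler (an empty SCHED column means fewer than 3 fields).
check_sched() {
  awk 'NR > 1 && $2 == "disk" {
    if (NF < 3) print "WARN: " $1 " has no I/O scheduler"
    else        print "INFO: " $1 " uses " $3
  }'
}

printf 'KNAME TYPE SCHED\nsr0 rom deadline\nvda disk\nsdb disk cfq\n' | check_sched
# WARN: vda has no I/O scheduler
# INFO: sdb uses cfq
```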

The Ubuntu KVM guest runs the latest patches:

Linux ubuntu14-vm2.circlingcycle.com.au 3.13.0-24-generic
#47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Regards,

Dusan Baljevic VK2COT

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1346687/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Revision history for this message
Dusan Baljevic (ubuntuadm) wrote :

As suggested, I added the package for /bin/lsblk. That is probably good enough for an initial bug report.

Dusan Baljevic

affects: ubuntu → util-linux (Ubuntu)
Revision history for this message
Phillip Susi (psusi) wrote :

Your other distributions are running older kernels. This behavior was changed in the kernel because it does not make sense to run an elevator on virtual devices. You will notice the same thing on bare metal for lvm and raid: they no longer have their own elevator. Instead the IO is passed straight down the stack to the real disk where there is a single elevator managing all IO to that disk, whether it comes from different virtual machines, or logical volumes.
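The layering described here can be sketched with a throwaway mock of the sysfs layout (real paths live under /sys/block; `list_scheds` and the device names are illustrative): only the bottom-level physical disk carries an elevator, while stacked or virtual devices report none.

```shell
# Build a mock sysfs-like tree and list each device's scheduler file.
list_scheds() {            # $1 = root of a sysfs-like tree
  for f in "$1"/*/queue/scheduler; do
    dev=${f#"$1"/}; dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(cat "$f")"
  done
}

tmp=$(mktemp -d)
mkdir -p "$tmp/sda/queue" "$tmp/dm-0/queue" "$tmp/vda/queue"
echo 'noop [deadline] cfq' > "$tmp/sda/queue/scheduler"   # real disk
echo 'none'                > "$tmp/dm-0/queue/scheduler"  # LVM volume
echo 'none'                > "$tmp/vda/queue/scheduler"   # virtio disk on 3.13+
list_scheds "$tmp"
# dm-0: none
# sda: noop [deadline] cfq
# vda: none
rm -rf "$tmp"
```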

Changed in util-linux (Ubuntu):
status: New → Invalid
tags: added: trusty
Revision history for this message
Dusan Baljevic (ubuntuadm) wrote :

Thank you for the update. I am still not sure, then, why it is also "wrong" in a CentOS 7 guest. The inconsistency with other Linux distributions is what concerns me...

# uname -a
Linux centos7-vm2 3.10.0-123.4.2.el7.x86_64 #1 SMP
Mon Jun 30 16:09:14 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# cat /sys/block/vda/queue/scheduler
noop [deadline] cfq

# ./Linux-check-IO-scheduler-and-discard-support.pl
INFO: File systems and raids
NAME FSTYPE LABEL UUID MOUNTPOINT
sr0
vda
├─vda1 xfs a1652174-58b6-4e97-a3ca-4a2160a31100 /boot
└─vda2 LVM2_member fVCpR3-mtm0-J1KU-XG56-IvOx-9Use-tqS0o2
  ├─centos_centos7--vm2-swap swap ad6f9fd9-59a5-49b2-afe9-531a8b74a446 [SWAP]
  └─centos_centos7--vm2-root xfs 63a2a109-cb5a-4b75-ac63-b903d81df633 /

INFO: Block devices
NAME                          ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED    RQ-SIZE RA   WSAME
sr0                           0         512    0      512     512     1    cfq      128     128  0B
vda                           0         512    0      512     512     1    deadline 128     4096 0B
├─vda1                        0         512    0      512     512     1    deadline 128     4096 0B
└─vda2                        0         512    0      512     512     1    deadline 128     4096 0B
  ├─centos_centos7--vm2-swap  0         512    0      512     512     1             128     4096 0B
  └─centos_centos7--vm2-root  0         512    0      512     512     1             128     4096 0B

INFO: I/O elevator (scheduler) and discard support summary
INFO: Hard Disk vda configured with I/O scheduler "deadline"

All other KVM guests that I run exhibit different reports from Ubuntu 14.04:

OpenSUSE Linux 13.1 (3.11.10-17.x86_64 #1 SMP PREEMPT)

Oracle Linux 6.5 (3.8.13-35.1.2.el6uek.x86_64 #2 SMP)

CentOS Linux 7.0.1406 (3.10.0-123.4.2.el7.x86_64 #1 SMP)

Regards,

Dusan

Chris J Arges (arges)
affects: util-linux (Ubuntu) → linux (Ubuntu)
Revision history for this message
Chris J Arges (arges) wrote :

This is a change in the kernel between 3.13 and 3.13-rc1, and it persists in the most recent kernel version. I believe this is still an invalid bug: what benefit would setting an I/O scheduler have if the device is virtual? On a physical machine you are still able to set the I/O scheduler properly.

Revision history for this message
Bpkroth (bpkroth) wrote :

I hate to reopen an old bug, but I haven't seen this discussed anywhere else, and I think there is some validity in this.

Without the use of the cfq scheduler on block devices (virtual or not), the blkio cgroup controller doesn't perform any IO accounting (see below).

Without that, tools like systemd-cgtop don't show any I/O stats. Determining which (group of) processes is using the most IO can be very valuable even in a virtual environment.

It also prevents the use of ionice (which depends on cfq) on certain tasks within the VM. That still has merit inside a VM, since the hypervisor can't affect prioritization once the I/O reaches that level in the path.

Now, for accounting at least, whether the fix belongs in the choice of block device queue scheduler or in the blkio cgroup controller is perhaps up for debate. But as it stands, users and operators are prevented from choosing the elevator for themselves, which seems wrong to me. If I want to pay the CPU cost of performing IO accounting (if not scheduling) inside a VM, even though the underlying storage may turn around and reorder things with its own more global optimization view, I should still be allowed to.


Thanks,
Brian

This was done on a VMware VM since obviously I can't demonstrate it in a KVM VM, but the idea applies to physical machines as well. Tested running both 3.16 and 4.5 kernels.

(get the current process' blkio cgroup)
# grep blkio /proc/self/cgroup
7:blkio:/system.slice/ssh.service

(yeah, I know, I was missing pam_systemd in the stack, but whatever)

(check the current scheduler being used)
# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq

(check the current stats)
# cat /sys/fs/cgroup/blkio/system.slice/ssh.service/blkio.io_serviced
Total 0

(induce some IO)
# time find /var/log -type f -print0 2>/dev/null | xargs -0 cat > /dev/null 2>/dev/null

real 0m4.539s
user 0m0.000s
sys 0m0.048s

(check the new blk io stats)
# cat /sys/fs/cgroup/blkio/system.slice/ssh.service/blkio.io_serviced
Total 0

(no change)

(switch to the cfq scheduler)
# echo cfq | sudo tee /sys/block/sda/queue/scheduler
cfq

(verify that the change happened)
# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

(check the current blkio stats)
# cat /sys/fs/cgroup/blkio/system.slice/ssh.service/blkio.io_serviced
8:0 Read 0
8:0 Write 0
8:0 Sync 0
8:0 Async 0
8:0 Total 0
Total 0

(look, new fields)

(clear the page cache to make sure our next test actually causes io)
# echo 1 | sudo tee /proc/sys/vm/drop_caches
1

(induce io again)
# time find /var/log -type f -print0 2>/dev/null | xargs -0 cat > /dev/null 2>/dev/null

real 0m2.386s
user 0m0.000s
sys 0m0.040s

(check the blkio stats again)
# cat /sys/fs/cgroup/blkio/system.slice/ssh.service/blkio.io_serviced
8:0 Read 1827
8:0 Write 0
8:0 Sync 0
8:0 Async 1827
8:0 Total 1827
Total 1827

(look, data!)
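For scripting, the final "Total" line of blkio.io_serviced can be pulled out with a one-line awk filter. A minimal sketch; `io_total` is an assumed helper name:

```shell
# Print the cgroup-wide total of serviced IOs: keep the last line whose
# first field is exactly "Total" (the "8:0 Total" lines don't match).
io_total() {
  awk '$1 == "Total" { n = $2 } END { print n+0 }'
}

# Example with the output shown above:
printf '8:0 Read 1827\n8:0 Async 1827\n8:0 Total 1827\nTotal 1827\n' | io_total
# → 1827
```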

Revision history for this message
Fabien COMBERNOUS (fc.) wrote :

Any news?
