Ubuntu 14 KVM Guest I/O Elevator Non-configurable

Bug #1346687 reported by Dusan Baljevic
This bug affects 4 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Invalid
Undecided
Unassigned

Bug Description

Ubuntu 14.04 guest runs a pretty much standard configuration: KVM hypervisor, ACPI and APIC enabled, virtio disk bus, raw storage format...

While running this project, I have also been testing SSD drives across different Linux distributions and their support for the discard option (TRIM).

To cut the story short, the Ubuntu-based KVM guest did not set any scheduler on the boot disk (vda):

# lsblk -io KNAME,TYPE,SCHED,ROTA,DISC-GRAN,DISC-MAX
KNAME TYPE SCHED    ROTA DISC-GRAN DISC-MAX
sr0   rom  deadline 1    0B        0B
vda   disk          1    0B        0B
vda1  part          1    0B        0B
vda2  part          1    0B        0B
vda5  part          1    0B        0B
dm-0  lvm           1    0B        0B
dm-1  lvm           1    0B        0B

That is in stark contrast with other distributions running an almost identical KVM guest configuration (with the exception of the root file system using BTRFS instead of EXT4 on SUSE).

Oracle Linux 6.5:

# lsblk -io KNAME,TYPE,SCHED,ROTA,DISC-GRAN,DISC-MAX
KNAME TYPE SCHED    ROTA DISC-GRAN DISC-MAX
sr0   rom  deadline 1    0B        0B
vda   disk deadline 1    0B        0B
vda1  part deadline 1    0B        0B
vda2  part deadline 1    0B        0B
dm-0  lvm           1    0B        0B
dm-1  lvm           1    0B        0B

OpenSUSE 13.1:

# lsblk -io KNAME,TYPE,SCHED,ROTA,DISC-GRAN,DISC-MAX
KNAME TYPE SCHED ROTA DISC-GRAN DISC-MAX
sr0   rom  cfq   1    0B        0B
vda   disk cfq   1    0B        0B
vda1  part cfq   1    0B        0B
vda2  part cfq   1    0B        0B
dm-0  lvm        1    0B        0B
dm-1  lvm        1    0B        0B

Indeed, checking the elevator options for the boot disk on the Ubuntu KVM guest showed:

# cat /sys/block/vda/queue/scheduler
none

Other Linux distributions show more options:

# cat /sys/block/vda/queue/scheduler
noop deadline [cfq]

Likewise, attempts to change the elevator on the Ubuntu guest fail silently. For example:

# echo noop > /sys/block/vda/queue/scheduler

# echo $?
0

# cat /sys/block/vda/queue/scheduler
none

Setting it globally in /etc/default/grub and updating it via update-grub2 fails too.
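For reference, the grub route mentioned above normally looks like the sketch below. It is shown here only as an outline; on this 3.13 guest the virtio disk ignores the elevator= parameter, and the deadline value is just an example:

```shell
# Sketch: the classic global-elevator route via grub (assumes a kernel
# whose block devices still honour a selectable elevator).
#
#   /etc/default/grub:
#     GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=deadline"
#
#   then:
#     sudo update-grub2                     # regenerate /boot/grub/grub.cfg
#     sudo reboot
#     cat /sys/block/vda/queue/scheduler    # hoped for: noop [deadline] cfq
```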

In the meantime, to automate checking of elevators (schedulers) and discard support for SSD drives and thin-provisioned volumes on Linux, I wrote a simple Perl script:

http://www.circlingcycle.com.au/Unix-sources/Linux-check-IO-scheduler-and-discard-support.pl.txt

Part of the results on a RHEL 6.5 server looks like:

INFO: I/O elevator (scheduler) and discard support summary
INFO: Hard Disk sdb configured with I/O scheduler "cfq"
INFO: SSD sda configured with I/O scheduler "deadline" supports discard operation
INFO: Hard Disk sdc configured with I/O scheduler "cfq"
INFO: Hard Disk sdi configured with I/O scheduler "cfq"
INFO: Hard Disk sdh configured with I/O scheduler "cfq"
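The same kind of check can be sketched in a few lines of awk over lsblk output. This is a hypothetical helper, not the Perl script above; `check_sched` is an assumed name:

```shell
# Feed `lsblk -io KNAME,TYPE,SCHED` output to this and it flags disks
# with no I/O scheduler (an empty SCHED column means fewer than 3 fields).
check_sched() {
  awk 'NR > 1 && $2 == "disk" {
    if (NF < 3) print "WARN: " $1 " has no I/O scheduler"
    else        print "INFO: " $1 " uses " $3
  }'
}

printf 'KNAME TYPE SCHED\nsr0 rom deadline\nvda disk\nsdb disk cfq\n' | check_sched
# WARN: vda has no I/O scheduler
# INFO: sdb uses cfq
```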

The Ubuntu KVM guest runs the latest patches:

Linux ubuntu14-vm2.circlingcycle.com.au 3.13.0-24-generic
#47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Regards,

Dusan Baljevic VK2COT

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1346687/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Revision history for this message
Dusan Baljevic (ubuntuadm) wrote :

As suggested, I added the package for /bin/lsblk. That is probably good enough for an initial bug report.

Dusan Baljevic

affects: ubuntu → util-linux (Ubuntu)
Revision history for this message
Phillip Susi (psusi) wrote :

Your other distributions are running older kernels. This behavior was changed in the kernel because it does not make sense to run an elevator on virtual devices. You will notice the same thing on bare metal for lvm and raid: they no longer have their own elevator. Instead the IO is passed straight down the stack to the real disk where there is a single elevator managing all IO to that disk, whether it comes from different virtual machines, or logical volumes.
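The layering described here can be sketched with a throwaway mock of the sysfs layout (real paths live under /sys/block; `list_scheds` and the device names are illustrative): only the bottom-level physical disk carries an elevator, while stacked or virtual devices report none.

```shell
# Build a mock sysfs-like tree and list each device's scheduler file.
list_scheds() {            # $1 = root of a sysfs-like tree
  for f in "$1"/*/queue/scheduler; do
    dev=${f#"$1"/}; dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(cat "$f")"
  done
}

tmp=$(mktemp -d)
mkdir -p "$tmp/sda/queue" "$tmp/dm-0/queue" "$tmp/vda/queue"
echo 'noop [deadline] cfq' > "$tmp/sda/queue/scheduler"   # real disk
echo 'none'                > "$tmp/dm-0/queue/scheduler"  # LVM volume
echo 'none'                > "$tmp/vda/queue/scheduler"   # virtio disk on 3.13+
list_scheds "$tmp"
# dm-0: none
# sda: noop [deadline] cfq
# vda: none
rm -rf "$tmp"
```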

Changed in util-linux (Ubuntu):
status: New → Invalid
tags: added: trusty
Revision history for this message
Dusan Baljevic (ubuntuadm) wrote :

Thank you for the update. I am still not sure, then, why it is also "wrong" in a CentOS 7 guest. The inconsistency with other Linux distributions is what concerns me...

# uname -a
Linux centos7-vm2 3.10.0-123.4.2.el7.x86_64 #1 SMP
Mon Jun 30 16:09:14 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# cat /sys/block/vda/queue/scheduler
noop [deadline] cfq

# ./Linux-check-IO-scheduler-and-discard-support.pl
INFO: File systems and raids
NAME FSTYPE LABEL UUID MOUNTPOINT
sr0
vda
├─vda1 xfs a1652174-58b6-4e97-a3ca-4a2160a31100 /boot
└─vda2 LVM2_member fVCpR3-mtm0-J1KU-XG56-IvOx-9Use-tqS0o2
  ├─centos_centos7--vm2-swap swap ad6f9fd9-59a5-49b2-afe9-531a8b74a446 [SWAP]
  └─centos_centos7--vm2-root xfs 63a2a109-cb5a-4b75-ac63-b903d81df633 /

INFO: Block devices
NAME                          ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED    RQ-SIZE RA   WSAME
sr0                           0         512    0      512     512     1    cfq      128     128  0B
vda                           0         512    0      512     512     1    deadline 128     4096 0B
├─vda1                        0         512    0      512     512     1    deadline 128     4096 0B
└─vda2                        0         512    0      512     512     1    deadline 128     4096 0B
  ├─centos_centos7--vm2-swap  0         512    0      512     512     1             128     4096 0B
  └─centos_centos7--vm2-root  0         512    0      512     512     1             128     4096 0B

INFO: I/O elevator (scheduler) and discard support summary
INFO: Hard Disk vda configured with I/O scheduler "deadline"

All other KVM guests that I run exhibit different reports from Ubuntu 14.04:

OpenSUSE Linux 13.1 (3.11.10-17.x86_64 #1 SMP PREEMPT)

Oracle Linux 6.5 (3.8.13-35.1.2.el6uek.x86_64 #2 SMP)

CentOS Linux 7.0.1406 (3.10.0-123.4.2.el7.x86_64 #1 SMP)

Regards,

Dusan

Chris J Arges (arges)
affects: util-linux (Ubuntu) → linux (Ubuntu)
Revision history for this message
Chris J Arges (arges) wrote :

This is a change in the kernel between 3.13 and 3.13-rc1, and it persists in the most recent kernel version. I believe this is still an invalid bug: what benefit would setting an I/O scheduler have if the device is virtual? On a physical machine you are still able to set the I/O scheduler properly.

Revision history for this message
Bpkroth (bpkroth) wrote :

I hate to reopen an old bug, but I haven't seen this discussed anywhere else, and I think there is some validity in this.

Without the use of the cfq scheduler on block devices (virtual or not), the blkio cgroup controller doesn't perform any IO accounting (see below).

Without that, tools like systemd-cgtop don't show any I/O stats. Determining which (group of) processes is using the most IO can be very valuable even in a virtual environment.

It also prevents the use of ionice (which depends on cfq) on certain tasks within the VM. That still has merit inside a VM, since the hypervisor can't affect prioritization once the I/O reaches that level in the path.

Now, for accounting at least, whether the fix belongs in the choice of block device queue scheduler or in the blkio cgroup controller is perhaps up for debate. But as it stands, users and operators are prevented from choosing the elevator for themselves, which seems wrong to me. If I want to pay the CPU cost of performing IO accounting (if not scheduling) inside a VM, even though the underlying storage may turn around and reorder things with its own more global optimization view, I should still be allowed to.


Thanks,
Brian

This was done on a VMware VM since obviously I can't demonstrate it in a KVM VM, but the idea applies to physical machines as well. Tested running both 3.16 and 4.5 kernels.

(get the current process' blkio cgroup)
# grep blkio /proc/self/cgroup
7:blkio:/system.slice/ssh.service

(yeah, I know, I was missing pam_systemd in the stack, but whatever)

(check the current scheduler being used)
# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq

(check the current stats)
# cat /sys/fs/cgroup/blkio/system.slice/ssh.service/blkio.io_serviced
Total 0

(induce some IO)
# time find /var/log -type f -print0 2>/dev/null | xargs -0 cat > /dev/null 2>/dev/null

real 0m4.539s
user 0m0.000s
sys 0m0.048s

(check the new blk io stats)
# cat /sys/fs/cgroup/blkio/system.slice/ssh.service/blkio.io_serviced
Total 0

(no change)

(switch to the cfq scheduler)
# echo cfq | sudo tee /sys/block/sda/queue/scheduler
cfq

(verify that the change happened)
# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]

(check the current blkio stats)
# cat /sys/fs/cgroup/blkio/system.slice/ssh.service/blkio.io_serviced
8:0 Read 0
8:0 Write 0
8:0 Sync 0
8:0 Async 0
8:0 Total 0
Total 0

(look, new fields)

(clear the page cache to make sure our next test actually causes io)
# echo 1 | sudo tee /proc/sys/vm/drop_caches
1

(induce io again)
# time find /var/log -type f -print0 2>/dev/null | xargs -0 cat > /dev/null 2>/dev/null

real 0m2.386s
user 0m0.000s
sys 0m0.040s

(check the blkio stats again)
# cat /sys/fs/cgroup/blkio/system.slice/ssh.service/blkio.io_serviced
8:0 Read 1827
8:0 Write 0
8:0 Sync 0
8:0 Async 1827
8:0 Total 1827
Total 1827

(look, data!)
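For scripting, the final "Total" line of blkio.io_serviced can be pulled out with a one-line awk filter. A minimal sketch; `io_total` is an assumed helper name:

```shell
# Print the cgroup-wide total of serviced IOs: keep the last line whose
# first field is exactly "Total" (the "8:0 Total" lines don't match).
io_total() {
  awk '$1 == "Total" { n = $2 } END { print n+0 }'
}

# Example with the output shown above:
printf '8:0 Read 1827\n8:0 Async 1827\n8:0 Total 1827\nTotal 1827\n' | io_total
# → 1827
```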

Revision history for this message
Fabien COMBERNOUS (fc.) wrote :

Any news?
