kvm guest's cpu usage with virtio storage driver goes up to 100% because of flush process

Bug #703811 reported by Roman R on 2011-01-17
This bug affects 2 people
Affects: kvm (Ubuntu)
Importance: High
Assigned to: Unassigned

Bug Description

Hello,

Problem:
KVM guest's CPU usage goes up to 100% and more (with multiple processors or cores) and the guest runs slowly for a period of time while the flush process is running on a virtio storage drive.

How to reproduce this problem:
Just start the guest with the virtio storage driver, make the guest do some storage-related work (e.g. apt-get install mc) and watch top. After 2-3 seconds the CPU usage will go to 100% and the guest will run slowly (so slowly that even the munin processes can't finish their work in 5 minutes and send an e-mail report that various .lock files are present).
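The "watch top" step can be made repeatable with a small filter. This is only a sketch, assuming top's default 12-column batch layout (field 9 is %CPU, field 12 is COMMAND); the function name filter_flush is made up for illustration.

```shell
#!/bin/sh
# Print "COMMAND %CPU" for each flush-* writeback thread found in top's
# batch-mode output on stdin. Assumes top's default 12-column layout,
# where field 9 is %CPU and field 12 is COMMAND.
filter_flush() {
    awk '$12 ~ /^flush-/ { print $12, $9 }'
}

# Take one snapshot if top is installed; rerun it (or wrap it in a loop)
# while the guest is doing disk work to see how the usage changes.
if command -v top >/dev/null 2>&1; then
    top -b -n 1 | filter_flush
fi
```

Run inside the guest during the apt-get step, this prints one line per flush thread, which is easier to paste into a report than a full top screen.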

host:
ubuntu 10.04.1 LTS
2.6.32-27-server x86_64
2x Intel(R) Xeon(R) CPU E5620 (total of 16 cpus for os)
12GB RAM
2x software RAID arrays:
160GB and 2TB
the 160GB one is for the system and the 2TB one is for KVM images

guest:
Ubuntu 10.04.1 LTS
Disk: virtio
network: virtio

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  869 root      20   0     0    0    0 S  117  0.0   0:17.80 flush-252:0
252 is the vda disk:
monitor@monitor:~$ ls -l /dev/vda
brw-rw---- 1 root disk 252, 0 2011-01-13 16:44 /dev/vda
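The flush thread name carries the major:minor number of the device it writes back to, so the lookup above can also be done through sysfs. A minimal sketch, assuming a Linux guest with /sys mounted and device nodes named after the kernel device name; flush_devnum and devnum_to_node are hypothetical helper names.

```shell
#!/bin/sh
# Strip the "flush-" prefix, leaving the MAJOR:MINOR pair.
flush_devnum() {
    printf '%s\n' "${1#flush-}"
}

# Resolve MAJOR:MINOR to a device node name through sysfs.
# Prints "unknown" if the kernel exposes no such block device.
devnum_to_node() {
    link="/sys/dev/block/$1"
    if [ -e "$link" ]; then
        printf '/dev/%s\n' "$(basename "$(readlink -f "$link")")"
    else
        printf 'unknown\n'
    fi
}

devnum_to_node "$(flush_devnum "flush-252:0")"
```

On a machine that has no block device 252:0, the last line prints "unknown".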

Roman R (romeo-r) wrote :

Same story with 10.10, which I'm currently running instead of 10.04.1 LTS.

Roman R (romeo-r) wrote :

I still have this problem and can't use either Ubuntu 10.04 or 10.10 in production. Is no one interested?

Serge Hallyn (serge-hallyn) wrote :

Thanks for taking the time to report this bug and make Ubuntu better.

I've not seen this behavior locally using virtio back-ends in Lucid. I wonder whether the software RAID could have something to do with it. Is there any way you could hook up another disk, without RAID, and try to reproduce?

Changed in kvm (Ubuntu):
status: New → Incomplete
importance: Undecided → High
Roman R (romeo-r) wrote :

Thank you for the reply.

Actually, I was thinking the same about software RAID.
I will try KVM on my laptop and see if the problem still exists.
I can't try on the same server at the moment; there are some virtual M$ servers on it which are in use.
I will report as soon as I can.

What could we do if it is all about software RAID?

Serge Hallyn (serge-hallyn) wrote :

If the bug is about software RAID, then we'll re-target the Affects line and try to diagnose. The kernel team may be able to help in that case as well.

Thanks, looking forward to the results.

Roman R (romeo-r) wrote :

Well, I just installed an Ubuntu 10.10 guest on the 10.10 host on my laptop and I see the same problem. It's a fresh install and all I've done is run apt-get update and apt-get dist-upgrade, and the flush-252:0 process started to use from 12% up to 100% of CPU.

Any ideas?

Serge Hallyn (serge-hallyn) wrote :

Quoting Roman R (<email address hidden>):
> Well, just installed ubuntu 10.10 guest on the 10.10 host on my laptop
> and i see the same problem. its fresh install and all i've done just run
> apt-get update and apt-get dist-upgrade and the flush-252:0 process
> started to use from 12% upto 100% of cpu.
>
> any ideas?

Thanks for testing that.

Next it would be helpful to test (separately) both the latest
kernel and the latest qemu-kvm versions.

The most recent qemu-kvm upstream build is at
https://launchpad.net/~ubuntu-server-edgers/+archive/server-edgers-qemu-kvm

The kernel daily build is at
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2011-01-19-natty/
It is for natty, but should work fine in maverick. You can just download
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2011-01-19-natty/linux-image-2.6.37-999-generic_2.6.37-999.201101190911_amd64.deb
and install it with dpkg -i.

Roman R (romeo-r) wrote :

I'll try to test it right away. Thanks.

Roman R (romeo-r) wrote :

1. Installed the latest kernel, but was not able to connect to QEMU in virt-manager :( It just stays in the Connecting state.
2. Installed the latest qemu-kvm and kvm packages, still not able to connect to it.
No information in the logs :(
During the OS boot I got this error:
init: ureaded main process (265)

virt-manager is the latest version.

3. Decided to boot from the old kernel with the new kvm/qemu, and this error no longer appears.
Could you please release this patched qemu-kvm and kvm for maverick? Thanks.

Serge Hallyn (serge-hallyn) wrote :

The latest upstream version has too many changes to backport to a stable release, but if we can identify one or a few patches which fix the problem, we can cherrypick those. In particular, I'm wondering whether the one below is part of the problem. I'll push a package to a ppa for you to test with hopefully later today.

commit de6c8042ec55da18702fa51f09072fcaa315edc3
Author: Stefan Hajnoczi <email address hidden>
Date: Fri May 14 22:52:30 2010 +0100

    virtio-blk: Avoid zeroing every request structure

    The VirtIOBlockRequest structure is about 40 KB in size. This patch
    avoids zeroing every request by only initializing fields that are read.
    The other fields are either written to or may not be used at all.

    Oprofile shows about 10% of CPU samples in memset called by
    virtio_blk_alloc_request(). The workload is
    dd if=/dev/vda of=/dev/null iflag=direct bs=8k running concurrently 4
    times. This patch makes memset disappear to the bottom of the profile.

    Signed-off-by: Stefan Hajnoczi <email address hidden>
    Signed-off-by: Kevin Wolf <email address hidden>

Roman R (romeo-r) wrote :

Okay, I'll wait until I can test it.

Serge Hallyn (serge-hallyn) wrote :

The package is built at ppa:serge-hallyn/virt. You can browse to
https://launchpad.net/~serge-hallyn/+archive/virt?field.series_filter=maverick

to manually download the qemu-kvm packages, or use add-apt-repository. Please let us know if it gets any better.

Roman R (romeo-r) wrote :

I'm not able to update the current packages on my 10.10 host using this PPA repo. apt says no updates are available.

Serge Hallyn (serge-hallyn) wrote :

Quoting Roman R (<email address hidden>):
> i'm not able to update my current packages for my 10.10 host using this
> ppa repo. apt says no updates avalible.

This might be because you still have the version from the server-edgers
ppa?

You should be able to

 apt-get remove qemu-kvm
 apt-get update
 apt-get install qemu-kvm

then make sure that

 dpkg -l | grep qemu

shows that you have version 0.12.5+noroms-0ubuntu7virtio1.

Roman R (romeo-r) wrote :

No, I'm trying to update the semi-production server, not my laptop.

At the moment I've got this:

ii qemu-common 0.12.5+noroms-0ubuntu7.1
ii qemu-kvm 0.12.5+noroms-0ubuntu7.1

....

Calculating upgrade... Done
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Roman R (romeo-r) wrote :

And the same on the laptop. I've removed the edgers repo, added yours, removed qemu-kvm and installed it back, and I get the same versions as shown above for the semi-production server.

Something is wrong.

Serge Hallyn (serge-hallyn) wrote :

Quoting Roman R (<email address hidden>):
> and same on the laptop. ive removed the edgers repo, added yours,
> removed qemu-kvm, isntalled it back and same versions as above are on
> the semi-production server.
>
> something is wrong.

Right, it's because you have the proposed pocket in your sources.list.
I've uploaded a new version to my ppa which will supersede the one in
-proposed. The last 3 builds have been pretty speedy, so hopefully this
will be available within the hour. After that,

 apt-get update && apt-get dist-upgrade

should install qemu-kvm version 0.12.5+noroms-0ubuntu7.1virtio2.diff.

Sorry about that.
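The ordering of the version strings in this thread can be checked with dpkg itself. This is a sketch, assuming dpkg is installed; the compare helper is a hypothetical wrapper around `dpkg --compare-versions`.

```shell
#!/bin/sh
# Version strings taken from this thread.
proposed='0.12.5+noroms-0ubuntu7.1'
ppa_old='0.12.5+noroms-0ubuntu7virtio1'
ppa_new='0.12.5+noroms-0ubuntu7.1virtio2.diff'

# Print "A > B" or "A <= B" according to dpkg's version ordering.
compare() {
    if dpkg --compare-versions "$1" gt "$2"; then
        echo "$1 > $2"
    else
        echo "$1 <= $2"
    fi
}

if command -v dpkg >/dev/null 2>&1; then
    compare "$proposed" "$ppa_old"
    compare "$ppa_new" "$proposed"
fi
```

Under Debian's ordering rules, letters sort before non-letters such as ".", so 7virtio1 ranks below 7.1, which is consistent with apt reporting nothing to upgrade. 7.1virtio2.diff extends 7.1 and therefore ranks above it.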

Roman R (romeo-r) wrote :

OK, I'll give it a try.

Roman R (romeo-r) wrote :

Well, it didn't help much.
The flush process still takes up to 46% CPU. I haven't tested much, though. With the edgers version it was 0-1% all the time.

Roman R (romeo-r) wrote :

anything?

Serge Hallyn (serge-hallyn) wrote :

Quoting Roman R (<email address hidden>):
> anything?

Back-porting the full 0.13.0 to maverick is not going to be allowed.
Finding the patches to cherrypick to fix this is on my to-do list.
The first step will be to re-create it myself reliably. I've not yet
had a chance to do that.

Roman R (romeo-r) wrote :

Well, as solving the problem is taking so much time, I had to switch to Proxmox; no such problems there. But I still have Ubuntu 10.10 with KVM on my laptop, so if there is something to test, let me know. I will be glad to help with this bug.

Changed in kvm (Ubuntu):
status: Incomplete → In Progress
assignee: nobody → Serge Hallyn (serge-hallyn)
Serge Hallyn (serge-hallyn) wrote :

I'm currently trying to reproduce this on a fully up-to-date maverick
(no -updates) box. I have 3 lucid VMs up, each of which started as
a server instance and is doing 'apt-get install openoffice.org' to
tax disk (and network). I'm not getting the behavior you are seeing
though.

Each VM has its own 30G LVM volume for a root disk, and I'm running
each with:

 kvm -drive file=/dev/data/vm1,if=virtio,index=0,boot=on -m 1G -smp 2 -vnc :1
 kvm -drive file=/dev/data/vm2,if=virtio,index=0,boot=on -m 1G -smp 2 -vnc :2
 kvm -drive file=/dev/data/vm3,if=virtio,index=0,boot=on -m 1G -smp 2 -vnc :3

Of course at times each kvm process hits 100% of a cpu, but they're
absolutely not staying there, and I don't see anything like flush-252:0
even making it into 'top' results.

Serge Hallyn (serge-hallyn) wrote :

I've also tried with 3 VMs each with 30G qcow2 and raw formatted drives, still with no success in trying to reproduce.

Please give the full kvm command (and, if using libvirt, libvirt.xml files) for the hosts which cause this problem. Are you using ecryptfs to back the guest image files?
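One way to capture the full kvm command line on a Linux host is to read it from /proc. This is only a sketch, assuming the emulator process is named exactly "kvm"; cmdline_of is a hypothetical helper name.

```shell
#!/bin/sh
# Print the full command line of a process, given its PID.
# /proc/PID/cmdline separates arguments with NUL bytes.
cmdline_of() {
    tr '\0' ' ' < "/proc/$1/cmdline"
    echo
}

# Print the command line of each process named exactly "kvm"
# (assumption: the emulator binary runs under that name).
for pid in $(pgrep -x kvm 2>/dev/null); do
    cmdline_of "$pid"
done
```

For libvirt-managed guests, `virsh dumpxml` followed by the domain name prints the domain XML.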

Changed in kvm (Ubuntu):
status: In Progress → Incomplete
Roman R (romeo-r) wrote :

I will provide the config files as soon as I get to my laptop. Meanwhile, I'm using virt-manager to create KVM guests; try it out.

Roman R (romeo-r) wrote :

Have you tried virt-manager, and do you still need my confs?
Sorry, I was away for a while.

Serge Hallyn (serge-hallyn) wrote :

Quoting Roman R (<email address hidden>):
> Have You tried the virt-manager and do You still need my confs?
> Sorry, i was away for a while.

No, sorry, I've not been in an environment where I could use the gui.
Please do upload your confs if you can. Otherwise I should be able
to run virt-manager by the end of the week.

Roman R (romeo-r) wrote :

<domain type='kvm'>
  <name>test</name>
  <uuid>cefb1457-6927-3fde-d0f1-a94c6c1aecb0</uuid>
  <memory>393216</memory>
  <currentMemory>393216</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.12'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/test.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>

And I run the KVM guest from an image file.

Changed in kvm (Ubuntu):
status: Incomplete → In Progress
Serge Hallyn (serge-hallyn) wrote :

At this point the only thing I've not reproduced is software RAID. I simply cannot reproduce this, with or without libvirt, with virtio, with multiple CPUs and small or large amounts of memory, and with all of the XML segment above used as the basis for a libvirt VM definition.

Changed in kvm (Ubuntu):
status: In Progress → Incomplete
Serge Hallyn (serge-hallyn) wrote :

If you would, please provide all details needed to recreate your software RAID setup, and I'll see if I can reproduce that way.

Roman R (romeo-r) wrote :

I don't have any RAID configured on my laptop, but I had on my production server, and the reason I installed KVM on the laptop was that I thought it was a RAID problem :)
To reproduce it, all I have to do (for me) is a fresh Ubuntu installation with KVM.
I don't know, maybe it comes down to my hardware specs and some part of the kvm/libvirt code. But the patch you provided seems to solve the problem, so there must be something in the changelog.

Serge Hallyn (serge-hallyn) wrote :

Hm, one thing we've never tried (or at least logged): do you get the same problem when you use non-virtio network?

Changed in kvm (Ubuntu):
assignee: Serge Hallyn (serge-hallyn) → nobody
Launchpad Janitor (janitor) wrote :

[Expired for kvm (Ubuntu) because there has been no activity for 60 days.]

Changed in kvm (Ubuntu):
status: Incomplete → Expired