qemu-kvm takes 100% CPU when running redhat/centos 7.6 guest VM OS

Bug #1829696 reported by qianxi
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
QEMU
Expired
Undecided
Unassigned

Bug Description

Description
===========
When running redhat or centos 7.6 guest os on vm,
the cpu usage is very low on vm(100% idle), but on host,
qemu-kvm reports 100% cpu busy usage.

After searching some related bugs report,
I suspect that it is due to the clock settings in vm's domain xml.
My Openstack cluster uses the default clock settings as follow:
    <clock offset='utc'>
      <timer name='rtc' tickpolicy='catchup'/>
      <timer name='pit' tickpolicy='delay'/>
      <timer name='hpet' present='no'/>
    </clock>
And in this report, https://bugs.launchpad.net/qemu/+bug/1174654
it claims that <timer name='rtc' track='guest'/> can solve the 100% cpu usage problem when using Windows Image Guest OS,
but I makes some tests, the solusion dose not work for me.

Steps to reproduce
==================
* create a vm using centos or redhat 7.6 image
* using sar tool inside vm and host to check the cpu usage, and compare them

Expected result
===============
host's cpu usage report should be same with vm's cpu usage

Actual result
=============
vm's cpu usage is 100% idle, host's cpu usage is 100% busy

Environment
===========
1. Exact version of OpenStack you are running.
# rpm -qa | grep nova
openstack-nova-compute-13.1.2-1.el7.noarch
python2-novaclient-3.3.2-1.el7.noarch
python-nova-13.1.2-1.el7.noarch
openstack-nova-common-13.1.2-1.el7.noarch

2. Which hypervisor did you use?
   (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
   What's the version of that?
# libvirtd -V
libvirtd (libvirt) 3.9.0

# /usr/libexec/qemu-kvm --version
QEMU emulator version 2.6.0 (qemu-kvm-ev-2.6.0-28.el7_3.6.1), Copyright (c) 2003-2008 Fabrice Bellard

Logs & Configs
==============
The VM xml:
<domain type='kvm' id='29'>
  <name>instance-00005022</name>
  <uuid>7f5a66a5-****-****-****-75dec****bbb</uuid>
  <metadata>
    <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
      <nova:package version="13.1.2-1.el7"/>
      <nova:name>*******</nova:name>
      <nova:creationTime>2019-05-20 03:08:46</nova:creationTime>
      <nova:flavor name="2d2dab36-****-****-****-246e9****110">
        <nova:memory>2048</nova:memory>
        <nova:disk>12</nova:disk>
        <nova:swap>2048</nova:swap>
        <nova:ephemeral>0</nova:ephemeral>
        <nova:vcpus>1</nova:vcpus>
      </nova:flavor>
      <nova:owner>
        <nova:user uuid="********************">****</nova:user>
        <nova:project uuid="********************">****</nova:project>
      </nova:owner>
      <nova:root type="image" uuid="4496a420-****-****-****-b50f****ada3"/>
    </nova:instance>
  </metadata>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <cputune>
    <shares>1024</shares>
    <vcpupin vcpu='0' cpuset='27'/>
    <emulatorpin cpuset='27'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='1'/>
    <memnode cellid='0' mode='strict' nodeset='1'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>Fedora Project</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>13.1.2-1.el7</entry>
      <entry name='serial'>64ab0e89-****-****-****-05312ef66983</entry>
      <entry name='uuid'>7f5a66a5-****-****-****-75decaf82bbb</entry>
      <entry name='family'>Virtual Machine</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.3.0'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>IvyBridge</model>
    <topology sockets='1' cores='1' threads='1'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='xsaveopt'/>
    <numa>
      <cell id='0' cpus='0' memory='2097152' unit='KiB'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/data/instances/7f5a66a5-****-****-****-75decaf82bbb/disk'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/data/instances/7f5a66a5-****-****-****-75decaf82bbb/disk.swap'/>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/data/instances/7f5a66a5-****-****-****-75decaf82bbb/disk.config'/>
      <backingStore/>
      <target dev='hdd' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-1'/>
      <address type='drive' controller='0' bus='1' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='piix3-uhci'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='fa:16:3e:a6:ea:4f'/>
      <source bridge='brq52c66dc3-64'/>
      <bandwidth>
        <inbound average='102400'/>
        <outbound average='102400'/>
      </bandwidth>
      <target dev='tapa29e94e5-42'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='file'>
      <source path='/data/instances/7f5a66a5-****-****-****-75decaf82bbb/console.log'/>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial0'/>
    </serial>
    <serial type='pty'>
      <source path='/dev/pts/10'/>
      <target type='isa-serial' port='1'>
        <model name='isa-serial'/>
      </target>
      <alias name='serial1'/>
    </serial>
    <console type='file'>
      <source path='/data/instances/7f5a66a5-****-****-****-75decaf82bbb/console.log'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <graphics type='vnc' port='5910' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <stats period='10'/>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

CPU Usage Report inside VM:
# sar -u -P 0 1 5
Linux 3.10.0-957.el7.x86_64 (******) 05/20/2019 _x86_64_ (1 CPU)

11:34:40 AM CPU %user %nice %system %iowait %steal %idle
11:34:41 AM 0 0.00 0.00 0.00 0.00 0.00 100.00
11:34:42 AM 0 0.00 0.00 0.00 0.00 0.00 100.00
11:34:43 AM 0 0.00 0.00 0.00 0.00 0.00 100.00
11:34:44 AM 0 0.00 0.00 0.00 0.00 0.00 100.00
11:34:45 AM 0 0.00 0.00 0.00 0.00 0.00 100.00
Average: 0 0.00 0.00 0.00 0.00 0.00 100.00

CPU Usage Report ON HOST(the vm's cpu is pinned on host's no.27 physic cpu):
# sar -u -P 27 1 5
Linux 3.10.0-862.el7.x86_64 (******) 05/20/2019 _x86_64_ (48 CPU)

11:34:40 AM CPU %user %nice %system %iowait %steal %idle
11:34:41 AM 27 100.00 0.00 0.00 0.00 0.00 0.00
11:34:42 AM 27 100.00 0.00 0.00 0.00 0.00 0.00
11:34:43 AM 27 100.00 0.00 0.00 0.00 0.00 0.00
11:34:44 AM 27 100.00 0.00 0.00 0.00 0.00 0.00
11:34:45 AM 27 100.00 0.00 0.00 0.00 0.00 0.00
Average: 27 100.00 0.00 0.00 0.00 0.00 0.00

clocksource inside VM:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm_clock

Revision history for this message
qianxi (qianxi416) wrote :

It shows that only the version 7.6 of redhat or centos affected by this bug,
in my cluster, it is OK for versions from 6.8 to 7.5, but 7.6 is not normal.

Revision history for this message
Daniel Berrange (berrange) wrote :

> # libvirtd -V
> libvirtd (libvirt) 3.9.0
>
> # /usr/libexec/qemu-kvm --version
> QEMU emulator version 2.6.0 (qemu-kvm-ev-2.6.0-28.el7_3.6.1), Copyright (c) 2003-2008 Fabrice Bellard

This looks like the host is running CentOS, with packages dating from approx 7.3. This is very outdated given current version is 7.6. So I don't think it is worth spending time to debug until the host OS is upgraded to the latest QEMU/libvirt/kernel packages at least to see if the problem still reproduces.

affects: nova → qemu
Revision history for this message
qianxi (qianxi416) wrote :
Download full text (5.5 KiB)

I tested two newest QEMU versions these days, and sadly, the problem still happens.

I tried to find the reason why the qemu process take 100% usage of cpu, and collected some facts about it.
I compared the facts with other normal vm's qemu process(who's cpu usage is normal) and didn't turn out any interesting result.

Please give me some guides to debug this problem if you could, thanks very much.

(The full content of facts is in the attachment)
1.
======the command line to start a VM======
# ps -ef |grep 1567284 | cat
qemu 1567284 1 99 Jun21 ? 2-18:09:33 /usr/libexec/qemu-kvm -name guest=instance-0000530a,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-instance-0000530a/master-key.aes -machine pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off -cpu IvyBridge -m 2048 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object memory-backend-ram,id=ram-node0,size=2147483648,host-nodes=1,policy=bind -numa node,nodeid=0,cpus=0,memdev=ram-node0 -uuid 60d3ad85-1004-47e7-b2cb-5cf1a029ab47 -smbios type=1,manufacturer=Fedora Project,product=OpenStack Nova,version=13.1.2-1.el7,serial=c0700413-4a28-4e97-85c4-66fe3ec367dc,uuid=60d3ad85-1004-47e7-b2cb-5cf1a029ab47,family=Virtual Machine -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-1-instance-0000530a/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/data/instances/60d3ad85-1004-47e7-b2cb-5cf1a029ab47/disk,format=raw,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/data/instances/60d3ad85-1004-47e7-b2cb-5cf1a029ab47/disk.swap,format=raw,if=none,id=drive-virtio-disk1,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/data/instances/60d3ad85-1004-47e7-b2cb-5cf1a029ab47/disk.config,format=raw,if=none,id=drive-ide0-1-1,readonly=on,cache=none -device ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:21:2c:70,bus=pci.0,addr=0x3 -add-fd set=2,fd=31 -chardev file,id=charserial0,path=/dev/fdset/2,append=on -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
root 1567288 2 0 Jun21 ? 00:00:01 [vhost-1567284]
root 1567291 2 0 Jun21 ? 00:00:00 [kvm-pit/1567284]

2.
===version of QEMU===
# /usr/libexec/qemu-kvm --version
QEMU emulator version 4.0.0
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

===version of libvirt===
# libvirtd -V
libvirtd (libvirt) 3.9.0

===version of kernal===
# uname -a
Lin...

Read more...

Revision history for this message
qianxi (qianxi416) wrote :

===vmstat on the host===
# vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 2 0 0 252897184 366768 7497768 0 0 0 0 0 0 0 0 100 0 0
 1 0 0 252896752 366768 7497768 0 0 0 0 1394 367 2 0 98 0 0
 1 0 0 252896752 366768 7497768 0 0 0 0 1442 387 2 0 98 0 0
 1 0 0 252896752 366768 7497768 0 0 0 0 1479 470 2 0 98 0 0
 1 0 0 252896752 366776 7497768 0 0 0 12 1373 371 2 0 98 0 0

===vmstat on the VM===
# vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 1 0 0 1490564 81708 238688 0 0 1 2 14 30 0 0 99 1 0
 0 0 0 1490564 81708 238688 0 0 0 0 29 55 0 0 100 0 0
 0 0 0 1490564 81708 238688 0 0 0 0 26 56 0 0 100 0 0
 0 0 0 1490564 81708 238688 0 0 0 0 17 31 0 0 99 0 0
 0 0 0 1490564 81708 238688 0 0 0 0 19 41 0 0 100 0 0

Revision history for this message
Thomas Huth (th-huth) wrote :

The QEMU project is currently considering to move its bug tracking to
another system. For this we need to know which bugs are still valid
and which could be closed already. Thus we are setting older bugs to
"Incomplete" now.

If you still think this bug report here is valid, then please switch
the state back to "New" within the next 60 days, otherwise this report
will be marked as "Expired". Or please mark it as "Fix Released" if
the problem has been solved with a newer version of QEMU already.

Thank you and sorry for the inconvenience.

Changed in qemu:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.