libvirt+KVM: High CPU usage on Windows 10 (1803) guests

Bug #1805087 reported by Bernhard Denner on 2018-11-26
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Jack Ding

Bug Description

One of our clients recently started using Windows 10 (update 1803) guest instances and reported that those instances respond very slowly. For example, boot times are in the range of minutes, whereas older Windows instances boot in seconds.

After some tests with plain libvirt I could relate this issue to the following bug [1] (https://bugs.launchpad.net/qemu/+bug/1775702).

[1] and [2] suggest enabling the libvirt Hyper-V features 'synic' and 'stimer':
<features>
  <hyperv>
    <synic state='on'/>
    <stimer state='on'/>
    ...
  </hyperv>
  ...
</features>
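
For anyone testing this with plain libvirt, the change above can be sketched as a small script over the domain XML. This is a minimal sketch, not part of Nova or of the attached patch: the function name `enable_hyperv_timers` is hypothetical, and whether the hypervisor accepts the resulting definition depends on the libvirt and QEMU versions in use.

```python
import xml.etree.ElementTree as ET


def enable_hyperv_timers(domain_xml: str) -> str:
    """Return domain XML with <synic> and <stimer> set to state='on'.

    Hypothetical helper for illustration; it only edits the XML text and
    does not talk to libvirt.
    """
    root = ET.fromstring(domain_xml)

    # Reuse <features> and <hyperv> if present, otherwise create them.
    features = root.find("features")
    if features is None:
        features = ET.SubElement(root, "features")
    hyperv = features.find("hyperv")
    if hyperv is None:
        hyperv = ET.SubElement(features, "hyperv")

    # Switch both enlightenments on, adding the elements when missing.
    for name in ("synic", "stimer"):
        elem = hyperv.find(name)
        if elem is None:
            elem = ET.SubElement(hyperv, name)
        elem.set("state", "on")

    return ET.tostring(root, encoding="unicode")
```

The input would typically be the output of `virsh dumpxml`, and the result can be loaded again with `virsh define`. Other elements already present under `<features>` are left untouched.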

However, since our on-prem environment is still running Ocata on Ubuntu 16.04, I'm not able to use those settings there. The only way to work around the issue is to enable the 'HPET' timer:
<clock ...>
  <timer name='hpet' present='yes'/>
  ...
</clock>

Nova, however, disables this timer by default.
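
The HPET workaround can be sketched the same way for a test with plain libvirt. This is a minimal sketch under assumptions: `enable_hpet` is a hypothetical helper, not the attached patch and not a Nova API, and it only rewrites the XML text. A definition edited by hand like this is not expected to survive Nova regenerating the domain XML, so it is useful for reproducing the effect rather than as a fix.

```python
import xml.etree.ElementTree as ET


def enable_hpet(domain_xml: str) -> str:
    """Return domain XML whose <clock> has <timer name='hpet' present='yes'/>.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(domain_xml)
    clock = root.find("clock")
    if clock is None:
        # Do not guess a clock offset; leave that decision to the operator.
        raise ValueError("domain XML has no <clock> element")

    # Flip an existing hpet timer, or add one if none is defined.
    for timer in clock.findall("timer"):
        if timer.get("name") == "hpet":
            timer.set("present", "yes")
            break
    else:
        ET.SubElement(clock, "timer", {"name": "hpet", "present": "yes"})

    return ET.tostring(root, encoding="unicode")
```

As with the previous sketch, the input would be the output of `virsh dumpxml` and the result can be re-applied with `virsh define`. Other timers under `<clock>` are kept as they are.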

Making HPET configurable has already been requested and discussed in blueprint [6] (https://blueprints.launchpad.net/nova/+spec/support-hpet-on-guest). Nevertheless, until this is available, I've done a simple implementation of it to solve our current issues (patch attached).

Hopefully this is useful for anybody facing the same issues.

Environment
===========
$> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial

$> dpkg -l | grep nova
ii nova-api 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - API frontend
ii nova-common 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - common files
ii nova-compute 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - compute node libvirt support
ii nova-conductor 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - conductor service
ii nova-consoleauth 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - NoVNC proxy
ii nova-placement-api 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - placement API frontend
ii nova-scheduler 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute - virtual machine scheduler
ii python-nova 2:15.1.3-0ubuntu1~cloud1 all OpenStack Compute Python libraries
ii python-novaclient 2:7.1.0-0ubuntu1~cloud0 all client library for OpenStack Compute API - Python 2.7

$> dpkg -l | grep qemu
ii ipxe-qemu 1.0.0+git-20150424.a25a16d-1ubuntu1.2 all PXE boot firmware - ROM images for qemu
ii qemu-block-extra:amd64 1:2.8+dfsg-3ubuntu2.9~cloud3 amd64 extra block backend modules for qemu-system and qemu-utils
ii qemu-kvm 1:2.8+dfsg-3ubuntu2.9~cloud3 amd64 QEMU Full virtualization
ii qemu-system-common 1:2.8+dfsg-3ubuntu2.9~cloud3 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-x86 1:2.8+dfsg-3ubuntu2.9~cloud3 amd64 QEMU full system emulation binaries (x86)
ii qemu-utils 1:2.8+dfsg-3ubuntu2.9~cloud3 amd64 QEMU utilities

References
==========

[1] https://bugs.launchpad.net/qemu/+bug/1775702
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1610461
[3] https://forum.proxmox.com/threads/high-cpu-load-for-windows-10-guests-when-idle.44531/
[4] https://www.reddit.com/r/VFIO/comments/80p1q7/high_kvmqemu_cpu_utilization_when_windows_10/
[5] https://askubuntu.com/questions/1033985/kvm-high-host-cpu-load-after-upgrading-vm-to-windows-10-1803
[6] https://blueprints.launchpad.net/nova/+spec/support-hpet-on-guest
[7] https://specs.openstack.org/openstack/nova-specs/specs/stein/approved/support-hpet-on-guest.html

Matt Riedemann (mriedem) wrote :

Thanks for this very nicely detailed bug report, it sounds like you're OK with your solution and this should ultimately be resolved with the hpet blueprint in stein which you're already aware of. Given that, we'll likely close this as part of that blueprint since I'm not sure what kind of backportable fix we'd have for this.

Changed in nova:
status: New → Fix Released
importance: Undecided → Medium
assignee: nobody → Jack Ding (jackding)