[SRU] Incorrect host cpu is given to emulator threads when cpu_realtime_mask flag is set

Bug #1614054 reported by Sahid Orentino
20
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Sahid Orentino
Newton
Fix Committed
Medium
Lee Yarwood
Ubuntu Cloud Archive
Fix Released
Undecided
Unassigned
Declined for Newton by James Page
Mitaka
Fix Released
Medium
Edward Hope-Morley
nova (Ubuntu)
Fix Released
Undecided
Unassigned
Xenial
Fix Released
Medium
Unassigned

Bug Description

[Impact]

This bug affects users of Openstack Nova who want to create instances that will leverage the realtime functionality that libvirt/qemu offers by, amongst other things, pinning guest vcpus and qemu emulator threads to specific pcpus. Nova provides the means for the user to control, via the flavor hw:cpu_realtime_mask or image property hw_cpu_realtime_mask, which physical cpus these resources will pinned to. This mask allows you to mask the set of N pins that Nova selects such that 1 or more of your vcpus can be declared "real-time" by ensuring that they do not have emulator threads also pinned to them. The remaining "non-realtime" vcpus will have vcpu and emulator threads colocated. The bug fixes the case where e.g. you have a guest that has 2 vcpus (logically 0 and 1) and Nova selects pcpus 14 and 22 and you have mask ^0 to indicate that you want all but the first vcpu to be realtime. This should result in the following being present in your libvirt xml for the guest:

  <cputune>
    <vcpupin vcpu='0' cpuset='14'/>
    <vcpupin vcpu='1' cpuset='22'/>
    <emulatorpin cpuset='14'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
  </cputune>

But (current only Mitaka since it does not have this patch) you will get this:

  <cputune>
    <vcpupin vcpu='0' cpuset='14'/>
    <vcpupin vcpu='1' cpuset='22'/>
    <emulatorpin cpuset='0'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
  </cputune>

i.e. Nova will always set the emulator pin to be id of the vcpu instead of the corresponding pcpu that it is pinned to.

In terms of actual impact this could result in vcpus that are supposed to be isolated not being so and therefore not behaving as expected.

[Test Case]

 * deploy openstack mitaka and configure nova.conf with vcpu-pin-set=0,1,2,3

   https://pastebin.ubuntu.com/25133260/

 * configure compute host kernel opts with "isolcpus=0,1,2,3" + reboot

 * create flavor with:

   openstack flavor create --public --ram 2048 --disk 10 --vcpus 2 --swap 0 test_flavor
   openstack flavor set --property hw:cpu_realtime_mask='^0' test_flavor
   openstack flavor set --property hw:cpu_policy=dedicated test_flavor
   openstack flavor set --property hw:cpu_thread_policy=prefer test_flavor
   openstack flavor set --property hw:cpu_realtime=yes test_flavor

 * boot instance with ^^ flavor

 * check that libvirt xml for vm has correct emulator pin cpuset #

[Regression Potential]

Since the patch being backported only touches the specific aread of code that was causing the original problem and that code only serves to select cpusets based on flavor filters, i can't think of any regressions that it would introduce. However, one potential side effect/change to be aware of is that once nova-compute is upgraded to this newer version, any new instances created will have the correct/expected cpuset assignments whereas instances created prior to upgrade will remain unchanged i.e. they will all likely still have their emulation threads pinned to the wrong pcpu. In terms of side effects this will mean less load on the pcpu that was previously incorrectly chosen for existing guests but it will mean that older instances will need to be recreated in order to benefit from the fix.

====

Description of problem:
When using the cpu_realtime and cpu_realtim_mask flag to create new instance, the 'cpuset' of 'emulatorpin' option is using the id of vcpu which is incorrect. The id of host cpu should be used here.

e.g.
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='0'/> ### the cpuset should be '2' here, when cpu_realtime_mask=^0.
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
  </cputune>

How reproducible:
Boot new instance with cpu_realtime_mask flavor.

Steps to Reproduce:
1. Create RT flavor
nova flavor-create m1.small.performance 6 2048 20 2
nova flavor-key m1.small.performance set hw:cpu_realtime=yes
nova flavor-key m1.small.performance set hw:cpu_realtime_mask=^0
nova flavor-key m1.small.performance set hw:cpu_policy=dedicated
2. Boot a instance with this flavor
3. Check the xml of the new instance

Actual results:
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='0'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
  </cputune>

Expected results:
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='3'/>
    <emulatorpin cpuset='2'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
  </cputune>

Changed in nova:
assignee: nobody → sahid (sahid-ferdjaoui)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/356383

Changed in nova:
status: New → In Progress
Jay Pipes (jaypipes)
Changed in nova:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/356383
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6683bf9b7dc575ef9516f0cdc395b8da1b81c233
Submitter: Jenkins
Branch: master

commit 6683bf9b7dc575ef9516f0cdc395b8da1b81c233
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Aug 17 06:51:25 2016 -0400

    libvirt: fix incorrect host cpus giving to emulator threads when RT

    Realtime guarantees in certain operating systems require that the
    thread that is running the QEMU emulator is pinned to a physical CPU
    that is *not* the same as any physical CPU that the vCPUs for a
    realtime guest are pinned to. This patch ensures that the value of the
    hw:cpu_realtime_mask flavor extraspec property is respected when
    creating the libvirt configuration XML and sets emulatorpin values to
    a physical CPU matching the hw:cpu_realtime_mask value.

    Change-Id: I7f50dde0753b059a690dc50172fee645c94b8e5b
    Closes-Bug: #1614054

Changed in nova:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/393487

Tony Breeds (o-tony)
tags: added: newton-backport-potential
removed: liberty-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.0.0b1

This issue was fixed in the openstack/nova 15.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/newton)

Reviewed: https://review.openstack.org/393487
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=046b640e2fbc243c6e9c55708d521e53f1485ddf
Submitter: Jenkins
Branch: stable/newton

commit 046b640e2fbc243c6e9c55708d521e53f1485ddf
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Aug 17 06:51:25 2016 -0400

    libvirt: fix incorrect host cpus giving to emulator threads when RT

    Realtime guarantees in certain operating systems require that the
    thread that is running the QEMU emulator is pinned to a physical CPU
    that is *not* the same as any physical CPU that the vCPUs for a
    realtime guest are pinned to. This patch ensures that the value of the
    hw:cpu_realtime_mask flavor extraspec property is respected when
    creating the libvirt configuration XML and sets emulatorpin values to
    a physical CPU matching the hw:cpu_realtime_mask value.

    Change-Id: I7f50dde0753b059a690dc50172fee645c94b8e5b
    Closes-Bug: #1614054
    (cherry picked from commit 6683bf9b7dc575ef9516f0cdc395b8da1b81c233)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 14.0.4

This issue was fixed in the openstack/nova 14.0.4 release.

Amad Ali (amad)
tags: added: mitaka-backport-potential
tags: added: sts sts-sru-needed
removed: mitaka-backport-potential
description: updated
Revision history for this message
James Page (james-page) wrote :

Resolved since newton; targeting to Xenial + the Mitaka UCA

Changed in cloud-archive:
status: New → Fix Released
Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Edward Hope-Morley (hopem) wrote :
Revision history for this message
Edward Hope-Morley (hopem) wrote :

debdiff re-uploaded due to formatting error in previous version

summary: - Incorrect host cpu is given to emulator threads when cpu_realtime_mask
- flag is set
+ [SRU] Incorrect host cpu is given to emulator threads when
+ cpu_realtime_mask flag is set
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Thank you for submitting the bug and uploading the fix!

Could we please make sure that the [Impact] field of a bug is used to nicely outline the impact the bug has on users? Without specific knowledge of the package itself it's hard for an SRU reviewer to see the actual impact of the bug and how important the fix is - and these we use to determine if a fix/change is worth the risk of including it in a stable release.

For instance here the [Impact] field is completely meaningless (the title doesn't give too much info for me) - I get some overview when seeing the patch description in the proposed upload, but this should be present here. If I would accept it the way it is, another SRU member that would like to release this change into -updates would have issues getting an overview of the impact.

Please update the bug description appropriately. The SRU template is there for something. I know it might sometimes look like that but it's not just useless paperwork that needs to be copy-pasted blindly.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Hi @sil2100, apologies for not providing an intuitive Impact field. I have updated it and hope it will be more helpful. Please let me know if you need more info.

description: updated
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello sahid, or anyone else affected,

Accepted nova into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/2:13.1.4-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-xenial
Revision history for this message
Edward Hope-Morley (hopem) wrote :

Verified for xenial-proposed:

root@orgen:~# dpkg -l| grep nova-compute
ii nova-compute 2:13.1.4-0ubuntu2 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:13.1.4-0ubuntu2 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:13.1.4-0ubuntu2 all OpenStack Compute - compute node libvirt support
root@orgen:~# virsh dumpxml instance-00000001| grep cpuset
    <vcpupin vcpu='0' cpuset='17'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <emulatorpin cpuset='17'/>

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
Revision history for this message
James Page (james-page) wrote :

Hello sahid, or anyone else affected,

Accepted nova into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
Revision history for this message
Edward Hope-Morley (hopem) wrote :

Tested trusty-mitaka/proposed and lgtm:

  <cputune>
    <shares>2048</shares>
    <vcpupin vcpu='0' cpuset='17'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <emulatorpin cpuset='17'/>
    <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
  </cputune>

tags: added: verification-mitaka-done
removed: verification-mitaka-needed
Revision history for this message
Robie Basak (racb) wrote :

> [Regression Potential]
>
> None expected.

See https://wiki.ubuntu.com/StableReleaseUpdates#Procedure

If nobody can be bothered to think about what this update might regress, and then look for that during Ubuntu's SRU verification process, then I'm not prepared to risk user stability by releasing this update to Ubuntu.

If you object to this, then please seek a policy change in Ubuntu removing the requirement that this be checked.

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Hi @racb I will provide a more informative regression potential.

description: updated
Revision history for this message
Brian Murray (brian-murray) wrote :

Will there be any messaging done about the need to recreate instances to benefit from this fix?

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2:13.1.4-0ubuntu2

---------------
nova (2:13.1.4-0ubuntu2) xenial; urgency=medium

  * d/p/libvirt-fix-incorrect-host-cpus-giving-to-emulator-t.patch:
    Backport fix for cpu pinning libvirt config incorrect emulator
    pin cpuset (LP: #1614054).

 -- Edward Hope-Morley <email address hidden> Wed, 26 Jul 2017 06:46:18 +0100

Changed in nova (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Ryan Beisner (1chb1n) wrote :

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Ryan Beisner (1chb1n) wrote :

This bug was fixed in the package nova - 2:13.1.4-0ubuntu2~cloud0
---------------

 nova (2:13.1.4-0ubuntu2~cloud0) trusty-mitaka; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 nova (2:13.1.4-0ubuntu2) xenial; urgency=medium
 .
   * d/p/libvirt-fix-incorrect-host-cpus-giving-to-emulator-t.patch:
     Backport fix for cpu pinning libvirt config incorrect emulator
     pin cpuset (LP: #1614054).

tags: added: sts-sru-done
removed: sts-sru-needed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.