live-migration will not honor destination vcpu_pin_set config

Bug #1496135 reported by Nikola Đipanov on 2015-09-15
40
This bug affects 7 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
Unassigned

Bug Description

Reporting this based on code inspection of the current master (commit: 9f61d1eb642785734f19b5b23365f80f033c3d9a)

When we attempt to live-migrate an instance onto a host that has a different vcpu_pin_set than the one that was on the source host, we may either break the policy set by the destination host or fail (as we will not recalculate the vcpu cpuset attribute to match that of the destination host, so we may end up with an invalid range).

The first solution that jumps out is to make sure the XML is updated in https://github.com/openstack/nova/blob/6d68462c4f20a0b93a04828cb829e86b7680d8a4/nova/virt/libvirt/driver.py#L5422

However that would mean passing over the requested info from the destination host.

Diana Clarke (diana-clarke) wrote :

Hi folks:

As a learning exercise[1], I tried to reproduce this bug.

Here are my notes from two test cases I tried using a multi-node devstack sandbox.

1. Source host vcpu_pin_set is larger than the destination host vcpu_pin_set:

    http://paste.openstack.org/show/473274/

2. Source host vcpu_pin_set is the same size as the destination host vcpu_pin_set, but with completely different values:

    http://paste.openstack.org/show/473286/

Conclusion: The live migration completed successfully in both cases, and it looks to me like the cpuset attribute was updated correctly to match the destination host policy, but perhaps I've misunderstood & failed to simulate a case that would trigger a migration failure or policy conflict.

Thoughts?

[1] I'm new to OpenStack.

Nikola Đipanov (ndipanov) wrote :

Thanks a lot for looking into this Diana!

So seems like it's not a bug and I missed the bit where the vcpu_pin_set gets updated when looking at the code. As stated from the bug - I reported this based only on reading the code.

I think we can move this to 'Invalid' - it'd be great to get the commit hash which you used to test this if possible.

Changed in nova:
status: New → Invalid
Diana Clarke (diana-clarke) wrote :

Thanks for reviewing my testing notes, Nikola!

I'm not 100% certain that I reproduced the case you thought might fail, but if you reviewed my testing notes and are okay with marking this as invalid, then I think we're probably good. Thanks again.

PS. I tested using nova's master branch at commit d7de04240dbfbda854875bf878ee24e7c1b7c2d8.

Changed in nova:
assignee: nobody → Diana Clarke (diana-clarke)
Nan Zhang (nanzhang) wrote :

in cases:
compute01
    vcpu_pin_set=4-31
compute02
    vcpu_pin_set=8-31

boot instance01 on compute01, grep virsh xml:
virsh dumpxml instance-0000006b | grep cpu
  <vcpu placement='static' cpuset='4-31'>2</vcpu>

live-migration instance01 to compute02, grep virsh xml:
virsh dumpxml instance-0000006b | grep cpu
  <vcpu placement='static' cpuset='4-31'>2</vcpu>

after live-migration instance's cpuset is not changed by compute02 vcpu_pin_set config, I thin it is a bug.

Changed in nova:
status: Invalid → Confirmed
Nan Zhang (nanzhang) on 2015-12-09
Changed in nova:
assignee: Diana Clarke (diana-clarke) → Nan Zhang (nanzhang)
Nan Zhang (nanzhang) on 2015-12-09
tags: added: live-migration numa
Nan Zhang (nanzhang) on 2015-12-09
summary: - libvirt live-migration will not honor destination vcpu_pin_set config
+ live-migration will not honor destination vcpu_pin_set config
Nan Zhang (nanzhang) wrote :

I test again

after live-migration, virsh edit instance, in xml cpuset value is right (same as dist node vcpu_pin_set ), not as wrong as dumpxml

reboot instance, dumpxml is corrected.

Changed in nova:
assignee: Nan Zhang (nanzhang) → nobody
Sujitha (sujitha-neti) on 2016-03-28
Changed in nova:
assignee: nobody → Sujitha (sujitha-neti)
Changed in nova:
assignee: Sujitha (sujitha-neti) → Nikola Đipanov (ndipanov)
status: Confirmed → In Progress
Changed in nova:
assignee: Nikola Đipanov (ndipanov) → sahid (sahid-ferdjaoui)
tangxing (tang-xing) on 2016-06-23
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → tangxing (tang-xing)
tangxing (tang-xing) on 2016-06-23
Changed in nova:
assignee: tangxing (tang-xing) → nobody
tangxing (tang-xing) on 2016-06-24
Changed in nova:
assignee: nobody → Nikola Đipanov (ndipanov)
assignee: Nikola Đipanov (ndipanov) → sahid (sahid-ferdjaoui)
Changed in nova:
assignee: sahid (sahid-ferdjaoui) → Stephen Finucane (stephenfinucane)
GuoChang Tang (tanggc) wrote :

this is still not fixed?

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: Stephen Finucane (stephenfinucane) → nobody
Sean Dague (sdague) wrote :

Found open reviews for this bug in gerrit, setting to In Progress.

review: https://review.openstack.org/286744 in branch: master

Changed in nova:
status: Confirmed → In Progress
assignee: nobody → Stephen Finucane (stephenfinucane)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/286744
Reason: This review is > 4 weeks without comment, and is not mergable in it's current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Matt Riedemann (mriedem) wrote :

I thought this might have been resolved in Train with this blueprint:

https://blueprints.launchpad.net/nova/+spec/numa-aware-live-migration

But that doesn't support the case of an instance with a vcpu pin set without a NUMA topology. Artom at one point had that mostly fixed in that series but it was complicating things very close to feature freeze so it was removed, but could probably be easily added back in as a fix in the Ussuri release.

Changed in nova:
assignee: Stephen Finucane (stephenfinucane) → nobody
status: In Progress → Confirmed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers