resize with PCI devices doesn't work

Bug #1368201 reported by Baodong (Robert) Li
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Moshe Levi

Bug Description

In Icehouse, if an active instance without PCI passthrough devices is resized to a flavor that adds PCI passthrough devices, the instance is reported as ACTIVE after the resize but becomes inaccessible. Worse, the whole compute service appears to be hosed, and killing it causes the compute node to hang.

If an active instance with PCI passthrough devices is resized to a flavor with more PCI passthrough devices, the instance's domain XML ends up with more PCI devices than were requested, and the same symptoms described above occur.
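The report does not state a root cause. As a rough illustration of the second symptom only, the toy model below assumes the old devices are kept while devices for the new flavor are allocated on top of them, which is consistent with the later fix commit saying the old devices must be dropped. The device addresses and function names are hypothetical and are not nova code.

```python
# Toy model only: addresses and helper names are hypothetical, not nova code.

def resize_keeping_old(attached, free_pool, requested_count):
    """Allocate devices for the new flavor without releasing the old ones."""
    new = [free_pool.pop(0) for _ in range(requested_count)]
    return attached + new


def resize_dropping_old(attached, free_pool, requested_count):
    """Release the old devices first, then allocate what the new flavor asks for."""
    free_pool.extend(attached)
    return [free_pool.pop(0) for _ in range(requested_count)]


pool = ["0000:81:00.1", "0000:81:00.2", "0000:81:00.3"]

# Instance starts with one device; the new flavor requests two.
buggy = resize_keeping_old(["0000:81:00.0"], list(pool), requested_count=2)
fixed = resize_dropping_old(["0000:81:00.0"], list(pool), requested_count=2)

print(len(buggy), len(fixed))
```

In this model the first variant ends up with three devices for a flavor that asked for two, which matches the "more than requested" symptom; the second variant ends up with exactly two.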

Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Yongli He (yongli-he)
Changed in nova:
assignee: nobody → Yongli He (yongli-he)
tags: added: pci-passthrough
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/137530

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/154362

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/154363

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/154364

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/154365

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Yongli He (<email address hidden>) on branch: master
Review: https://review.openstack.org/137530
Reason: this patch is replaced by a patch set, based on collection back from instance method:
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:pci_resize,n,z

Davanum Srinivas (DIMS) (dims-v) wrote :
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/154363
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/154362
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/154364
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/154365
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

tags: added: pci
removed: pci-passthrough
Changed in nova:
assignee: Yongli He (yongli-he) → nobody
status: In Progress → Confirmed
Yongli He (yongli-he)
Changed in nova:
assignee: nobody → Yongli He (yongli-he)
Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Yongli He (<email address hidden>) on branch: master
Review: https://review.openstack.org/154362
Reason: use another patch: https://review.openstack.org/#/c/154365/

Changed in nova:
assignee: Yongli He (yongli-he) → Moshe Levi (moshele)
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Yongli He (<email address hidden>) on branch: master
Review: https://review.openstack.org/154365
Reason: use Moshe's patch set for resize.

Changed in nova:
assignee: Moshe Levi (moshele) → Jay Pipes (jaypipes)
Changed in nova:
assignee: Jay Pipes (jaypipes) → Moshe Levi (moshele)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/317064

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/317065

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/307124
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c2c3b97259258eec3c98feabde3b411b519eae6e
Submitter: Jenkins
Branch: master

commit c2c3b97259258eec3c98feabde3b411b519eae6e
Author: Moshe Levi <email address hidden>
Date: Mon Apr 18 14:33:59 2016 +0300

    pci: Move PCI devices and PCI requests into migration context

    When resizing a guest to a flavor with PCI passthrough, we need to drop
    the old PCI devices and allocate new ones. To be able to do that we
    are leveraging the migration context (that was used only for NUMA).

    Adds old and new PCI devices/ PCI requests into the MigrationContext
    object and uses the nova.pci.request.get_pci_requests_from_flavor()
    function to grab the set of requested PCI devices during a migration.

    Then, in the resource tracker's _update_usage_from_migration() call, we
    use the old and new PCI devices and PCI requests stored in the
    MigrationContext to properly account for changes.

    Closes-Bug: #1368201
    Co-Authored-by: Jay Pipes <email address hidden>

    Change-Id: Ie8690f2b7235d677ebe15fabaae81b0a6bda29de
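As a reading aid for the commit message above, here is a minimal sketch of the idea of carrying old and new PCI devices in a migration context and using them to adjust usage. The class and function below are hypothetical stand-ins written for illustration; they are not nova's MigrationContext object or its resource tracker, and the accounting shown is only one plausible reading of "properly account for changes".

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


# Hypothetical stand-in; the real nova object carries many more fields.
@dataclass
class MigrationContextSketch:
    old_pci_devices: List[str] = field(default_factory=list)
    new_pci_devices: List[str] = field(default_factory=list)


def account_for_resize(
    free: Set[str], claimed: Set[str], ctx: MigrationContextSketch
) -> Tuple[Set[str], Set[str]]:
    """Return updated (free, claimed) sets after applying the context."""
    free, claimed = set(free), set(claimed)
    # Drop the devices the instance held under the old flavor.
    claimed -= set(ctx.old_pci_devices)
    free |= set(ctx.old_pci_devices)
    # Claim the devices allocated for the new flavor.
    free -= set(ctx.new_pci_devices)
    claimed |= set(ctx.new_pci_devices)
    return free, claimed


ctx = MigrationContextSketch(
    old_pci_devices=["0000:81:00.0"],
    new_pci_devices=["0000:81:00.1", "0000:81:00.2"],
)
free, claimed = account_for_resize(
    free={"0000:81:00.1", "0000:81:00.2"},
    claimed={"0000:81:00.0"},
    ctx=ctx,
)
print(sorted(free), sorted(claimed))
```

Because both the old and the new device lists travel with the context, the accounting step needs no extra lookup to know which devices to release and which to claim.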

Changed in nova:
status: In Progress → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 14.0.0.0b2

This issue was fixed in the openstack/nova 14.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/mitaka)

Reviewed: https://review.openstack.org/317064
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=257cfb7e6f2f3414640c632909f78db6b71f40b3
Submitter: Jenkins
Branch: stable/mitaka

commit 257cfb7e6f2f3414640c632909f78db6b71f40b3
Author: Jay Pipes <email address hidden>
Date: Fri Apr 1 16:03:47 2016 -0700

    pci: pass in instance PCI requests to claim

    Removes the calls to InstancePCIRequests.get_XXX() from within the
    claims.Claim and claims.MoveClaim constructors and instead has the
    resource tracker construct the PCI requests and pass them into the
    constructor.

    This allows us to remove the needlessly duplicative _test_pci() method
    in claims.MoveClaim and will allow the next patch in the series to
    remove the call in nova.pci.manager.PciDevTracker.claim_instance() that
    re-fetches PCI requests for the supplied instance.

    Related-Bug: #1368201
    Related-Bug: #1582278

    Change-Id: Ib2cc7c985839fbf88b5e6e437c4b395ab484b1b6
    (cherry picked from commit 74fbff88639891269f6a0752e70b78340cf87e9a)
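The refactoring described in this commit message, having the caller build the PCI requests and pass them into the claim constructor, can be sketched roughly as follows. All names here (ClaimSketch, MoveClaimSketch, host_has_two_free, the request dict layout and the failure string) are assumptions made for illustration and do not reproduce nova's claims module.

```python
from typing import Callable, List, Optional

# Hypothetical request shape for illustration, e.g. {"alias_name": "a1", "count": 2}.
PciRequest = dict


class ClaimSketch:
    def __init__(
        self,
        pci_requests: List[PciRequest],
        can_satisfy: Callable[[List[PciRequest]], bool],
    ):
        # The caller builds the requests and hands them in; the constructor
        # performs no lookup of its own.
        self.pci_requests = pci_requests
        self._can_satisfy = can_satisfy

    def test_pci(self) -> Optional[str]:
        """Return a failure reason, or None if the requests fit."""
        if self.pci_requests and not self._can_satisfy(self.pci_requests):
            return "PCI requests cannot be satisfied"
        return None


class MoveClaimSketch(ClaimSketch):
    """Inherits test_pci() unchanged; only the requests passed in differ."""


def host_has_two_free(requests: List[PciRequest]) -> bool:
    # Pretend the host has exactly two free devices.
    return sum(r["count"] for r in requests) <= 2


boot_claim = ClaimSketch([{"alias_name": "a1", "count": 1}], host_has_two_free)
resize_claim = MoveClaimSketch([{"alias_name": "a1", "count": 3}], host_has_two_free)
print(boot_claim.test_pci(), resize_claim.test_pci())
```

Once the requests arrive as a constructor argument, the move claim no longer needs its own copy of the PCI test, which is the duplication the commit message says it removes.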

tags: added: in-stable-mitaka
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/317065
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2150d5d8e17323bc9fce11903da3afffda211d26
Submitter: Jenkins
Branch: stable/mitaka

commit 2150d5d8e17323bc9fce11903da3afffda211d26
Author: Jay Pipes <email address hidden>
Date: Mon Apr 4 12:32:56 2016 -0700

    pci: eliminate DB lookup PCI requests during claim

    The nova.pci.manager.PciDevTracker.claim_instance() accepted an Instance
    object and called nova.objects.InstancePCIRequests.get_by_instance() to
    retrieve the PCI requests for the instance. This caused a DB lookup of
    the PCI requests for that instance, even though in all situations other
    than for migration/resize, the instance's PCI requests were already
    retrieved by the resource tracker.

    This change removes that additional DB lookup during claim_instance() by
    changing the instance parameter to instead be an InstancePCIRequests
    object and an InstanceNUMATopology object.

    Also in this patch is a change to nova.objects.PciDevice.claim() that
    changes the single parameter to an instance UUID instead of an Instance
    object, since nothing other than the instance's UUID was used in the
    method.

    Closes-Bug: #1582278
    Related-Bug: #1368201

    Change-Id: I9ab10c3035628f083233114b47b43a9b9ecdd166
    (cherry picked from commit 1f259e2a9423a4777f79ca561d5e6a74747a5019)
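To make the saved lookup concrete, the sketch below counts lookups against a fake database in two variants: one where the claim function re-fetches the PCI requests from an instance identifier, and one where the already-fetched requests are passed in. FakeDB and both functions are hypothetical illustrations; they are not the signatures of nova's PciDevTracker.

```python
from typing import List


class FakeDB:
    """Hypothetical stand-in that only counts lookups."""

    def __init__(self) -> None:
        self.lookups = 0

    def get_pci_requests(self, instance_uuid: str) -> List[dict]:
        self.lookups += 1
        return [{"alias_name": "a1", "count": 1}]


def claim_instance_before(db: FakeDB, instance_uuid: str) -> int:
    # Re-fetches requests the caller already had: an extra lookup.
    requests = db.get_pci_requests(instance_uuid)
    return sum(r["count"] for r in requests)


def claim_instance_after(pci_requests: List[dict]) -> int:
    # Works on the requests it is given; no lookup at all.
    return sum(r["count"] for r in pci_requests)


db_before = FakeDB()
db_before.get_pci_requests("uuid-1")        # the caller's own lookup
claim_instance_before(db_before, "uuid-1")  # the duplicate lookup
before = db_before.lookups

db_after = FakeDB()
requests = db_after.get_pci_requests("uuid-1")  # the caller's own lookup
claim_instance_after(requests)
after = db_after.lookups

print(before, after)
```

The same reasoning applies to the second change in the commit message: a method that only reads the instance's UUID can take the UUID directly rather than the whole object.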
