live migration generates several network-changed events which lock up refreshing the nw info cache

Bug #1691602 reported by Matt Riedemann
This bug affects 1 person
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  In Progress   Medium      Matt Riedemann
Queens                    In Progress   Medium      Matt Riedemann
neutron                   Fix Released  Undecided   Matt Riedemann

Bug Description

Chris Friesen has reported that in Newton, when live migrating an instance with ~16 ports, the "network-changed" events that neutron generates when the vifs are unplugged from the source host can effectively block the network info cache refresh that's called at the end of the live migration operation. Details are in the IRC logs:

http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2017-05-17.log.html#t2017-05-17T22:50:31

But this stands out:

cfriesen mriedem: so it looks like _build_network_info_model() costs about 200ms plus about 125ms per port since we query each port separately from neutron. and the refresh_cache lock is held the whole time
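
To put those figures in context, here is a rough back-of-the-envelope calculation. It assumes the quoted timings scale linearly with the number of ports and, purely for illustration, that one network-changed event arrives per port (the report only says "several"). The function and constant names are made up for this sketch.

```python
BASE_COST_MS = 200      # approximate fixed cost per refresh, from the IRC quote
PER_PORT_COST_MS = 125  # approximate cost per port, from the IRC quote


def full_refresh_cost_ms(num_ports):
    """Estimate how long the refresh_cache lock is held for one full refresh."""
    return BASE_COST_MS + PER_PORT_COST_MS * num_ports


ports = 16
one_refresh = full_refresh_cost_ms(ports)

# Assumption for illustration only: one network-changed event per port,
# each triggering a full refresh serialized behind the same lock.
all_events = ports * one_refresh

print(one_refresh)  # 2200
print(all_events)   # 35200
```

Under those assumptions a single full refresh holds the lock for roughly 2.2 seconds, and a burst of 16 events would keep it busy for about 35 seconds before the final refresh at the end of the live migration could run.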

In Nova the 'network-changed' event is handled generically because there is no port ID in the event, so nova just refreshes the entire nw info cache on the instance. That can be expensive and redundant, since it takes a lot of queries to Neutron to build up information about ports, fixed IPs, floating IPs, subnets and networks, and Neutron doesn't have bulk query APIs or allow OR filters in the API for bulk queries on things like floating IPs.

https://github.com/openstack/nova/blob/8d492c76d53f3fcfacdd945a277446bdfe6797b0/nova/compute/manager.py#L6854

Looking at the neutron code that sends the network-changed event, there is a port in scope; it just isn't sent with the event the way it is for network-vif-deleted events.

We should be able to scope the network-changed event to a specific port on the neutron side and check for that on the nova side, so that we don't have to refresh the entire network info cache, only the vif that was updated.
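
The nova-side idea can be sketched roughly as follows. This is not nova's actual code: the function, the per-instance lock table, the dict-based cache and the two fetch callbacks are all hypothetical stand-ins, and falling back to a full refresh when the port is not already cached is an assumption made for this sketch.

```python
import threading
from collections import defaultdict

# One lock per instance, standing in for nova's refresh_cache lock.
_refresh_locks = defaultdict(threading.Lock)


def handle_network_changed(instance_uuid, cache, fetch_all_ports, fetch_port,
                           port_id=None):
    """Refresh one cached VIF if a known port ID is given, else everything."""
    with _refresh_locks[instance_uuid]:
        vifs = cache.setdefault(instance_uuid, {})
        if port_id is not None and port_id in vifs:
            # Scoped refresh: a single query instead of one per port.
            vifs[port_id] = fetch_port(port_id)
            return "port"
        # No port ID (older neutron) or unknown port: rebuild the whole cache.
        cache[instance_uuid] = fetch_all_ports(instance_uuid)
        return "full"


# Fake neutron lookups that record which queries were made.
calls = []


def fake_fetch_all(instance_uuid):
    calls.append("all")
    return {"port-a": {"id": "port-a", "rev": 1},
            "port-b": {"id": "port-b", "rev": 1}}


def fake_fetch_port(port_id):
    calls.append(port_id)
    return {"id": port_id, "rev": 2}


cache = {}
print(handle_network_changed("inst-1", cache, fake_fetch_all, fake_fetch_port))
print(handle_network_changed("inst-1", cache, fake_fetch_all, fake_fetch_port,
                             port_id="port-a"))
```

The first call has no port ID and rebuilds everything; the second only re-queries port-a and leaves port-b untouched, so the lock is held for a fraction of the time.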

Matt Riedemann (mriedem)
tags: added: neutron
Changed in neutron:
assignee: nobody → Matt Riedemann (mriedem)
Matt Riedemann (mriedem) wrote :

The messy part is that the existing nw info cache refresh code in nova doesn't know how to handle a single port ID on its own: if it is given one, it assumes the port is being attached to the instance.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/465783

Changed in neutron:
status: New → In Progress
Matt Riedemann (mriedem)
Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/465787

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/465783
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bf8e6007cfa50d461790be325e9e97b8b396ae47
Submitter: Jenkins
Branch: master

commit bf8e6007cfa50d461790be325e9e97b8b396ae47
Author: Matt Riedemann <email address hidden>
Date: Wed May 17 22:18:54 2017 -0400

    Send port ID in network-changed event to Nova

    When Nova gets a network-changed event, it rebuilds the
    entire network info cache for the instance if it does not
    have a specific port ID. This can be costly and redundant
    when performing something like a live migration with multiple
    ports attached to the same instance.

    This change simply adds the port ID to the network-changed event
    since we have it in scope. Nova can use it or not, but at least
    the information is provided for context.

    Change-Id: Ifdaef05208d09ddd9587fed6214cf388e5265ba4
    Closes-Bug: #1691602
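
A minimal sketch of what such an event could look like once it carries the port ID. The helper name is hypothetical, and the field names (`server_uuid`, `name`, `tag`) are assumed here to follow nova's server external events API, with the port ID carried in `tag`; the actual neutron change should be checked in the review linked above.

```python
def build_network_changed_event(port):
    """Build a network-changed event dict for the instance that owns `port`."""
    event = {
        "server_uuid": port["device_id"],  # instance the port is bound to
        "name": "network-changed",
    }
    # Assumption: the port ID travels in the optional 'tag' field, so a
    # receiver that does not look at it can simply ignore it.
    port_id = port.get("id")
    if port_id:
        event["tag"] = port_id
    return event


example_port = {"id": "port-a", "device_id": "inst-1"}
print(build_network_changed_event(example_port))
```

Because the extra field is optional, a receiver that ignores it keeps doing a full refresh, which matches the commit message's "Nova can use it or not".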

Changed in neutron:
status: In Progress → Fix Released
Sean Dague (sdague) wrote :

Automatically discovered version newton in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.0.0b3

This issue was fixed in the openstack/neutron 11.0.0.0b3 development milestone.

tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/466449
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2f67b97577fa28f7e6ab23971e7d6f259889c992
Submitter: Jenkins
Branch: master

commit 2f67b97577fa28f7e6ab23971e7d6f259889c992
Author: Matt Riedemann <email address hidden>
Date: Fri May 19 22:58:08 2017 -0400

    Pull out code that builds VIF in _build_network_info_model

    This is a refactor that pulls the code out of the
    neutronv2 API _build_network_info_model method so that
    upcoming changes can use it for updating the network
    info cache for a specific port rather than rebuilding the
    entire list of VIFs every time Nova gets a network-changed
    event from Neutron.

    Change-Id: Ic5833f59152bbf5ee64300cdd2df32002708e096
    Related-Bug: #1691602

tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/465787
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=537f480927293755bd6365068c1abd0c80951bad
Submitter: Zuul
Branch: master

commit 537f480927293755bd6365068c1abd0c80951bad
Author: Matt Riedemann <email address hidden>
Date: Wed May 17 22:48:29 2017 -0400

    Handle network-changed event for a specific port

    It turns out that when Neutron sends the "network-changed"
    event it has a port in scope and can provide the port ID.

    Older versions of Neutron wouldn't send this, but if it's
    provided, we can try to optimize the network info cache
    refresh and scope it to just that single port, rather than
    build the entire cache all over again when the other ports
    in the cache may not have changed at all since the last
    time the cache was refreshed.

    This can be especially beneficial for instances with a
    relatively large number of ports which are being migrated.

    Depends-On: Ifdaef05208d09ddd9587fed6214cf388e5265ba4

    Change-Id: I023b5b1ccb248e68189f62ba0ff75d41093c1f60
    Partial-Bug: #1691602

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/578612

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/578612
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c8756d5a4a7c06d25705af6243ea8ce8fd148411
Submitter: Zuul
Branch: stable/queens

commit c8756d5a4a7c06d25705af6243ea8ce8fd148411
Author: Matt Riedemann <email address hidden>
Date: Wed May 17 22:48:29 2017 -0400

    Handle network-changed event for a specific port

    It turns out that when Neutron sends the "network-changed"
    event it has a port in scope and can provide the port ID.

    Older versions of Neutron wouldn't send this, but if it's
    provided, we can try to optimize the network info cache
    refresh and scope it to just that single port, rather than
    build the entire cache all over again when the other ports
    in the cache may not have changed at all since the last
    time the cache was refreshed.

    This can be especially beneficial for instances with a
    relatively large number of ports which are being migrated.

    Depends-On: Ifdaef05208d09ddd9587fed6214cf388e5265ba4

    Change-Id: I023b5b1ccb248e68189f62ba0ff75d41093c1f60
    Partial-Bug: #1691602
    (cherry picked from commit 537f480927293755bd6365068c1abd0c80951bad)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/465792

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/687410

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/687410

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.opendev.org/465792
