SR-IOV: sometimes a port may hang in BUILD state

Bug #1702635 reported by Oleg Bondarev
This bug affects 1 person
Affects              Status         Importance   Assigned to      Milestone
Mirantis OpenStack   Fix Released   Medium       Oleg Bondarev    9.2-mu-3
neutron              Fix Released   Medium       Oleg Bondarev    pike-rc1

Bug Description

Scenario:

1) vfio-pci driver is used for VFs
2) two ports are created in neutron with binding type 'direct'
   (binding:vnic_type=direct)
3) VMs are spawned and deleted on two compute nodes using the pre-created
   ports
4) so one neutron port may be bound to different compute nodes at different
   moments
5) for some reason (probably a bug, but not the one this report is about)
   vfio-pci does not properly handle VF reset after VM deletion, so to the
   SR-IOV agent it looks as if some port's MAC is still mapped to a PCI
   slot even though the port is not bound to the node
6) the SR-IOV agent requests port info from the server with
   get_devices_details_list() but does not specify 'host' in the parameters
7) in this case the neutron server sets the port to BUILD, even though it
   may be bound to another host:

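    def _get_new_status(self, host, port_context):
        port = port_context.current
        # No host from the agent (or the agent's host matches the binding):
        # the port is treated as being wired by this agent.
        if not host or host == port_context.host:
            new_status = (n_const.PORT_STATUS_BUILD if port['admin_state_up']
                          else n_const.PORT_STATUS_DOWN)
            if port['status'] != new_status:
                return new_status
        # Otherwise None is returned and the port's status is left as-is.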

8) after processing, the agent notifies the server with update_device_list(),
   this time specifying the 'host' parameter
9) the server detects the mismatch between the port's host and the agent's
   host and does not update the port's status
10) the port stays in BUILD state
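To make steps 6 to 10 concrete, here is a minimal Python model of the server-side status handling. It is a simplified sketch, not neutron code: the Port dataclass, get_device_details(), update_device_up() and agent_cycle() are hypothetical names, and only the host check from the quoted _get_new_status() plus the host-mismatch check from step 9 are modeled.

```python
from dataclasses import dataclass
from typing import Optional

PORT_STATUS_ACTIVE = "ACTIVE"
PORT_STATUS_BUILD = "BUILD"
PORT_STATUS_DOWN = "DOWN"


@dataclass
class Port:
    # Hypothetical, simplified stand-in for a neutron port and its binding.
    bound_host: str
    admin_state_up: bool = True
    status: str = PORT_STATUS_ACTIVE


def get_new_status(port: Port, host: Optional[str]) -> Optional[str]:
    # Mirrors the quoted _get_new_status(): with no host, or a matching
    # host, the port moves to BUILD (or DOWN if administratively down).
    if not host or host == port.bound_host:
        new_status = (PORT_STATUS_BUILD if port.admin_state_up
                      else PORT_STATUS_DOWN)
        if port.status != new_status:
            return new_status
    return None


def get_device_details(port: Port, host: Optional[str]) -> None:
    # Step 7: simplified server side of get_devices_details_list().
    new_status = get_new_status(port, host)
    if new_status:
        port.status = new_status


def update_device_up(port: Port, host: str) -> None:
    # Steps 8-9: simplified server side of update_device_list(); on a host
    # mismatch the status is left untouched.
    if host != port.bound_host:
        return
    port.status = PORT_STATUS_ACTIVE


def agent_cycle(port: Port, agent_host: str, send_host_on_details: bool) -> str:
    # One processing loop of an agent that (wrongly) thinks it owns the port.
    get_device_details(port, agent_host if send_host_on_details else None)
    update_device_up(port, agent_host)
    return port.status


# The port is ACTIVE on compute-2, but a stale VF mapping makes the agent
# on compute-1 process it.
print(agent_cycle(Port(bound_host="compute-2"), "compute-1", False))  # BUILD
print(agent_cycle(Port(bound_host="compute-2"), "compute-1", True))   # ACTIVE
```

Without a host in the details request the port is flipped to BUILD and nothing ever flips it back; with the host included, the server ignores the foreign port in both calls.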

A simple fix is to specify the host at step 6: the neutron server then won't set the port's status to BUILD, because of the host mismatch.
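From the agent's side, the fix amounts to forwarding its own host name in the details request. The sketch below assumes an RPC client whose get_devices_details_list() accepts an optional host argument; PluginApiStub, SriovAgentSketch and request_device_details() are hypothetical stand-ins that only record the call, not neutron classes.

```python
from typing import Any, Dict, List, Optional


class PluginApiStub:
    """Hypothetical RPC client stand-in that records each details request."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def get_devices_details_list(self, context: Any, devices: List[str],
                                 agent_id: str,
                                 host: Optional[str] = None) -> List[dict]:
        self.calls.append({"devices": list(devices),
                           "agent_id": agent_id,
                           "host": host})
        return []  # a real server would return per-device details


class SriovAgentSketch:
    """Hypothetical agent fragment showing only the step 6 request."""

    def __init__(self, plugin_rpc: PluginApiStub, agent_id: str,
                 host: str) -> None:
        self.plugin_rpc = plugin_rpc
        self.agent_id = agent_id
        self.host = host

    def request_device_details(self, devices: List[str]) -> List[dict]:
        # Omitting host (host=None) makes the server treat the port as owned
        # by this agent; passing it lets the server skip ports bound to
        # another compute node.
        return self.plugin_rpc.get_devices_details_list(
            None, devices, self.agent_id, host=self.host)


rpc = PluginApiStub()
agent = SriovAgentSketch(rpc, agent_id="sriov-agent-compute-1",
                         host="compute-1")
agent.request_device_details(["fa:16:3e:00:00:01"])
print(rpc.calls[0]["host"])  # compute-1
```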

Tags: sriov-pci-pt
description: updated
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/480997

Changed in neutron:
status: Confirmed → In Progress
Changed in mos:
milestone: none → 9.2-mu-3
assignee: nobody → Oleg Bondarev (obondarev)
importance: Undecided → Medium
status: New → In Progress
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/35913
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: b1a0491ff1ae6d84a71f23c0b612decf30660368
Author: Oleg Bondarev <email address hidden>
Date: Mon Jul 10 15:51:15 2017

SR-IOV agent should specify host when requesting devices info

Otherwise neutron server may set BUILD status for ports not bound
to agent's host.

Closes-Bug: #1702635
Change-Id: I4c1eb8fdd9ad5b1365f0cbe87120f60a46d69daf

Changed in mos:
status: In Progress → Fix Committed
Changed in neutron:
milestone: none → pike-rc1
Ilya Bumarskov (ibumarskov) wrote :

Can't reproduce the bug in a test environment due to lack of appropriate hardware. In accordance with our policy, the fix should be verified on the customer side.
The fix is present in snapshots/9.0-2017-10-16-142324

Changed in mos:
status: Fix Committed → Fix Released
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (mcp/1.0/mitaka)

Fix proposed to branch: mcp/1.0/mitaka
Change author: Oleg Bondarev <email address hidden>
Review: https://review.fuel-infra.org/38213

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/480997
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6402cd37c9fdc1b21b0320f5c04a509112e783a8
Submitter: Zuul
Branch: master

commit 6402cd37c9fdc1b21b0320f5c04a509112e783a8
Author: Oleg Bondarev <email address hidden>
Date: Thu Jul 6 15:43:15 2017 +0400

    SR-IOV agent should specify host when requesting devices info

    Otherwise neutron server may set BUILD status for ports not bound
    to agent's host.

    Closes-Bug: #1702635
    Change-Id: Ic0aa2b5d8fb5ad682293ce2b8e44606ef862a62d

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.0.0b1

This issue was fixed in the openstack/neutron 13.0.0.0b1 development milestone.
