Neutron openvswitch-agent doesn't recover ports from binding_failed status

Bug #1399249 reported by Yair Fried
This bug affects 27 people
Affects | Status       | Importance | Assigned to | Milestone
neutron | Fix Released | High       | Nir Magnezi |

Bug Description

Ports created while neutron-openvswitch-agent is down end up with status DOWN and "binding:vif_type=binding_failed", which is expected. When the agent is restarted, it should be able to recreate the ports according to the DB, but instead it logs a WARNING and the port remains in status DOWN. The only solution is to delete the port and create it again.

From agent log:
2014-12-04 16:53:00.559 16319 WARNING neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2dcc9141-7439-450a-bb2a-fe31ab577f47 None] Device 3dc73917-93b1-4f6d-a2e1-90c74cea6de7 not defined on plugin
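A minimal Python sketch for spotting affected ports in the agent log, assuming the log line format quoted above. The regex and the helper name `find_undefined_devices` are illustrative, not part of Neutron:

```python
import re

# Pattern assumed from the WARNING line quoted above; adjust if your logging format differs.
NOT_DEFINED_RE = re.compile(
    r"WARNING neutron\.plugins\.openvswitch\.agent\.ovs_neutron_agent .*?"
    r"Device (?P<device>[0-9a-f-]{36}) not defined on plugin"
)


def find_undefined_devices(log_lines):
    """Return device (port) IDs reported as 'not defined on plugin', in first-seen order."""
    devices = []
    for line in log_lines:
        match = NOT_DEFINED_RE.search(line)
        # Deduplicate, since the agent repeats the warning on every resync.
        if match and match.group("device") not in devices:
            devices.append(match.group("device"))
    return devices
```

Usage: pass an open log file (or any iterable of lines) and it returns the port IDs to inspect with `neutron port-show`.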

Recreation steps:

Shut down the OVS agent and wait for Neutron to notice (the alive column shows xxx):

[root@RHEL7Server ~]# systemctl stop neutron-openvswitch-agent.service
[root@RHEL7Server ~(keystone_admin)]# neutron agent-list | grep open
| 2d97bbd1-b937-4b19-8205-4167bbcb659d | Open vSwitch agent | node_29 | xxx | True | neutron-openvswitch-agent |
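A small sketch for checking that state from a script. It assumes the column order of the row shown above (id, agent_type, host, alive, admin_state_up, binary) and that `xxx` marks a dead agent while `:-)` marks a live one; `agent_is_alive` is a hypothetical helper:

```python
def agent_is_alive(row):
    """Return True if a `neutron agent-list` table row shows the agent as alive.

    Assumed column order (as in the rows in this report):
    id | agent_type | host | alive | admin_state_up | binary
    """
    # Drop the outer pipes, split into cells and trim the padding.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    return cells[3] == ":-)"
```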

Create a router and attach it to the network:
[root@RHEL7Server ~(keystone_admin)]# neutron router-create myrouter --ha False
Created a new router:
+-----------------------+--------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------+
| admin_state_up | True |
| distributed | False |
| external_gateway_info | |
| ha | False |
| id | 8210f453-2a17-400e-ae32-74aa1503d0a5 |
| name | myrouter |
| routes | |
| status | ACTIVE |
| tenant_id | 183611eb84204b839e43d97c081973c0 |
+-----------------------+--------------------------------------+
[root@RHEL7Server ~(keystone_admin)]# neutron router-interface-add myrouter private
Added interface 3dc73917-93b1-4f6d-a2e1-90c74cea6de7 to router myrouter.
[root@RHEL7Server ~(keystone_admin)]# neutron l3-agent-list-hosting-router myrouter
+--------------------------------------+---------+----------------+-------+
| id | host | admin_state_up | alive |
+--------------------------------------+---------+----------------+-------+
| 0110d49c-59dd-496c-a2a3-549a2ad4ba4d | node_29 | True | :-) |
+--------------------------------------+---------+----------------+-------+

The port will show status DOWN and binding:vif_type "binding_failed":
[root@RHEL7Server ~(keystone_admin)]# neutron port-show 3dc73917-93b1-4f6d-a2e1-90c74cea6de7
+-----------------------+---------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:host_id | node_29 |
| binding:profile | {} |
| binding:vif_details | {} |
| binding:vif_type | binding_failed |
| binding:vnic_type | normal |
| device_id | 8210f453-2a17-400e-ae32-74aa1503d0a5 |
| device_owner | network:router_interface |
| extra_dhcp_opts | |
| fixed_ips | {"subnet_id": "d8881a14-bd8b-4595-b497-8da6587a46c1", "ip_address": "10.0.0.1"} |
| id | 3dc73917-93b1-4f6d-a2e1-90c74cea6de7 |
| mac_address | fa:16:3e:db:0f:9b |
| name | |
| network_id | 6091abc0-4fdf-402d-aaf0-3a955fabd6b7 |
| security_groups | |
| status | DOWN |
| tenant_id | 183611eb84204b839e43d97c081973c0 |
+-----------------------+---------------------------------------------------------------------------------+
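To detect this state across many ports, a sketch like the following could parse the `neutron port-show` table shown above. It assumes the plain `| field | value |` layout and that no value contains a `|` character; both helpers are hypothetical:

```python
def parse_port_show(output):
    """Turn a `neutron port-show` table into a {field: value} dict."""
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip the +----+ border lines and anything that is not a table row.
        if not line.startswith("|"):
            continue
        field, _, value = line.strip("|").partition("|")
        field, value = field.strip(), value.strip()
        if field and field != "Field":  # ignore the header row
            fields[field] = value
    return fields


def is_stuck(port):
    """True for the state described in this bug: binding failed and port DOWN."""
    return port.get("binding:vif_type") == "binding_failed" and port.get("status") == "DOWN"
```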

Start the OVS agent:
systemctl start neutron-openvswitch-agent.service
[root@RHEL7Server ~(keystone_admin)]# neutron agent-list | grep open
| 2d97bbd1-b937-4b19-8205-4167bbcb659d | Open vSwitch agent | node_29 | :-) | True | neutron-openvswitch-agent |

The port stays DOWN permanently; even restarting the agent again does not bring it up.

From agent log:
2014-12-04 16:53:00.559 16319 WARNING neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-2dcc9141-7439-450a-bb2a-fe31ab577f47 None] Device 3dc73917-93b1-4f6d-a2e1-90c74cea6de7 not defined on plugin

Workarounds:
1) Recreate the resource (VM, DHCP or router port)
2) Update DB and restart OVS agent as detailed in https://bugs.launchpad.net/neutron/+bug/1399249/comments/10.

Changed in neutron:
status: New → Confirmed
yalei wang (yalei-wang)
Changed in neutron:
assignee: nobody → yalei wang (yalei-wang)
Eugene Nikanorov (enikanorov) wrote :

Does bringing the port down and up help? (I mean by updating it with admin_state_up False/True.)

Changed in neutron:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/141044

Changed in neutron:
status: Confirmed → In Progress
Jian Wen (wenjianhn)
tags: added: icehouse-backport-potential
Kyle Mestery (mestery)
Changed in neutron:
milestone: none → kilo-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/141044
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=67e45d324af39a66fbd4ad175e410407b2720e68
Submitter: Jenkins
Branch: master

commit 67e45d324af39a66fbd4ad175e410407b2720e68
Author: Yalei Wang <email address hidden>
Date: Thu Jan 8 10:46:08 2015 +0800

    Add the rebinding chance in _bind_port_if_needed

    Make function _bind_port_if_needed to bind at least one time when the port's
    binding status passed in is already in binding_failed.

    Change-Id: I823ff5ca66833cdca459f13ab28f5075ae03ded3
    Closes-Bug: #1399249
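The idea of this patch can be illustrated with a simplified model. This is not Neutron's ML2 code: the function names, the `try_bind` callback and `max_attempts` are assumptions made for the sketch, and `"ovs"` stands in for a successful vif_type:

```python
UNBOUND = "unbound"
BINDING_FAILED = "binding_failed"


def bind_port_if_needed_old(vif_type, try_bind):
    # Pre-fix behaviour (simplified): only an UNBOUND port triggers a binding
    # attempt, so a port already marked BINDING_FAILED is returned unchanged forever.
    if vif_type != UNBOUND:
        return vif_type
    return try_bind()


def bind_port_if_needed_new(vif_type, try_bind, max_attempts=3):
    # Post-fix behaviour (simplified): BINDING_FAILED also counts as "needs binding",
    # so at least one fresh attempt is made, e.g. once the agent is back.
    if vif_type not in (UNBOUND, BINDING_FAILED):
        return vif_type  # already bound, nothing to do
    for _ in range(max_attempts):
        vif_type = try_bind()
        if vif_type not in (UNBOUND, BINDING_FAILED):
            break
    return vif_type
```

With the agent alive again (`try_bind` returning `"ovs"`), the old variant keeps returning `binding_failed` while the new one rebinds the port.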

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/157161

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/162260

Ihar Hrachyshka (ihar-hrachyshka) wrote :

The fix is going to be reverted in master. Reopening the bug and assigning it to myself. I have a patch that fixes the issue that resulted in regression and revert.

Changed in neutron:
status: Fix Committed → Confirmed
assignee: yalei wang (yalei-wang) → Ihar Hrachyshka (ihar-hrachyshka)
Changed in neutron:
status: Confirmed → In Progress
Kyle Mestery (mestery)
Changed in neutron:
milestone: kilo-3 → kilo-rc1
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/juno)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/157161
Reason: The fix for the bug is not in master, and if/when it merges, it will be another patch.

Ihar Hrachyshka (ihar-hrachyshka) wrote :

Unscheduled from RC1 since it's not release critical, and there are review comments that suggest significant code refactoring before actual bug fix.

Changed in neutron:
milestone: kilo-rc1 → none
Sam Stoelinga (sammiestoel) wrote :

What's the current workaround if this occurs?

Sam Stoelinga (sammiestoel) wrote :

I used the following MySQL query as a workaround:

mysql
use neutron
update ml2_port_bindings set vif_type="unbound" where port_id="e5e3906b-24d8-4642-9876-2ac06f984809";
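The same workaround as a Python sketch. It uses an in-memory `sqlite3` database with a cut-down `ml2_port_bindings` table (only `port_id`, `host` and `vif_type`; the real table has more columns) as a stand-in for MySQL, and it only resets rows that are actually in `binding_failed`. With a MySQL driver the placeholder would be `%s` rather than `?`. As in workaround 2 of the description, the OVS agent still has to be restarted afterwards:

```python
import sqlite3


def reset_failed_bindings(conn, port_ids=None):
    """Set vif_type to 'unbound' for bindings stuck in 'binding_failed'.

    If port_ids is given, only those ports are touched. Returns the number of rows updated.
    """
    sql = ("UPDATE ml2_port_bindings SET vif_type = 'unbound' "
           "WHERE vif_type = 'binding_failed'")
    params = []
    if port_ids:
        # One "?" placeholder per port ID (sqlite3 paramstyle).
        sql += " AND port_id IN (%s)" % ", ".join("?" for _ in port_ids)
        params = list(port_ids)
    cur = conn.execute(sql, params)
    conn.commit()
    return cur.rowcount


# Demo against a throwaway in-memory table, not a real Neutron database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ml2_port_bindings (port_id TEXT PRIMARY KEY, host TEXT, vif_type TEXT)")
conn.executemany(
    "INSERT INTO ml2_port_bindings VALUES (?, ?, ?)",
    [
        ("e5e3906b-24d8-4642-9876-2ac06f984809", "node_29", "binding_failed"),
        ("placeholder-bound-port", "node_29", "ovs"),  # hypothetical healthy port
    ],
)
print(reset_failed_bindings(conn))  # 1
```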

Changed in neutron:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → Eugene Nikanorov (enikanorov)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Kyle Mestery (<email address hidden>) on branch: master
Review: https://review.openstack.org/162260
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/184925

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by enikanorov (<email address hidden>) on branch: master
Review: https://review.openstack.org/184925
Reason: addressed by https://review.openstack.org/#/c/162260/

Assaf Muller (amuller) wrote :

Raised priority to high: This bug has been longstanding in Neutron for several releases now. It's high severity (No connectivity via the affected port) and there is currently no workaround aside from custom DB tinkering.

Changed in neutron:
importance: Medium → High
assignee: Eugene Nikanorov (enikanorov) → Ihar Hrachyshka (ihar-hrachyshka)
Claudiu Belu (cbelu) wrote :

I would like to see this in, it's been a common problem for us as well. Users often forget to add 'hyperv' as a mechanism_driver in the ml2 config file, which leads to ports with binding_failed status.
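For the misconfiguration Claudiu describes, a quick check could look like the sketch below. It assumes the `[ml2]` section and its `mechanism_drivers` option in the ML2 config file (ml2_conf.ini); the helper name is hypothetical:

```python
import configparser


def missing_mechanism_drivers(ml2_conf_text, required):
    """Return the drivers from `required` that are absent from [ml2] mechanism_drivers."""
    parser = configparser.ConfigParser()
    parser.read_string(ml2_conf_text)
    # fallback="" also covers a missing [ml2] section or option.
    raw = parser.get("ml2", "mechanism_drivers", fallback="")
    configured = {name.strip() for name in raw.split(",") if name.strip()}
    return [driver for driver in required if driver not in configured]


conf = """
[ml2]
mechanism_drivers = openvswitch
"""
print(missing_mechanism_drivers(conf, ["openvswitch", "hyperv"]))  # ['hyperv']
```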

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kyle Mestery (<email address hidden>) on branch: master
Review: https://review.openstack.org/162260
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Zhenzan Zhou (zhenzan-zhou) wrote :

I just hit this issue and manually applying Yalei's fix helped me to bring the port back into active status.

sean mooney (sean-k-mooney) wrote :

Hi Ihar Hrachyshka, I don't know if you have had time to work on this. The code seems to be more or less complete, so I would be happy to rebase it and rework the code to address the remaining open comments if you are busy with other work.

Regards, Sean

Nir Magnezi (nmagnezi) wrote :

Hi Sean, I'm already on top of this. I will send a new patch set shortly.

Changed in neutron:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → Nir Magnezi (nmagnezi)
Artur Korzeniewski (artur-korzeniewski) wrote :

Hi, are we anywhere near merging this into master? In time for RC1?

Changed in neutron:
assignee: Nir Magnezi (nmagnezi) → Ihar Hrachyshka (ihar-hrachyshka)
Changed in neutron:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → Nir Magnezi (nmagnezi)
Assaf Muller (amuller)
Changed in neutron:
milestone: none → liberty-rc1
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Not gonna happen in L, moving to M.

Changed in neutron:
milestone: liberty-rc1 → mitaka-1
tags: removed: icehouse-backport-potential
Changed in neutron:
milestone: mitaka-1 → mitaka-2
Cedric Brandily (cbrandily) wrote :

It doesn't seem reasonable to set M2 as milestone

Changed in neutron:
assignee: Nir Magnezi (nmagnezi) → Cedric Brandily (cbrandily)
milestone: mitaka-2 → none
Assaf Muller (amuller)
description: updated
Changed in neutron:
assignee: Cedric Brandily (cbrandily) → Nir Magnezi (nmagnezi)
Ihar Hrachyshka (ihar-hrachyshka) wrote :

It's a long-standing, high-impact bug with no clear workaround for operators (if you don't consider messing with database tables to be a proper workaround). I'm targeting it to M3.

Changed in neutron:
milestone: none → mitaka-3
Changed in neutron:
assignee: Nir Magnezi (nmagnezi) → Ihar Hrachyshka (ihar-hrachyshka)
Changed in neutron:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → Nir Magnezi (nmagnezi)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/162260
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=63fe3a418c685ca90077f1dd4c35fd9ccf586fca
Submitter: Jenkins
Branch: master

commit 63fe3a418c685ca90077f1dd4c35fd9ccf586fca
Author: Ihar Hrachyshka <email address hidden>
Date: Mon Mar 9 18:05:18 2015 +0100

    Add the rebinding chance in _bind_port_if_needed

    Make function _bind_port_if_needed to bind at least one time when the port's
    binding status passed in is already in binding_failed.

    This is the second attempt to introduce the patch (the first one was
    reverted due to regression that broke Ironic), now with proper
    notification sent even when binding attempt failed.

    The patch also fixes several cases when we attempted to notify with a
    binding context that was not committed into database.

    The patch changes _attempt_binding to call _commit_port_binding only
    with the binding final state:
    1. Successful binding: will just call _commit_port_binding.
    2. Unsuccessful binding: will call _commit_port_binding at the final
    attempt to bind the port.
    This is in order to refrain from reverts, with will really complicate
    things even more.

    Co-Authored-By: Yalei Wang <email address hidden>
    Co-Authored-By: Nir Magnezi <email address hidden>
    Co-Authored-By: John Schwarz <email address hidden>
    Change-Id: I437290affd8eb87177d0626bf7935a165859cbdd
    Closes-Bug: #1399249

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/281349

Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b3

This issue was fixed in the openstack/neutron 8.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/281349
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f860bdd4b4dc7ec5241778f2a8be8f50256b00c8
Submitter: Jenkins
Branch: stable/liberty

commit f860bdd4b4dc7ec5241778f2a8be8f50256b00c8
Author: Ihar Hrachyshka <email address hidden>
Date: Mon Mar 9 18:05:18 2015 +0100

    Add the rebinding chance in _bind_port_if_needed

    Make function _bind_port_if_needed to bind at least one time when the port's
    binding status passed in is already in binding_failed.

    This is the second attempt to introduce the patch (the first one was
    reverted due to regression that broke Ironic), now with proper
    notification sent even when binding attempt failed.

    The patch also fixes several cases when we attempted to notify with a
    binding context that was not committed into database.

    The patch changes _attempt_binding to call _commit_port_binding only
    with the binding final state:
    1. Successful binding: will just call _commit_port_binding.
    2. Unsuccessful binding: will call _commit_port_binding at the final
    attempt to bind the port.
    This is in order to refrain from reverts, with will really complicate
    things even more.

    Co-Authored-By: Yalei Wang <email address hidden>
    Co-Authored-By: Nir Magnezi <email address hidden>
    Co-Authored-By: John Schwarz <email address hidden>
    Change-Id: I437290affd8eb87177d0626bf7935a165859cbdd
    Closes-Bug: #1399249
    (cherry picked from commit 63fe3a418c685ca90077f1dd4c35fd9ccf586fca)

tags: added: in-stable-liberty
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 7.0.4

This issue was fixed in the openstack/neutron 7.0.4 release.
