Device not found: exceptions in l3 grenade

Bug #1448148 reported by Baodong (Robert) Li
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
neutron | - | Undecided | Kevin Benton | -
Kilo | - | Undecided | Unassigned | -

Bug Description

Exceptions have been seen in screen-q-vpn.txt.gz at the gate, and they have been occurring quite often recently:

     2015-04-23 15:55:56.236 ERROR neutron.agent.l3.agent [req-5ea4d9d1-66ab-444c-a66f-c48094f3582d None None] Failed to process compatible router 'aeb00076-1c9e-431d-973f-ce1123c918a7'
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent Traceback (most recent call last):
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/l3/agent.py", line 452, in _process_router_update
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent self._process_router_if_compatible(router)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/l3/agent.py", line 404, in _process_router_if_compatible
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent self._process_added_router(router)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/l3/agent.py", line 412, in _process_added_router
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent ri.process(self)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/common/utils.py", line 346, in call
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent self.logger(e)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/common/utils.py", line 343, in call
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent return func(*args, **kwargs)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/l3/router_info.py", line 605, in process
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent self._process_internal_ports()
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/l3/router_info.py", line 361, in _process_internal_ports
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent self.internal_network_added(p)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/l3/router_info.py", line 312, in internal_network_added
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent INTERNAL_DEV_PREFIX)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/l3/router_info.py", line 288, in _internal_network_added
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent prefix=prefix)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/linux/interface.py", line 264, in plug
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent ns_dev.link.set_up()
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/linux/ip_lib.py", line 276, in set_up
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent self._as_root([], ('set', self.name, 'up'))
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/linux/ip_lib.py", line 222, in _as_root
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent use_root_namespace=use_root_namespace)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/linux/ip_lib.py", line 69, in _as_root
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent log_fail_as_error=self.log_fail_as_error)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/linux/ip_lib.py", line 78, in _execute
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent log_fail_as_error=log_fail_as_error)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent File "/opt/stack/old/neutron/neutron/agent/linux/utils.py", line 137, in execute
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent raise RuntimeError(m)
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent RuntimeError:
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent Command: ['ip', 'netns', 'exec', u'qrouter-aeb00076-1c9e-431d-973f-ce1123c918a7', 'ip', 'link', 'set', u'qr-91aaad43-1e', 'up']
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent Exit code: 1
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent Stdin:
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent Stdout:
2015-04-23 15:55:56.236 3736 TRACE neutron.agent.l3.agent Stderr: Cannot find device "qr-91aaad43-1e"

A Logstash search shows many occurrences of this error: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOiBcIkNhbm5vdCBmaW5kIGRldmljZVwiIEFORCBtZXNzYWdlOiBcInFyLVwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDI5ODgzMDUxOTIxfQ==

The search string used is: message: "Cannot find device" AND message: "qr-"
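
The part of the Logstash link after the '#' appears to be base64-encoded JSON. A minimal sketch for decoding it to confirm the query and the time window (the function and variable names are illustrative only):

```python
import base64
import json

# Fragment copied from the Logstash URL in this report (everything after '#').
FRAGMENT = (
    "eyJzZWFyY2giOiJtZXNzYWdlOiBcIkNhbm5vdCBmaW5kIGRldmljZVwiIEFORCBtZXNzYWdl"
    "OiBcInFyLVwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAi"
    "LCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1w"
    "IjoxNDI5ODgzMDUxOTIxfQ=="
)


def decode_fragment(fragment):
    """Return the dict encoded in the URL fragment."""
    return json.loads(base64.b64decode(fragment).decode("utf-8"))


query = decode_fragment(FRAGMENT)
print(query["search"])                        # the search string
print(int(query["timeframe"]) // 86400, "days")  # window length in days
```

The "timeframe" value is in seconds, so the search covers the week before the report was filed.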

This seems to be the reason for the failures of check-grenade-dvm-neutron seen in this review: https://review.openstack.org/#/c/176041/ although none of the test cases failed.

Changed in neutron:
assignee: nobody → Baodong (Robert) Li (baoli)
Baodong (Robert) Li (baoli) wrote :

This seems to be related to the port_delete() method in ovs_neutron_agent.py:

    def port_delete(self, context, **kwargs):
        port_id = kwargs.get('port_id')
        port = self.int_br.get_vif_port_by_id(port_id)
        # If port exists, delete it
        if port:
            self.int_br.delete_port(port.port_name)

This method was added and then removed and recently added again:
commit f87a74bfa83eeb859dfd047719622b54cdb5f68b
commit 294019139d575bd7144cfcc229c98c8497bfbf7c
commit d6a55c17360d1aa8ca91849199987ae71e8600ee

The method indiscriminately removes a port from the OVS bridge, regardless of whether the port was added by the OVS agent. This is not right and will result in many race conditions. I have also seen bug fixes that simply catch the resulting exception and ignore it.
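
To make the race concrete, here is a minimal, deterministic sketch of one possible interleaving. It does not use Neutron code: FakeBridge, agent_port_delete and run_interleaving are hypothetical stand-ins, and the real ordering at the gate may differ.

```python
class FakeBridge:
    """Hypothetical stand-in for the OVS integration bridge."""

    def __init__(self):
        self.ports = set()

    def add_port(self, name):
        self.ports.add(name)

    def delete_port(self, name):
        self.ports.discard(name)

    def set_link_up(self, name):
        # Mirrors the stderr in the traceback: the device is already gone.
        if name not in self.ports:
            raise RuntimeError('Cannot find device "%s"' % name)


def agent_port_delete(bridge, name):
    """Old behaviour: delete the port no matter who created it."""
    if name in bridge.ports:
        bridge.delete_port(name)


def run_interleaving():
    bridge = FakeBridge()
    # 1. The creator (for example the L3 agent) plugs its router port.
    bridge.add_port("qr-91aaad43-1e")
    # 2. A port delete notification is handled by the OVS agent.
    agent_port_delete(bridge, "qr-91aaad43-1e")
    # 3. The creator carries on and tries to bring the link up.
    try:
        bridge.set_link_up("qr-91aaad43-1e")
    except RuntimeError as exc:
        return str(exc)
    return None


print(run_interleaving())
```

Whichever side loses the race sees the missing device, which is why catching and ignoring the exception only hides the problem.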

Baodong (Robert) Li (baoli) wrote :

The original bug is https://bugs.launchpad.net/neutron/+bug/1333365: Deleting a VM port does not remove security rules associated to VM port in ip tables. I reverted the change to see whether that issue would still exist. It turned out that after deleting the VM, the iptables rules associated with it were also removed.

Therefore, I'm going to revert the change. I'll add the original bug fixers so that they can verify.

Baodong (Robert) Li (baoli) wrote :

I will keep the delete notifiers and topics, and revert only the change in ovs_neutron_agent.py and the unit test.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/178666

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/179314

Changed in neutron:
assignee: Baodong (Robert) Li (baoli) → Kevin Benton (kevinbenton)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/179314
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e007167a700aa5b80ecb48adff0ac36bb330591d
Submitter: Jenkins
Branch: master

commit e007167a700aa5b80ecb48adff0ac36bb330591d
Author: Kevin Benton <email address hidden>
Date: Thu Apr 30 17:14:44 2015 -0700

    Don't delete port from bridge on delete_port event

    Commit d6a55c17360d1aa8ca91849199987ae71e8600ee added
    logic to the OVS agent to delete a port from the integration
    bridge when a port was deleted on the Neutron side. However,
    this led to several races where whoever created the initial
    port (e.g. Nova, L3 agent, DHCP agent) would be trying to
    remove the port from the bridge at the same time. These
    would result in ugly exceptions on one side or the other.

    The original commit was trying to address the problem where
    the port would maintain connectivity even though it was removed
    from the integration bridge.

    This patch addresses both cases by removing the iptables rules
    for the deleted port and putting it in the dead VLAN so it loses
    connectivity. However, it still leaves the port attached to the
    integration bridge so the original creator can delete it.

    Related-Bug: #1333365
    Closes-Bug: #1448148
    Change-Id: I7ae7750b7ac7d15325ed9f2d517ca171543b53be
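
The approach described in the commit message above can be sketched roughly as follows. This is not the merged patch: FakePort and FakeAgent are hypothetical, and the dead VLAN tag value of 4095 is an assumption made for illustration.

```python
DEAD_VLAN_TAG = 4095  # assumption: a tag reserved for ports that must not pass traffic


class FakePort:
    """Hypothetical port record on the integration bridge."""

    def __init__(self, name, tag):
        self.name = name
        self.tag = tag


class FakeAgent:
    """Hypothetical model of the agent state touched by the fix."""

    def __init__(self):
        self.bridge_ports = {}       # port name -> FakePort
        self.filtered_ports = set()  # ports that have firewall rules

    def plug(self, name, vlan):
        self.bridge_ports[name] = FakePort(name, vlan)
        self.filtered_ports.add(name)

    def port_delete(self, name):
        port = self.bridge_ports.get(name)
        if port is None:
            return
        # Remove the firewall rules for the deleted port.
        self.filtered_ports.discard(name)
        # Cut connectivity by moving the port to the dead VLAN.
        port.tag = DEAD_VLAN_TAG
        # The port stays on the bridge so its creator can remove it.


agent = FakeAgent()
agent.plug("tap-demo", 1)
agent.port_delete("tap-demo")
print(agent.bridge_ports["tap-demo"].tag, "tap-demo" in agent.filtered_ports)
```

The point of the design is that only one party ever removes the device from the bridge, so the creator's later cleanup cannot collide with the agent.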

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/187795

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/178666
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0ace88fd4a75ff213dc36fd16c1f8e7080ab7d6d
Submitter: Jenkins
Branch: master

commit 0ace88fd4a75ff213dc36fd16c1f8e7080ab7d6d
Author: Robert Li <email address hidden>
Date: Fri May 8 11:08:45 2015 -0400

    Add VIF_DELETED notification event to Nova

    It's possible to delete a neutron port that is currently associated
    with an instance. When it happens, neutron should notify nova of the
    port deletion event so that Nova can take proper actions.

    Refer to I998b6bb80cc0a81d665b61b8c4a424d7219c666f for the nova patch
    that handles the event.

    Change-Id: Iff88cd12ae18017ef3e776821bcf3ecf3b4f052f
    Related-Bug: #1333365
    Related-Bug: #1448148
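
As an illustration of the notification described in the commit message above, here is a sketch of the kind of payload that could be sent to Nova. It assumes the event goes through Nova's server external events API with the event name "network-vif-deleted" and the port ID as the tag; build_vif_deleted_event and both UUIDs are hypothetical, and the exact format should be checked against the Nova API reference.

```python
import json


def build_vif_deleted_event(server_uuid, port_id):
    """Build an external-event body for a deleted port (assumed format)."""
    return {
        "events": [
            {
                "name": "network-vif-deleted",  # assumed event name
                "server_uuid": server_uuid,     # instance that owned the port
                "tag": port_id,                 # assumed: tag carries the port ID
            }
        ]
    }


# Placeholder identifiers, not taken from this bug report.
body = build_vif_deleted_event(
    "11111111-2222-3333-4444-555555555555",
    "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
)
print(json.dumps(body, indent=2))
```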

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/187795
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=81e043f72135682510727c9fa9bafe7efa676717
Submitter: Jenkins
Branch: stable/kilo

commit 81e043f72135682510727c9fa9bafe7efa676717
Author: Kevin Benton <email address hidden>
Date: Thu Apr 30 17:14:44 2015 -0700

    Don't delete port from bridge on delete_port event

    Commit d6a55c17360d1aa8ca91849199987ae71e8600ee added
    logic to the OVS agent to delete a port from the integration
    bridge when a port was deleted on the Neutron side. However,
    this led to several races where whoever created the initial
    port (e.g. Nova, L3 agent, DHCP agent) would be trying to
    remove the port from the bridge at the same time. These
    would result in ugly exceptions on one side or the other.

    The original commit was trying to address the problem where
    the port would maintain connectivity even though it was removed
    from the integration bridge.

    This patch addresses both cases by removing the iptables rules
    for the deleted port and putting it in the dead VLAN so it loses
    connectivity. However, it still leaves the port attached to the
    integration bridge so the original creator can delete it.

    Conflicts:
     neutron/plugins/openvswitch/agent/ovs_neutron_agent.py
     neutron/tests/unit/plugins/openvswitch/agent/test_ovs_neutron_agent.py
     neutron/tests/unit/plugins/openvswitch/test_ovs_tunnel.py

    Related-Bug: #1333365
    Closes-Bug: #1448148
    Change-Id: I7ae7750b7ac7d15325ed9f2d517ca171543b53be
    (cherry picked from commit e007167a700aa5b80ecb48adff0ac36bb330591d)

tags: added: in-stable-kilo
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → liberty-1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/196701

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (feature/pecan)

Change abandoned by Kyle Mestery (<email address hidden>) on branch: feature/pecan
Review: https://review.openstack.org/196701
Reason: This is lacking the functional fix [1], so I'll propose a new merge commit which includes that one.

[1] https://review.openstack.org/#/c/196711/

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/196920

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)

Reviewed: https://review.openstack.org/196920
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7f759c077f8f860c13db92d2ea6b353ef6b70900
Submitter: Jenkins
Branch: feature/pecan

commit 8123144fadd7c5d5e6e56a76ea860512619a2cf6
Author: Moshe Levi <email address hidden>
Date: Sun Jun 28 14:37:14 2015 +0300

    Fix Consolidate sriov agent and driver code

    This patch add mising __init to mech_sriov/mech_driver/
    and update the setup.cfg to the new agent entrypoint

    Trivial Fix

    Change-Id: I53a527081feb78472f496675bbb3c5121d38a14a

commit 8942fccf02e6e179d47582fdb2792a1ca972da21
Author: Assaf Muller <email address hidden>
Date: Mon Jun 29 11:38:51 2015 -0400

    Remove failing SafeFixture tests

    The fixtures 1.3 release attempted to fix the fixtures resource
    leak issue, but failed to do so completely. Our own SafeFixture
    is still needed: The 1.3 release broke our SafeFixture tests,
    but not the usage of SafeFixture itself. This patch removes
    those failing tests for now to unbreak the gate. Jakub reported
    a bug on fixtures 1.3:
    https://bugs.launchpad.net/python-fixtures/+bug/1469759

    We will continue to use SafeFixture until that bug is fixed
    in fixtures, at which point we will be able to require
    fixtures > 1.3.

    Change-Id: I59457c3bb198ff86d5ad55a1e623d008f0034b8f
    Closes-Bug: #1469734

commit 71dffb0a2c1720cd8233a329d32958a0160dd6f5
Author: Kevin Benton <email address hidden>
Date: Mon Jun 29 08:27:41 2015 +0000

    Revert "Removed test_lib module"

    This reverts commit 9a6536de6e1a7fe9b2552adc142e254426b82b6f.

    We pulled all of the plugins out of the tree, many of which still inherit
    from neutron test classes. This change then stated that we no longer
    support testing other plugins. I think this is a bit premature and should
    have been discussed under the subject
    "Neutron plugins can't use neutron plugin unit tests" or something
    similar.

    Change-Id: I68318589f010b731574ea3bfa8df98492bab31fc

commit b20fd81dbd497e058384a0af065dd0f1fdc4c728
Author: Jakub Libosvar <email address hidden>
Date: Fri Jun 5 14:32:51 2015 +0000

    Refactor NetcatTester class

    Following capabilities were added:
       - used transport protocol is passed as a constant instead of bool
       - src port for testing was added
       - connection can be established explicitly
       - change constructor parameters of NetcatTester

    As a part of removing bool for protocol definition
    get_free_namespace_port() was also modified to match the behavior.

    Change-Id: Id2ec322e7f731c05a3754a65411c9a5d8b258126

commit 83e37980dcd0b2bad6d64dd2cb23bcd2891cafca
Author: jingliuqing <email address hidden>
Date: Sat Jun 27 13:41:54 2015 +0800

    Use REST rather than ReST

    Change-Id: I06c9deaab58c5ec13bfeec39fb8fd4b1fe21f42d

commit 1b60df85ba3ad442c2e4e7e52538e1b9a1bf9378
Author: Kevin Benton <email address hidden>
Date: Thu Jun 25 18:34:38 2015 -0700

    Add a double-mock guard to the base test case

    Use mock to patch mock with a check to prevent multiple active
    patches to the...

tags: added: in-feature-pecan
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-1 → 7.0.0