linuxbridge and dhcp agents race removing tap

Bug #1611612 reported by Darragh O'Reilly
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Darragh O'Reilly

Bug Description

When a network is deleted, an exception can occur because the lb-agent tries to remove the dhcp tap from the bridge at about the same time as the dhcp-agent is deleting the tap. Because the exception is unhandled, the bridge is not cleaned up, and an error and stack trace appear in the logs.

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22self.remove_interface%5C%22

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
    res = self.dispatcher.dispatch(message)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 803, in network_delete
    self.agent.mgr.delete_bridge(bridge_name)
  File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 521, in delete_bridge
    self.remove_interface(bridge_name, interface)
  File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 568, in remove_interface
    if bridge_device.delif(interface_name):
  File "/opt/stack/new/neutron/neutron/agent/linux/bridge_lib.py", line 80, in delif
    return self._brctl(['delif', self.name, interface])
  File "/opt/stack/new/neutron/neutron/agent/linux/bridge_lib.py", line 55, in _brctl
    return ip_wrapper.netns.execute(cmd, run_as_root=True)
  File "/opt/stack/new/neutron/neutron/agent/linux/ip_lib.py", line 876, in execute
    log_fail_as_error=log_fail_as_error, **kwargs)
  File "/opt/stack/new/neutron/neutron/agent/linux/utils.py", line 138, in execute
    raise RuntimeError(msg)
RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: device tap1aa0d45a-39 is not a slave of brq6d449049-5c
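
The failure above can be illustrated without neutron. The following is a minimal sketch, assuming that delete_bridge works from a list of interfaces gathered before it starts removing them; FakeBridge, the on_before_delif hook and dhcp_agent_deletes_tap are hypothetical stand-ins and are not neutron code. It only shows why acting on a stale interface list raises when another agent removes the tap first.

```python
class FakeBridge:
    """In-memory stand-in for a Linux bridge (hypothetical, not neutron code)."""

    def __init__(self, name, interfaces):
        self.name = name
        self.interfaces = set(interfaces)

    def delif(self, interface):
        # Mimic the bridge tool failing when the port is no longer attached.
        if interface not in self.interfaces:
            raise RuntimeError(
                "Exit code: 1; Stdin: ; Stdout: ; Stderr: device %s is not a "
                "slave of %s" % (interface, self.name))
        self.interfaces.remove(interface)


def delete_bridge(bridge, on_before_delif=None):
    """Remove every interface found in a snapshot of the bridge."""
    snapshot = sorted(bridge.interfaces)  # step 1: list the interfaces
    for interface in snapshot:
        if on_before_delif:
            # Window in which another agent can touch the same device.
            on_before_delif(interface)
        bridge.delif(interface)  # step 2: act on the (possibly stale) list
    return True


bridge = FakeBridge("brq6d449049-5c", ["tap1aa0d45a-39"])


def dhcp_agent_deletes_tap(interface):
    # Deleting the tap device detaches it from the bridge as a side effect.
    bridge.interfaces.discard(interface)


try:
    delete_bridge(bridge, on_before_delif=dhcp_agent_deletes_tap)
    outcome = "deleted"
except RuntimeError as exc:
    outcome = str(exc)

print(outcome)
```

In the sketch the hook makes the interleaving deterministic; in the real agents the ordering depends on timing, which is why the error only shows up occasionally.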

Tags: linuxbridge
tags: added: linuxbridge
Changed in neutron:
assignee: nobody → Darragh O'Reilly (darragh-oreilly)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/353264

description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/353264
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=72720f9aa30169809e41e6dfbafc4e3561716ea5
Submitter: Jenkins
Branch: master

commit 72720f9aa30169809e41e6dfbafc4e3561716ea5
Author: Darragh O'Reilly <email address hidden>
Date: Wed Aug 10 05:58:50 2016 +0000

    lb-agent: handle exception when bridge slave already removed

    An exception can happen when a network is deleted because the
    lb-agent tries to removes the dhcp tap from the bridge at about
    the same time as the dhcp-agent is deleting the tap. The unhandled
    exception means the bridge does not get deleted and a log error.

    Closes-Bug: #1611612
    Change-Id: Ia9a6b5fc49e239769e850e9486454e81e3a4b96f
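
The merged diff is not quoted in this report, so the following is only a sketch of one way to handle this kind of exception, not the actual change. It assumes two injected callables, delif and is_attached, which are hypothetical names introduced here for illustration. The idea is to re-check the bridge after a failed removal and treat "already detached" as success, while still re-raising genuine failures.

```python
def remove_interface_tolerant(delif, is_attached, bridge_name, interface):
    """Detach interface from bridge_name, tolerating a concurrent removal.

    delif(bridge_name, interface) is expected to raise RuntimeError on
    failure; is_attached(bridge_name, interface) returns a bool. Both are
    hypothetical helpers supplied by the caller.
    """
    try:
        delif(bridge_name, interface)
    except RuntimeError:
        # If the interface is still attached, the failure is real.
        if is_attached(bridge_name, interface):
            raise
        # Otherwise another agent removed it first; the goal is met.
    return True


# Demo with in-memory fakes: the tap was already deleted by another agent.
attached = set()


def fake_delif(bridge_name, interface):
    if interface not in attached:
        raise RuntimeError(
            "device %s is not a slave of %s" % (interface, bridge_name))
    attached.remove(interface)


def fake_is_attached(bridge_name, interface):
    return interface in attached


result = remove_interface_tolerant(
    fake_delif, fake_is_attached, "brq6d449049-5c", "tap1aa0d45a-39")
print(result)
```

Re-checking state rather than matching on the stderr text avoids depending on the exact wording of the tool's error message, at the cost of one extra lookup on the failure path.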

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0b3

This issue was fixed in the openstack/neutron 9.0.0.0b3 development milestone.
