race with neutron: can't create bridge with the same name

Bug #1471916 reported by Darragh O'Reilly
This bug affects 1 person
Affects                   Status        Importance  Assigned to       Milestone
OpenStack Compute (nova)  Fix Released  High        Darragh O'Reilly
Kilo                      Fix Released  High        Ihar Hrachyshka

Bug Description

If a node running nova-compute is using the neutron with the linuxbridge agent, and the node is also running other neutron agents like the l3-agent or dhcp-agent (ie the node is acting as compute node and a network node) then races can happen when both nova-compute and the linuxbridge agent try to add the same linux bridge at the same time. This can cause instances to fail to start.
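The failure mode can be sketched in a few lines of Python (all names below are hypothetical stand-ins, not nova or neutron code): each daemon serializes bridge creation only with its own in-process lock, so both can reach the kernel's bridge table concurrently, and whichever issues `brctl addbr` second gets the "already exists" error.

```python
class FakeKernel:
    """Stand-in for the kernel bridge table that `brctl addbr` mutates."""

    def __init__(self):
        self.bridges = set()

    def addbr(self, name):
        # Mirrors brctl's behaviour: creating a duplicate bridge fails.
        if name in self.bridges:
            raise RuntimeError(
                "device %s already exists; can't create bridge "
                "with the same name" % name)
        self.bridges.add(name)


kernel = FakeKernel()

# The neutron linuxbridge agent wins the race and creates the bridge first.
kernel.addbr("brq94158651-73")

# nova-compute, in a separate process (so nova's in-process "lock_bridge"
# lock offers no protection), then tries to create the same bridge.
try:
    kernel.addbr("brq94158651-73")
except RuntimeError as exc:
    print(exc)
```

The point of the sketch is that a per-process lock cannot serialize two independent daemons; only the kernel sees both `addbr` calls, and it rejects the second one.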

http://logs.openstack.org/12/138512/10/experimental/check-tempest-dsvm-neutron-linuxbridge/e737f60/

http://logs.openstack.org/12/138512/10/experimental/check-tempest-dsvm-neutron-linuxbridge/e737f60/logs/screen-n-cpu.txt.gz#_2015-07-02_03_09_09_131

The neutron l3-agent added a tap device:

2015-07-02 03:09:06.614 DEBUG neutron.agent.linux.utils [req-42397477-9e58-4c7a-aa0b-8e54269f9d34 None None] Running command (rootwrap daemon): ['ip', 'link', 'add', 'tapbb2831fd-86', 'type', 'veth', 'peer', 'name', 'qr-bb2831fd-86', 'netns', 'qrouter-048a7c31-08a1-4693-91a8-13e3221c0982'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:101
Command: ['ip', 'link', 'add', u'tapbb2831fd-86', 'type', 'veth', 'peer', 'name', u'qr-bb2831fd-86', 'netns', u'qrouter-048a7c31-08a1-4693-91a8-13e3221c0982']

The linuxbridge agent detects this device and adds the bridge for its network:

2015-07-02 03:09:09.119 DEBUG neutron.agent.linux.utils [req-d4f88282-233b-437e-8157-5d134538d206 None None] Running command (rootwrap daemon): ['brctl', 'addbr', 'brq94158651-73'] execute_rootwrap_daemon /opt/stack/new/neutron/neutron/agent/linux/utils.py:101
2015-07-02 03:09:09.122 DEBUG neutron.agent.linux.utils [req-d4f88282-233b-437e-8157-5d134538d206 None None]
Command: ['brctl', 'addbr', u'brq94158651-73']
Exit code: 0

At about the same time, nova-compute also tried to add the same bridge for a VIF it was about to create on the same network, but it lost the race, so the instance went into an error state:

2015-07-02 03:09:09.130 DEBUG oslo_concurrency.processutils [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 tempest-ServerRescueNegativeTestJSON-1737504944] CMD "sudo nova-rootwrap /etc/nova/rootwrap.conf brctl addbr brq94158651-73" returned: 1 in 0.094s execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:247
2015-07-02 03:09:09.131 DEBUG oslo_concurrency.processutils [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 tempest-ServerRescueNegativeTestJSON-1737504944] u'sudo nova-rootwrap /etc/nova/rootwrap.conf brctl addbr brq94158651-73' failed. Not Retrying. execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:295
2015-07-02 03:09:09.131 DEBUG oslo_concurrency.lockutils [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 tempest-ServerRescueNegativeTestJSON-1737504944] Lock "lock_bridge" released by "ensure_bridge" :: held 0.096s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:262
2015-07-02 03:09:09.131 ERROR nova.compute.manager [req-8391bcc1-bc92-48c7-a635-7e3b2f1af76b tempest-ServerRescueNegativeTestJSON-1230044074 tempest-ServerRescueNegativeTestJSON-1737504944] [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Instance failed to spawn
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Traceback (most recent call last):
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/compute/manager.py", line 2113, in _build_resources
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] yield resources
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/compute/manager.py", line 1985, in _build_and_run_instance
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] block_device_info=block_device_info)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2435, in spawn
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] block_device_info=block_device_info)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 4454, in _create_domain_and_network
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] self.plug_vifs(instance, network_info)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 619, in plug_vifs
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] self.vif_driver.plug(instance, vif)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/virt/libvirt/vif.py", line 609, in plug
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] func(instance, vif)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/virt/libvirt/vif.py", line 401, in plug_bridge
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] iface)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 252, in inner
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] return f(*args, **kwargs)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/network/linux_net.py", line 1602, in ensure_bridge
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] _execute('brctl', 'addbr', bridge, run_as_root=True)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/network/linux_net.py", line 1266, in _execute
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] return utils.execute(*cmd, **kwargs)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/opt/stack/new/nova/nova/utils.py", line 229, in execute
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] return processutils.execute(*cmd, **kwargs)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 262, in execute
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] cmd=sanitized_cmd)
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] ProcessExecutionError: Unexpected error while running command.
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Command: sudo nova-rootwrap /etc/nova/rootwrap.conf brctl addbr brq94158651-73
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Exit code: 1
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Stdout: u''
2015-07-02 03:09:09.131 5163 ERROR nova.compute.manager [instance: 018793d0-c890-43c7-91ad-5694931cd98c] Stderr: u"device brq94158651-73 already exists; can't create bridge with the same name\n"

Tags: network
Changed in nova:
assignee: nobody → Darragh O'Reilly (darragh-oreilly)
status: New → In Progress
Revision history for this message
Sean M. Collins (scollins) wrote :
Matt Riedemann (mriedem)
tags: added: network
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-2 → 12.0.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/273430

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

I believe it's a high impact issue since an instance may fail to boot here. Can I set High importance for the bug?

Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/273430
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3a2f487af0645859ea0a5df4f3620819a36f7025
Submitter: Jenkins
Branch: stable/kilo

commit 3a2f487af0645859ea0a5df4f3620819a36f7025
Author: Darragh O'Reilly <email address hidden>
Date: Mon Jul 6 17:56:52 2015 +0000

    Ignore bridge already exists error when creating bridge

    nova-compute can sometimes race with the neutron linuxbrige agent
    to create bridges, which can cause an instance to fail. This
    patch changes ensure_bridge() to ignore that error and continue.

    Change-Id: I019a5d96442802d89e9a152c0adf6030daab1a61
    Closes-Bug: 1471916
    (cherry picked from commit 4e67c49d31c58c971fa1c9ebf6e333d4340f5ad5)
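The approach described in the commit message can be sketched as follows (hypothetical names; the real change lives in nova's `linux_net.ensure_bridge()`, see the review linked above). The key idea is that "the bridge already exists" is not a failure for code whose only goal is to ensure the bridge exists, so that specific error is swallowed and everything else is re-raised:

```python
def ensure_bridge(create_bridge, name):
    """Create a bridge, tolerating concurrent creation by another agent.

    `create_bridge` stands in for the rootwrapped `brctl addbr` call.
    """
    try:
        create_bridge(name)
    except RuntimeError as exc:
        # Another process (e.g. the neutron linuxbridge agent) won the
        # race; the bridge now exists, which is all we wanted.
        # Any other error is still fatal.
        if "already exists" not in str(exc):
            raise


# Fake kernel state for demonstration: one bridge already created
# by the "other" agent.
existing = {"brq94158651-73"}

def fake_addbr(name):
    if name in existing:
        raise RuntimeError("device %s already exists; can't create "
                           "bridge with the same name" % name)
    existing.add(name)

ensure_bridge(fake_addbr, "brq94158651-73")  # error swallowed: bridge exists
ensure_bridge(fake_addbr, "brq-new")         # created normally
```

Matching on the error text is a sketch-level simplification; the production code has to distinguish this benign race from genuine `brctl` failures before continuing.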

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/nova 2015.1.4

This issue was fixed in the openstack/nova 2015.1.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

This issue was fixed in the openstack/nova 2015.1.4 release.
