nova libvirtError: Unable to add bridge brqxxx-xx port tapxxx-xx: Device or resource busy

Bug #1312016 reported by jiji on 2014-04-24
This bug affects 7 people
Affects      Importance   Assigned to
neutron      Medium       Andreas Scheuring
  Kilo       Undecided    Unassigned

Bug Description

Hello. My OpenStack version is 2013.1.5 (Grizzly), the plugin is linuxbridge, the OS is Ubuntu 12.04.3, and libvirt-bin is '1.1.1-0ubuntu8.9'.

When I launch three instances, two are created successfully and the third fails to spawn.

I checked the nova-compute log and found the following errors.

(Note in particular:
"libvirtError: Unable to add bridge brq233a5889-2e port tap3f81c08a-39: Device or resource busy")

Is anybody else seeing the same problem?

2014-04-24 14:41:58.499 ERROR nova.compute.manager [req-4dc590cc-9a34-460d-8c6a-4efdfb9de456 fd7179d2284247179c70db99ee1842db 4f50d05ffb6b44a29f9b23978e40542b] [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] Instance failed to spawn
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] Traceback (most recent call last):
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1119, in _spawn
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] block_device_info)
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1539, in spawn
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] block_device_info)
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2455, in _create_domain_and_network
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] domain = self._create_domain(xml, instance=instance)
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2416, in _create_domain
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] domain.createWithFlags(launch_flags)
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 187, in doit
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] result = proxy_call(self._autowrap, f, *args, **kwargs)
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 147, in proxy_call
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] rv = execute(f,*args,**kwargs)
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 76, in tworker
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] rv = meth(*args,**kwargs)
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 728, in createWithFlags
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1]

libvirtError: Unable to add bridge brq233a5889-2e port tap3f81c08a-39: Device or resource busy

2014-04-24 14:41:58.499 60306 TRACE nova.compute.manager [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1]
2014-04-24 14:41:58.680 AUDIT nova.compute.manager [req-4dc590cc-9a34-460d-8c6a-4efdfb9de456 fd7179d2284247179c70db99ee1842db 4f50d05ffb6b44a29f9b23978e40542b] [instance: 496c546b-4afc-4b48-9984-08c42cbe36d1] Terminating instance

jiji (zzww-666) on 2014-04-24
description: updated
tags: added: libvirt
Solly Ross (sross-7) on 2014-05-29
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
jiji (zzww-666) wrote :

This is a real headache: the bug seriously affects my ability to create instances.

Changed in nova:
assignee: nobody → jiji (zzww-666)
jiji (zzww-666) on 2014-06-10
description: updated
jiji (zzww-666) on 2014-08-04
Changed in nova:
assignee: jiji (zzww-666) → nobody
Minho Ban (mhban) on 2015-05-18
tags: added: icehouse juno kilo
Minho Ban (mhban) wrote :

I have been hitting this error for a few weeks. I found that it is caused by libvirt racing with the Neutron agent (in my case, the linuxbridge agent). As far as I know, it happens not only with the LB agent but also with the OVS agent.

When Nova creates a VM, libvirt creates the tap port and adds it to the bridge. If the Neutron agent (see below) manages to add the interface to the bridge just before libvirt tries to, libvirt's attempt fails.

[neutron/plugins/linuxbridge/agent/linuxbridge_neutron_agent.py:372]
    def add_tap_interface(self, network_id, network_type, physical_network,
                          segmentation_id, tap_device_name):
        """Add tap interface.

        If a VIF has been plugged into a network, this function will
        add the corresponding tap device to the relevant bridge.
        """
        # ... (lines elided) ...
        # Check if device needs to be added to bridge
        tap_device_in_bridge = self.get_bridge_for_tap_device(tap_device_name)
        if not tap_device_in_bridge:
            data = {'tap_device_name': tap_device_name,
                    'bridge_name': bridge_name}
            LOG.debug("Adding device %(tap_device_name)s to bridge "
                      "%(bridge_name)s", data)
            if utils.execute(['brctl', 'addif', bridge_name, tap_device_name],
                             run_as_root=True):
                return False
        else:
            data = {'tap_device_name': tap_device_name,
                    'bridge_name': bridge_name}
            LOG.debug("%(tap_device_name)s already exists on bridge "
                      "%(bridge_name)s", data)
        return True
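The agent code above first asks which bridge, if any, already holds the tap device. When diagnosing this bug by hand, the same question can be answered on Linux from sysfs: an interface that is a bridge port has a `brport/bridge` symlink under `/sys/class/net/<dev>/` pointing at its bridge. The helper below is a hypothetical sketch, not Neutron's `get_bridge_for_tap_device`; the `sysfs_root` parameter exists only so it can be run against a fake directory tree.

```python
import os


def bridge_for_device(dev, sysfs_root="/sys/class/net"):
    """Return the name of the bridge that `dev` is attached to, or None.

    Hypothetical helper: relies on the Linux sysfs layout in which a bridge
    port has a `brport/bridge` symlink pointing at its bridge device.
    """
    link = os.path.join(sysfs_root, dev, "brport", "bridge")
    if not os.path.islink(link):
        return None  # not a bridge port, or no such device
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    # Prints the owning bridge name, or None if the tap is not attached.
    print(bridge_for_device("tap3f81c08a-39"))
```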

libvirt then fails with EBUSY ("Device or resource busy") because the interface is already in that bridge.
I have no idea why Neutron keeps polling the bridge interfaces and tries to add a tap device whenever it is not already in the bridge.
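The race can be reproduced in miniature. The sketch below is a hypothetical in-memory model, not libvirt or Neutron code: `FakeBridge.add_port` raises `OSError(errno.EBUSY)` when the port is already attached, mimicking the kernel, whose bridge code refuses to enslave an interface that is already a bridge port. When the agent's check-then-add runs first, libvirt's unconditional add is the one that fails.

```python
import errno
import os


class FakeBridge:
    """Hypothetical in-memory stand-in for a Linux bridge (not kernel or libvirt code)."""

    def __init__(self, name):
        self.name = name
        self.ports = set()

    def add_port(self, tap):
        # Like the kernel, refuse to attach an interface that is already a
        # bridge port and fail with EBUSY ("Device or resource busy").
        if tap in self.ports:
            raise OSError(errno.EBUSY, os.strerror(errno.EBUSY))
        self.ports.add(tap)


def agent_plug(bridge, tap):
    # Old linuxbridge agent behaviour: check, then attach if missing.
    if tap not in bridge.ports:
        bridge.add_port(tap)


def libvirt_plug(bridge, tap):
    # libvirt attaches the port unconditionally and reports any failure.
    try:
        bridge.add_port(tap)
    except OSError as exc:
        return "Unable to add bridge %s port %s: %s" % (bridge.name, tap, exc.strerror)
    return "ok"


br = FakeBridge("brq233a5889-2e")
# The interleaving that triggers the bug: the agent's polling loop wins.
agent_plug(br, "tap3f81c08a-39")
print(libvirt_plug(br, "tap3f81c08a-39"))
# On Linux: Unable to add bridge brq233a5889-2e port tap3f81c08a-39: Device or resource busy
```

Calling `libvirt_plug` first instead returns "ok", and the agent's check then skips the tap; the failure needs the agent to act first.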

Minho Ban (mhban) wrote :

Correction: there have been no reports of this failure with OVS. It seems this issue happens only with the LinuxBridge agent.

I agree that the LB agent should not do Nova's work, i.e. anything related to tap + qbr plumbing. If the agent loop fires right in the middle of Nova plugging the tap into qbr, it can introduce the race condition you describe.

Changed in neutron:
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
Changed in neutron:
status: New → In Progress
Sean M. Collins (scollins) wrote :
Changed in neutron:
status: In Progress → Fix Committed
status: Fix Committed → In Progress
tags: added: linuxbridge-gate-parity
Assaf Muller (amuller) on 2015-07-02
Changed in neutron:
importance: Undecided → High
tags: removed: grizzly icehouse
tags: removed: libvirt
Changed in neutron:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → Sean M. Collins (scollins)
Changed in neutron:
assignee: Sean M. Collins (scollins) → Darragh O'Reilly (darragh-oreilly)
Changed in neutron:
assignee: Darragh O'Reilly (darragh-oreilly) → Sean M. Collins (scollins)

[1] is touted as the fix for this issue, but I can't seem to see the failure happening in the gate.

[1] https://review.openstack.org/#/c/193485/

Changed in neutron:
importance: High → Medium

Is there a patch targeting Nova? If not, we should not target this bug at Nova.

no longer affects: nova
Changed in neutron:
assignee: Sean M. Collins (scollins) → Ihar Hrachyshka (ihar-hrachyshka)
tags: added: usability
removed: linuxbridge-gate-parity

How is this a usability issue?

Changed in neutron:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → Andreas Scheuring (andreas-scheuring)

Change abandoned by Andreas Scheuring (<email address hidden>) on branch: master
Review: https://review.openstack.org/244111
Reason: Will be merged into patchset https://review.openstack.org/#/c/193485/

Reviewed: https://review.openstack.org/193485
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f42ea67995537c7fe3e36447489872b0dcb82dd9
Submitter: Jenkins
Branch: master

commit f42ea67995537c7fe3e36447489872b0dcb82dd9
Author: Andreas Scheuring <email address hidden>
Date: Wed Nov 11 14:03:08 2015 +0100

    lb: avoid doing nova VIF work plumbing tap to qbr

    neutron should rely on nova doing the job instead of trying to 'fix' it.
    'Fixing' it introduces race conditions between lb agent and nova VIF
    driver. Particularly, lb agent can scan for new tap devices in the
    middle of nova plumbing qbr-tap setup, and attempt to do it on its own.
    So if agent is more lucky to plug the tap device into the bridge, nova
    may fail to do the same, getting the following error:

    libvirtError: Unable to add bridge brqxxx-xx port tapxxx-xx: Device or
    resource busy

    This also requires a change in how the port admin_state_up is implemented
    by setting the tap device's link state instead of moving it in or out
    of the bridge.

    Co-Authored-By: Sean M. Collins <email address hidden>
    Co-Authored-By: Darragh O'Reilly <email address hidden>
    Co-Authored-By: Andreas Scheuring <email address hidden>
    Change-Id: I02971103407b4ec11a65218e9ef7e2708915d938
    Closes-Bug: #1312016
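The last paragraph of the commit message describes the design change: instead of moving the tap in or out of the bridge to honour a port's `admin_state_up`, the agent toggles the tap's link state, so it never touches bridge membership. A minimal sketch of that idea, assuming the iproute2 `ip link set dev <name> up|down` command; the helper names are hypothetical and Neutron's actual implementation is not reproduced here.

```python
import subprocess


def link_state_command(tap_device, admin_state_up):
    """Build the iproute2 command that reflects a port's admin_state_up.

    Hypothetical helper: the tap stays in its bridge either way; only its
    link state changes.
    """
    state = "up" if admin_state_up else "down"
    return ["ip", "link", "set", "dev", tap_device, state]


def set_tap_admin_state(tap_device, admin_state_up, run=subprocess.check_call):
    # Executing the command for real requires root privileges.
    run(link_state_command(tap_device, admin_state_up))


# Dry run: print the commands instead of executing them.
set_tap_admin_state("tap3f81c08a-39", False, run=print)
set_tap_admin_state("tap3f81c08a-39", True, run=print)
```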

Changed in neutron:
status: In Progress → Fix Committed

Can we expect to see this backported to Liberty?

Changed in neutron:
status: Fix Committed → Fix Released

@Jesse: Is there an actual need for this backport? If so, we could give it a try. This was not a completely breaking issue, and backporting it carries some risk, so if there is no actual need, I would personally avoid it.

If you think it's worth a backport, please provide rationale and set liberty-backport-potential tag for the bug.

tags: removed: juno kilo nova

This issue was fixed in the openstack/neutron 8.0.0.0b2 development milestone.


Reviewed: https://review.openstack.org/296803
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a8b300ac6d489b91c77fcea12564b9d2d20c933d
Submitter: Jenkins
Branch: stable/kilo

commit a8b300ac6d489b91c77fcea12564b9d2d20c933d
Author: Andreas Scheuring <email address hidden>
Date: Wed Nov 11 14:03:08 2015 +0100

    lb: avoid doing nova VIF work plumbing tap to qbr

    neutron should rely on nova doing the job instead of trying to 'fix' it.
    'Fixing' it introduces race conditions between lb agent and nova VIF
    driver. Particularly, lb agent can scan for new tap devices in the
    middle of nova plumbing qbr-tap setup, and attempt to do it on its own.
    So if agent is more lucky to plug the tap device into the bridge, nova
    may fail to do the same, getting the following error:

    libvirtError: Unable to add bridge brqxxx-xx port tapxxx-xx: Device or
    resource busy

    This also requires a change in how the port admin_state_up is implemented
    by setting the tap device's link state instead of moving it in or out
    of the bridge.

    Conflicts:
     neutron/common/constants.py
     neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py
     neutron/tests/unit/plugins/ml2/drivers/linuxbridge/agent/test_linuxbridge_neutron_agent.py

    Co-Authored-By: Sean M. Collins <email address hidden>
    Co-Authored-By: Darragh O'Reilly <email address hidden>
    Co-Authored-By: Andreas Scheuring <email address hidden>
    Closes-Bug: #1312016
    (cherry picked from commit f42ea67995537c7fe3e36447489872b0dcb82dd9)
    (cherry picked from commit eb61b837f70906aea07e4fd2290afa24f1341da8)

    ===

    Also squashed the following follow up fix:

    lb: Correct String formatting to get rid of logged ValueError

    The following error is caused by a missing String formatting in the
    linuxbridge agent:
    "ValueError: unsupported format character 'a' (0x61) at index 90
    Logged from file linuxbridge_neutron_agent.py, line 447"

    In addition a duplicated word in the log text has been fixed.

    Change-Id: I587f1165fc7084dc9c4806149b65652f6e27b14e
    (cherry picked from commit 1f86d8687b2781f0c287ee656f3cbc65aaa4b5e4)

    ===

    Also squashed in:

    Only ensure admin state on ports that exist

    The linux bridge agent was calling ensure_port_admin state
    unconditionally on ports in treat_devices_added_or_updated.
    This would cause it to throw an error on interfaces that
    didn't exist so it would restart the entire processing loop.

    If another port was being updated in the same loop before this
    one, that port would experience a port status life-cycle of
    DOWN->BUILD->ACTIVE->BUILD->ACTIVE
                       ^ <--- Exception in unrelated port causes cycle
                              to start over again.

    This causes the bug below because the first active transition will
    cause Nova to boot the VM. At this point tempest tests expect the
    ports that belong to the VM to be in the ACTIVE state so it filters
    Neutron port list calls with "status=ACTIVE". Therefore...
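The port status flapping described in the squashed "Only ensure admin state on ports that exist" change can be modelled with a toy agent loop. This is a hypothetical simulation, not Neutron's `treat_devices_added_or_updated`: an exception for a device that does not exist aborts the pass and forces a full resync, so port `A`, already ACTIVE, goes through BUILD again. It assumes the missing device is gone by the time the resync runs. With `skip_missing=True`, mirroring the idea of only touching ports that exist, each port transitions once.

```python
def run_agent(devices, existing, skip_missing):
    """Return each device's status history once the toy agent settles."""
    history = {dev: ["DOWN"] for dev in devices}
    pending = list(devices)
    while True:
        try:
            for dev in pending:
                history[dev].append("BUILD")
                if dev not in existing:
                    if skip_missing:
                        continue  # fixed behaviour: ignore the vanished device
                    raise LookupError("device %s does not exist" % dev)
                history[dev].append("ACTIVE")
            return history
        except LookupError:
            # Buggy behaviour: restart the whole loop. Assumption: by now the
            # missing device has disappeared from the pending set.
            pending = [dev for dev in devices if dev in existing]


print(run_agent(["A", "B"], existing={"A"}, skip_missing=False)["A"])
# ['DOWN', 'BUILD', 'ACTIVE', 'BUILD', 'ACTIVE']
print(run_agent(["A", "B"], existing={"A"}, skip_missing=True)["A"])
# ['DOWN', 'BUILD', 'ACTIVE']
```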


tags: added: in-stable-kilo

Reviewed: https://review.openstack.org/296783
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4cb90623193bd6826e279129e993e0ceaf4a1816
Submitter: Jenkins
Branch: stable/liberty

commit 4cb90623193bd6826e279129e993e0ceaf4a1816
Author: Andreas Scheuring <email address hidden>
Date: Wed Nov 11 14:03:08 2015 +0100

    lb: avoid doing nova VIF work plumbing tap to qbr

    neutron should rely on nova doing the job instead of trying to 'fix' it.
    'Fixing' it introduces race conditions between lb agent and nova VIF
    driver. Particularly, lb agent can scan for new tap devices in the
    middle of nova plumbing qbr-tap setup, and attempt to do it on its own.
    So if agent is more lucky to plug the tap device into the bridge, nova
    may fail to do the same, getting the following error:

    libvirtError: Unable to add bridge brqxxx-xx port tapxxx-xx: Device or
    resource busy

    This also requires a change in how the port admin_state_up is implemented
    by setting the tap device's link state instead of moving it in or out
    of the bridge.

    Conflicts:
     neutron/common/constants.py
     neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py
     neutron/tests/unit/plugins/ml2/drivers/linuxbridge/agent/test_linuxbridge_neutron_agent.py

    Co-Authored-By: Sean M. Collins <email address hidden>
    Co-Authored-By: Darragh O'Reilly <email address hidden>
    Co-Authored-By: Andreas Scheuring <email address hidden>
    Closes-Bug: #1312016
    (cherry picked from commit f42ea67995537c7fe3e36447489872b0dcb82dd9)

    ===

    Also squashed in the following follow up fix:

    lb: Correct String formatting to get rid of logged ValueError

    The following error is caused by a missing String formatting in the
    linuxbridge agent:
    "ValueError: unsupported format character 'a' (0x61) at index 90
    Logged from file linuxbridge_neutron_agent.py, line 447"

    In addition a duplicated word in the log text has been fixed.

    Change-Id: I587f1165fc7084dc9c4806149b65652f6e27b14e
    (cherry picked from commit 1f86d8687b2781f0c287ee656f3cbc65aaa4b5e4)

    ===

    Also squashed in:

    Only ensure admin state on ports that exist

    The linux bridge agent was calling ensure_port_admin state
    unconditionally on ports in treat_devices_added_or_updated.
    This would cause it to throw an error on interfaces that
    didn't exist so it would restart the entire processing loop.

    If another port was being updated in the same loop before this
    one, that port would experience a port status life-cycle of
    DOWN->BUILD->ACTIVE->BUILD->ACTIVE
                       ^ <--- Exception in unrelated port causes cycle
                              to start over again.

    This causes the bug below because the first active transition will
    cause Nova to boot the VM. At this point tempest tests expect the
    ports that belong to the VM to be in the ACTIVE state so it filters
    Neutron port list calls with "status=ACTIVE". Therefore tempest would
    not get any ports back and assume there was some...
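The "unsupported format character" error in the squashed "lb: Correct String formatting" fix is what Python's %-style formatting raises when a named placeholder such as `%(bridge_name)` lacks its conversion letter, so the next character is parsed as the conversion. When that formatting happens inside `logging`, the handler reports the error instead of raising it, and Python 2's handler appends the "Logged from file ..." line seen above. A minimal reproduction with hypothetical message text; it uses `w` rather than `a` because Python 3 accepts `%a` (`ascii()`) as a valid conversion, whereas Python 2 did not.

```python
data = {"tap_device_name": "tap3f81c08a-39", "bridge_name": "brq233a5889-2e"}

# Correct: every named placeholder ends with a conversion character ("s").
good = "Adding device %(tap_device_name)s to bridge %(bridge_name)s"
print(good % data)

# Hypothetical typo: the "s" after %(bridge_name) is missing, so the "w" of
# the following word is parsed as the conversion character.
bad = "Device %(tap_device_name)s on bridge %(bridge_name)was updated"
try:
    print(bad % data)
except ValueError as exc:
    print(exc)  # starts with: unsupported format character 'w' (0x77)
```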


tags: added: in-stable-liberty

This issue was fixed in the openstack/neutron 2015.1.4 release.

This issue was fixed in the openstack/neutron 7.1.0 release.