tempest fails with No IPv4 addresses found

Bug #1523638 reported by Sean M. Collins
This bug affects 4 people
Affects          Status        Importance  Assigned to    Milestone
neutron          Fix Released  High        Kevin Benton
neutron (Kilo)   New           Undecided   Unassigned
tempest          Fix Released  Undecided   Kevin Benton

Bug Description

http://logs.openstack.org/42/250542/7/check/gate-tempest-dsvm-neutron-linuxbridge/3a00f8b/logs/testr_results.html.gz

Traceback (most recent call last):
  File "tempest/test.py", line 113, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "tempest/scenario/test_network_basic_ops.py", line 550, in test_subnet_details
    self._setup_network_and_servers(dns_nameservers=[initial_dns_server])
  File "tempest/scenario/test_network_basic_ops.py", line 123, in _setup_network_and_servers
    floating_ip = self.create_floating_ip(server)
  File "tempest/scenario/manager.py", line 842, in create_floating_ip
    port_id, ip4 = self._get_server_port_id_and_ip4(thing)
  File "tempest/scenario/manager.py", line 821, in _get_server_port_id_and_ip4
    "No IPv4 addresses found in: %s" % ports)
  File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/unittest2/case.py", line 845, in assertNotEqual
    raise self.failureException(msg)
AssertionError: 0 == 0 : No IPv4 addresses found in: []
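
For orientation, a paraphrased sketch of what the failing helper in tempest/scenario/manager.py does; names and details are approximations based on the filter parameters and assertion message visible in this bug, not the verbatim tempest code:

    def get_server_port_id_and_ip4(list_ports, server_id):
        # 'list_ports' stands in for the Neutron port-listing call; the real
        # helper filters the same way (device_id, status=ACTIVE, fixed_ip).
        ports = list_ports(device_id=server_id, status='ACTIVE', fixed_ip=None)
        # This is the check that produces the failure above: the ACTIVE-filtered
        # port list for the server came back empty.
        assert len(ports) != 0, "No IPv4 addresses found in: %s" % ports
        port = ports[0]
        ip4 = next(ip['ip_address'] for ip in port['fixed_ips']
                   if '.' in ip['ip_address'])  # crude IPv4 test, sketch only
        return port['id'], ip4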

tags: added: linuxbridge
Henry Gessau (gessau)
tags: added: gate-failure
Changed in neutron:
importance: Undecided → High
summary: - tempest test_subnet_details fails with No IPv4 addresses found
+ tempest fails with No IPv4 addresses found
Revision history for this message
Sean M. Collins (scollins) wrote :

http://logs.openstack.org/42/250542/7/check/gate-tempest-dsvm-neutron-linuxbridge/3a00f8b/logs/screen-q-agt.txt.gz?#_2015-12-04_18_18_02_896

2015-12-04 18:18:02.896 DEBUG neutron.plugins.ml2.drivers.linuxbridge.agent.linuxbridge_neutron_agent [req-d13b0a54-2efb-4577-83b1-6d44ea35b21b None None] Tap device: tap8912d290-97 does not exist on this host, skipped add_tap_interface /opt/stack/new/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py:413

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Looks like this has happened only once in the gate queue in the last 7 days. If I could figure out how to use the latest logstash I'd be happy to share the results of my queries, but apparently I am too much of a cretin to understand the new GUI.

Changed in neutron:
status: New → Confirmed
Revision history for this message
Sean M. Collins (scollins) wrote :

For some reason it appears that Tempest is deleting the server, and then attempting to use it after deleting it.

        Body: None
    Response - Headers: {'content-type': 'application/json', 'connection': 'close', 'content-length': '0', 'vary': 'X-OpenStack-Nova-API-Version', 'date': 'Tue, 08 Dec 2015 16:08:16 GMT', 'x-openstack-nova-api-version': '2.1', 'status': '202', 'x-compute-request-id': 'req-1b37ee82-5b66-453d-a9fb-4efc180d9425'}
        Body:
2015-12-08 16:08:16,405 32424 INFO [tempest_lib.common.rest_client] Request (TestSecurityGroupsBasicOps:_run_cleanups): 404 GET http://127.0.0.1:8774/v2.1/000f400f025d48c38dada36cffcd7473/servers/8e527171-7125-4585-b896-6d1b86e74481 0.111s
2015-12-08 16:08:16,406 32424 DEBUG [tempest_lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: None
    Response - Headers: {'content-type': 'application/json; charset=UTF-8', 'connection': 'close', 'content-length': '111', 'vary': 'X-OpenStack-Nova-API-Version', 'date': 'Tue, 08 Dec 2015 16:08:16 GMT', 'x-openstack-nova-api-version': '2.1', 'status': '404', 'x-compute-request-id': 'req-962533cc-2852-4f37-85f1-353339a493e8'}
        Body: {"itemNotFound": {"message": "Instance 8e527171-7125-4585-b896-6d1b86e74481 could not be found.", "code": 404}}
}}}

Traceback (most recent call last):
  File "tempest/scenario/test_security_groups_basic_ops.py", line 167, in setUp
    self._deploy_tenant(self.primary_tenant)
  File "tempest/scenario/test_security_groups_basic_ops.py", line 305, in _deploy_tenant
    self._set_access_point(tenant)
  File "tempest/scenario/test_security_groups_basic_ops.py", line 273, in _set_access_point
    self._assign_floating_ips(tenant, server)
  File "tempest/scenario/test_security_groups_basic_ops.py", line 279, in _assign_floating_ips
    client=tenant.manager.floating_ips_client)
  File "tempest/scenario/manager.py", line 842, in create_floating_ip
    port_id, ip4 = self._get_server_port_id_and_ip4(thing)
  File "tempest/scenario/manager.py", line 821, in _get_server_port_id_and_ip4
    "No IPv4 addresses found in: %s" % ports)
  File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/unittest2/case.py", line 845, in assertNotEqual
    raise self.failureException(msg)
AssertionError: 0 == 0 : No IPv4 addresses found in: []

2015-12-08 16:08:10.995 INFO nova.compute.manager [req-6adb7b1c-a09b-4acf-9773-808ddf72502d tempest-TestSecurityGroupsBasicOps-2044567041 tempest-TestSecurityGroupsBasicOps-376420564] [instance: 8e527171-7125-4585-b896-6d1b86e74481] Terminating instance

2015-12-08 16:08:11.873 INFO nova.compute.manager [req-6adb7b1c-a09b-4acf-9773-808ddf72502d tempest-TestSecurityGroupsBasicOps-2044567041 tempest-TestSecurityGroupsBasicOps-376420564] [instance: 8e527171-7125-4585-b896-6d1b86e74481] Took 0.88 seconds to destroy the instance on the hypervisor.

2015-12-08 16:08:16.402 INFO nova.api.openstack.wsgi [req-962533cc-2852-4f37-85f1-353339a493e8 tempest-TestSecurityGroupsBasicOps-2044567041 tempest-TestSecurityGroupsBasicOps-376420564] HTTP exception thrown: Instance 8e527171-7125-4585-b896-...


Revision history for this message
Sean M. Collins (scollins) wrote :

Looks like this is happening during the cleanup step?

2015-12-08 16:08:16,405 32424 INFO [tempest_lib.common.rest_client] Request (TestSecurityGroupsBasicOps:_run_cleanups): 404 GET http://127.0.0.1:8774/v2.1/000f400f025d48c38dada36cffcd7473/servers/8e527171-7125-4585-b896-6d1b86e74481 0.111s
2015-12-08 16:08:16,406 32424 DEBUG [tempest_lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: None
    Response - Headers: {'content-type': 'application/json; charset=UTF-8', 'connection': 'close', 'content-length': '111', 'vary': 'X-OpenStack-Nova-API-Version', 'date': 'Tue, 08 Dec 2015 16:08:16 GMT', 'x-openstack-nova-api-version': '2.1', 'status': '404', 'x-compute-request-id': 'req-962533cc-2852-4f37-85f1-353339a493e8'}
        Body: {"itemNotFound": {"message": "Instance 8e527171-7125-4585-b896-6d1b86e74481 could not be found.", "code": 404}}
}}}

Traceback (most recent call last):
  File "tempest/scenario/test_security_groups_basic_ops.py", line 167, in setUp
    self._deploy_tenant(self.primary_tenant)
  File "tempest/scenario/test_security_groups_basic_ops.py", line 305, in _deploy_tenant
    self._set_access_point(tenant)
  File "tempest/scenario/test_security_groups_basic_ops.py", line 273, in _set_access_point
    self._assign_floating_ips(tenant, server)
  File "tempest/scenario/test_security_groups_basic_ops.py", line 279, in _assign_floating_ips
    client=tenant.manager.floating_ips_client)
  File "tempest/scenario/manager.py", line 842, in create_floating_ip
    port_id, ip4 = self._get_server_port_id_and_ip4(thing)
  File "tempest/scenario/manager.py", line 821, in _get_server_port_id_and_ip4
    "No IPv4 addresses found in: %s" % ports)
  File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/unittest2/case.py", line 845, in assertNotEqual
    raise self.failureException(msg)
AssertionError: 0 == 0 : No IPv4 addresses found in: []

Revision history for this message
Sean M. Collins (scollins) wrote :

Adding tempest to the list of affected projects. It's possible that the tempest code holds onto a reference to an instance after it has been deleted and attempts to use it again.

Changed in neutron:
assignee: nobody → Sean M. Collins (scollins)
Revision history for this message
Dmitry Ratushnyy (dmitry-ratushnyy) wrote :

Sean, I don't think this is happening during setup. I get this error when booting servers.

I've got failures in the gate (changeset https://review.openstack.org/#/c/248355/ )

tempest.scenario.test_arp_poisoning.TestArpPoisoning.test_restart_neutron_agent[compute,id-f51200bd-d909-4ca6-8a1f-3ef92cf0cb78,network]
----------------------------------------------------------------------------------------------------------------------------------------
Captured traceback:
~~~~~~~~~~~~~~~~~~~
     Traceback (most recent call last):
       File "tempest/test.py", line 113, in wrapper
         return f(self, *func_args, **func_kwargs)
       File "tempest/scenario/test_arp_poisoning.py", line 255, in test_restart_neutron_agent
         attacker_vm_ip, target_vm_ip = self._boot_servers()
       File "tempest/scenario/test_arp_poisoning.py", line 101, in _boot_servers
         target_avail_zone)
       File "tempest/scenario/test_arp_poisoning.py", line 74, in _boot_server
         CONF.network.public_network_id)
       File "tempest/scenario/manager.py", line 848, in create_floating_ip
         port_id, ip4 = self._get_server_port_id_and_ip4(thing)
       File "tempest/scenario/manager.py", line 827, in _get_server_port_id_and_ip4
         "No IPv4 addresses found in: %s" % ports)
       File "/opt/stack/new/tempest/.tox/full/local/lib/python2.7/site-packages/unittest2/case.py", line 845, in assertNotEqual
         raise self.failureException(msg)
     AssertionError: 0 == 0 : No IPv4 addresses found in: []

Revision history for this message
Sean M. Collins (scollins) wrote :

Right - because it appears to be a "use after free" inside Tempest, where tempest has a handle to a compute instance, deletes it, but then later tries to use it again. In the Nova logs we see the REST call from Tempest to delete the server, then later a REST call to get information about the instance, to which Nova returns a 404.

Revision history for this message
Dmitry Ratushnyy (dmitry-ratushnyy) wrote :

But in the case above, the failure is not after deleting an instance, it is after booting a new one.
Right after boot it tries to get the IPv4 address. There is no delete step.
Or am I missing something?

Revision history for this message
Sean M. Collins (scollins) wrote :

@Dmitry - that appears to be a different bug that has the same error message.

Revision history for this message
Dmitry Ratushnyy (dmitry-ratushnyy) wrote :

@Sean
Do you think this is a tempest issue or a neutron one? I can file another bug for my failures in the gate.

Revision history for this message
Andreas Scheuring (andreas-scheuring) wrote :

I also had some occurrences of this issue and tried to figure out the problem. I just want to share my observations. I refer to [1].

Failing test: test_dualnet_multi_prefix_dhcpv6_stateless
What was happening: the test requires 2 instances. The first instance was set up successfully (prepare_server). The error happened while the second one was being processed. It had been created, but creating the floating IP failed, as querying the port resulted in an empty list [2] [3]. This is the failing assertion [4]. Right after that the cleanup starts.

Having a look at the flow of "tap3b689f82-b8" for the failing instance "a7141979-cae6-40d3-9ca6-2e8ac6b45b63" [5]:
2016-01-29 21:58:47.823 [q-agt] Tap device has been added and detected by the agent loop [7]
2016-01-29 21:58:49.089 [q-svc] Device details requested by the agent [12]
2016-01-29 21:58:52.241 [q-svc] Neutron server got informed that the device is up [9]
2016-01-29 21:58:52.817 [q-agt] Full agent resync triggered [13]
2016-01-29 21:58:53.976 [q-svc] Device details requested again (due to the agent resync) [11]
2016-01-29 21:58:56,049 [console] Tempest test fails as the port query did not return anything useful
2016-01-29 21:58:56,287 [console] Delete has been triggered [8]
2016-01-29 21:58:56.942 [q-agt] Tap disappeared (the agent was in the midst of processing it) [6]
2016-01-29 21:59:10.396 [q-svc] Neutron server got informed that the device is down [10]
--> I found nothing that caught my attention... the agent resync was triggered by bug 1532171, as a parallel-running testcase deleted an instance...

The query that returns nothing is:

http://127.0.0.1:9696/v2.0/ports?device_id=a7141979-cae6-40d3-9ca6-2e8ac6b45b63&status=ACTIVE&fixed_ip=None

One of the 3 query attributes must have caused the empty result; at least 2 ports should have been returned!
My proposal would be to extend the logging before that assertion is made and print out a list of all available ports, to see why this query is failing (see the rough sketch after the reference list below)...

[1] http://logs.openstack.org/18/246318/31/gate/gate-tempest-dsvm-neutron-linuxbridge/15b91f4/console.html.gz
[2] https://github.com/openstack/tempest/blob/master/tempest/scenario/manager.py#L847
[3] http://logs.openstack.org/18/246318/31/gate/gate-tempest-dsvm-neutron-linuxbridge/15b91f4/console.html.gz#_2016-01-29_22_12_04_625
[4] https://github.com/openstack/tempest/blob/master/tempest/scenario/manager.py#L825
[5] http://logs.openstack.org/18/246318/31/gate/gate-tempest-dsvm-neutron-linuxbridge/15b91f4/logs/screen-n-cpu.txt.gz#_2016-01-29_21_58_46_036
[6] http://logs.openstack.org/18/246318/31/gate/gate-tempest-dsvm-neutron-linuxbridge/15b91f4/logs/screen-q-agt.txt.gz#_2016-01-29_21_58_56_924
[7] http://logs.openstack.org/18/246318/31/gate/gate-tempest-dsvm-neutron-linuxbridge/15b91f4/logs/screen-q-agt.txt.gz#_2016-01-29_21_58_47_823
[8] http://logs.openstack.org/18/246318/31/gate/gate-tempest-dsvm-neutron-linuxbridge/15b91f4/console.html.gz#_2016-01-29_22_12_04_625
[9] http://logs.openstack.org/18/246318/31/gate/gate-tempest-dsvm-neutron-linuxbridge/15b91f4/logs/screen-q-svc.txt.gz#_2016-01-29_21_58_52_241
[10] http://logs.openstack.org/18/246318/31/gate/gate-tempest-dsvm-neutron-linu...

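A rough sketch of the extra diagnostics proposed above, as a standalone query against the Neutron API. The endpoint and filter attributes are taken from the URL in the comment; the token handling is a simplified placeholder, not the actual tempest plumbing:

    import requests

    NEUTRON_URL = 'http://127.0.0.1:9696'
    TOKEN = '<keystone token>'          # placeholder, obtain from keystone
    DEVICE_ID = 'a7141979-cae6-40d3-9ca6-2e8ac6b45b63'
    HEADERS = {'X-Auth-Token': TOKEN}

    # The filtered query tempest issues (this returned [] in the failing run).
    active = requests.get('%s/v2.0/ports' % NEUTRON_URL, headers=HEADERS,
                          params={'device_id': DEVICE_ID,
                                  'status': 'ACTIVE'}).json()['ports']

    # The query without the status filter: log every port of the device so we
    # can see which filter attribute ruled the ports out.
    all_ports = requests.get('%s/v2.0/ports' % NEUTRON_URL, headers=HEADERS,
                             params={'device_id': DEVICE_ID}).json()['ports']

    print('ACTIVE ports: %s' % [(p['id'], p['status']) for p in active])
    print('all ports:    %s' % [(p['id'], p['status']) for p in all_ports])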

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/274588

Changed in tempest:
assignee: nobody → Andreas Scheuring (andreas-scheuring)
status: New → In Progress
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

This hasn't occurred in the gate queue for the past 7 days. Check/experimental failures may well be self-inflicted.

Changed in neutron:
importance: High → Critical
importance: Critical → High
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Andreas: I was talking about the gate queue, which at the time I made the statement was returning no errors.

Yesterday (Feb 3) a trace popped up:

http://logs.openstack.org/70/274570/3/gate/gate-tempest-dsvm-neutron-linuxbridge/a52584e/

So you can object to my statement as much as you want, but it was still true at the time I made it :)

Revision history for this message
Kevin Benton (kevinbenton) wrote :

The root cause of this is that the Linux bridge agent was exploding during the setup loop on an unrelated port. This would cause the port we are working with to go back into the BUILD state on the server after it had already been ACTIVE and nova had already been told to boot the VM. So the list-ports call from tempest, which filtered on ACTIVE, would not see the port.
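
A toy illustration of the race described here, using a heavily simplified model of the agent/server interaction (this is illustrative Python only, not the real linuxbridge agent code):

    class FakePluginApi(object):
        """Stands in for the Neutron server side of the agent RPC."""
        def __init__(self):
            self.port_status = {}

        def get_device_details(self, device):
            # The server moves a port to BUILD whenever its details are requested.
            self.port_status[device] = 'BUILD'

        def update_device_up(self, device):
            self.port_status[device] = 'ACTIVE'

    def process_devices(devices, plugin, missing_on_host):
        for device in devices:
            plugin.get_device_details(device)        # port -> BUILD
            if device in missing_on_host:
                # Pre-fix agent behaviour: an operation on an already-deleted
                # tap device raised, aborting the loop and scheduling *all*
                # devices (including healthy ones) for re-processing.
                raise RuntimeError('%s does not exist on this host' % device)
            plugin.update_device_up(device)          # port -> ACTIVE

    plugin = FakePluginApi()
    try:
        process_devices(['tap-vm-port', 'tap-deleted-port'], plugin,
                        missing_on_host={'tap-deleted-port'})
    except RuntimeError:
        # Retry pass: the VM's port, already ACTIVE, is re-requested and drops
        # back to BUILD -- exactly the window in which tempest lists ACTIVE
        # ports for the server and gets [].
        plugin.get_device_details('tap-vm-port')

    print(plugin.port_status)
    # {'tap-vm-port': 'BUILD', 'tap-deleted-port': 'BUILD'}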

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/276519

Changed in neutron:
assignee: Sean M. Collins (scollins) → Kevin Benton (kevinbenton)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/276527

Changed in tempest:
assignee: Andreas Scheuring (andreas-scheuring) → Kevin Benton (kevinbenton)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by Andreas Scheuring (<email address hidden>) on branch: master
Review: https://review.openstack.org/274588
Reason: bug closed by https://review.openstack.org/#/c/276519/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/276519
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=96c67e22f9cba2ea0e7fb3ba2a63e4905e48c1a4
Submitter: Jenkins
Branch: master

commit 96c67e22f9cba2ea0e7fb3ba2a63e4905e48c1a4
Author: Kevin Benton <email address hidden>
Date: Thu Feb 4 13:49:42 2016 -0800

    Only ensure admin state on ports that exist

    The linux bridge agent was calling ensure_port_admin state
    unconditionally on ports in treat_devices_added_or_updated.
    This would cause it to throw an error on interfaces that
    didn't exist so it would restart the entire processing loop.

    If another port was being updated in the same loop before this
    one, that port would experience a port status life-cycle of
    DOWN->BUILD->ACTIVE->BUILD->ACTIVE
                       ^ <--- Exception in unrelated port causes cycle
                              to start over again.

    This causes the bug below because the first active transition will
    cause Nova to boot the VM. At this point tempest tests expect the
    ports that belong to the VM to be in the ACTIVE state so it filters
    Neutron port list calls with "status=ACTIVE". Therefore tempest would
    not get any ports back and assume there was some kind of error with
    the port and bail.

    This patch just makes sure the admin state call is skipped if the port
    doesn't exist and it includes a basic unit test to prevent a regression.

    Closes-Bug: #1523638
    Change-Id: I5330c6111cbb20bf45aec9ade7e30d34e8dd16ca
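
A rough sketch of the shape of that change, with placeholder names rather than the actual agent internals:

    import logging
    import os

    LOG = logging.getLogger(__name__)

    def tap_device_exists(tap_device_name):
        # Cheap existence check for this sketch (a Linux netdev shows up under
        # /sys/class/net); the real agent has its own helper for this.
        return os.path.exists('/sys/class/net/%s' % tap_device_name)

    def set_admin_state_if_present(agent, tap_device_name, admin_state_up):
        # Post-fix behaviour in rough form: only touch admin state for tap
        # devices that still exist on this host. Pre-fix, the call below ran
        # unconditionally, raised for already-deleted devices, and restarted
        # the whole device-processing loop, bouncing unrelated ports back to
        # BUILD as described in the commit message above.
        if not tap_device_exists(tap_device_name):
            LOG.debug("Tap device %s does not exist, skipping admin state "
                      "change", tap_device_name)
            return
        agent.ensure_port_admin_state(tap_device_name, admin_state_up)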

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.openstack.org/276527
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=1d0c1dca74a4a1ab90c03c8fc675fe55fb2feccf
Submitter: Jenkins
Branch: master

commit 1d0c1dca74a4a1ab90c03c8fc675fe55fb2feccf
Author: Kevin Benton <email address hidden>
Date: Thu Feb 4 14:30:08 2016 -0800

    Emit warning when instances have ports not ACTIVE

    This changes the tempest logic to request all ports rather than
    just ACTIVE ports for a server and then filters them locally so
    we can log an warning message when a server has ports not in the
    ACTIVE state. This will help debug cases in the future where the
    Neutron port status is in an unstable state due to agent wiring
    errors.

    Change-Id: I979a06688a5dfecaaef5e7e4a85cb8494095c754
    Closes-Bug: #1523638
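
A hedged sketch of the behaviour this commit describes, assuming a tempest-style ports client whose list_ports(**filters) call returns the response body; names are approximations, not the merged code:

    import logging

    LOG = logging.getLogger(__name__)

    def active_server_ports(ports_client, server_id):
        # Ask Neutron for *all* ports of the server instead of filtering on
        # status=ACTIVE server-side, so we can see what state they are in.
        ports = ports_client.list_ports(device_id=server_id)['ports']

        inactive = [p for p in ports if p['status'] != 'ACTIVE']
        if inactive:
            # This warning is the point of the change: it makes agent wiring
            # races (ports stuck in BUILD/DOWN) visible in the test logs.
            LOG.warning("Instance %s has ports that are not ACTIVE: %s",
                        server_id,
                        [(p['id'], p['status']) for p in inactive])

        # Preserve the old behaviour for callers: only ACTIVE ports are used.
        return [p for p in ports if p['status'] == 'ACTIVE']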

Changed in tempest:
status: In Progress → Fix Released
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b3

This issue was fixed in the openstack/neutron 8.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/296946

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/liberty)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/liberty
Review: https://review.openstack.org/296946
Reason: Squashed into https://review.openstack.org/#/c/296783/2

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/296803
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a8b300ac6d489b91c77fcea12564b9d2d20c933d
Submitter: Jenkins
Branch: stable/kilo

commit a8b300ac6d489b91c77fcea12564b9d2d20c933d
Author: Andreas Scheuring <email address hidden>
Date: Wed Nov 11 14:03:08 2015 +0100

    lb: avoid doing nova VIF work plumbing tap to qbr

    neutron should rely on nova doing the job instead of trying to 'fix' it.
    'Fixing' it introduces race conditions between lb agent and nova VIF
    driver. Particularly, lb agent can scan for new tap devices in the
    middle of nova plumbing qbr-tap setup, and attempt to do it on its own.
    So if agent is more lucky to plug the tap device into the bridge, nova
    may fail to do the same, getting the following error:

    libvirtError: Unable to add bridge brqxxx-xx port tapxxx-xx: Device or
    resource busy

    This also requires a change in how the port admin_state_up is implemented
    by setting the tap device's link state instead of moving it in or out
    of the bridge.

    Conflicts:
     neutron/common/constants.py
     neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py
     neutron/tests/unit/plugins/ml2/drivers/linuxbridge/agent/test_linuxbridge_neutron_agent.py

    Co-Authored-By: Sean M. Collins <email address hidden>
    Co-Authored-By: Darragh O'Reilly <email address hidden>
    Co-Authored-By: Andreas Scheuring <email address hidden>
    Closes-Bug: #1312016
    (cherry picked from commit f42ea67995537c7fe3e36447489872b0dcb82dd9)
    (cherry picked from commit eb61b837f70906aea07e4fd2290afa24f1341da8)

    ===

    Also squashed the following follow up fix:

    lb: Correct String formatting to get rid of logged ValueError

    The following error is caused by a missing String formatting in the
    linuxbridge agent:
    "ValueError: unsupported format character 'a' (0x61) at index 90
    Logged from file linuxbridge_neutron_agent.py, line 447"

    In addition a duplicated word in the log text has been fixed.

    Change-Id: I587f1165fc7084dc9c4806149b65652f6e27b14e
    (cherry picked from commit 1f86d8687b2781f0c287ee656f3cbc65aaa4b5e4)

    ===

    Also squashed in:

    Only ensure admin state on ports that exist

    The linux bridge agent was calling ensure_port_admin state
    unconditionally on ports in treat_devices_added_or_updated.
    This would cause it to throw an error on interfaces that
    didn't exist so it would restart the entire processing loop.

    If another port was being updated in the same loop before this
    one, that port would experience a port status life-cycle of
    DOWN->BUILD->ACTIVE->BUILD->ACTIVE
                       ^ <--- Exception in unrelated port causes cycle
                              to start over again.

    This causes the bug below because the first active transition will
    cause Nova to boot the VM. At this point tempest tests expect the
    ports that belong to the VM to be in the ACTIVE state so it filters
    Neutron port list calls with "status=ACTIVE". Therefore...


tags: added: in-stable-kilo
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/296783
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4cb90623193bd6826e279129e993e0ceaf4a1816
Submitter: Jenkins
Branch: stable/liberty

commit 4cb90623193bd6826e279129e993e0ceaf4a1816
Author: Andreas Scheuring <email address hidden>
Date: Wed Nov 11 14:03:08 2015 +0100

    lb: avoid doing nova VIF work plumbing tap to qbr

    neutron should rely on nova doing the job instead of trying to 'fix' it.
    'Fixing' it introduces race conditions between lb agent and nova VIF
    driver. Particularly, lb agent can scan for new tap devices in the
    middle of nova plumbing qbr-tap setup, and attempt to do it on its own.
    So if agent is more lucky to plug the tap device into the bridge, nova
    may fail to do the same, getting the following error:

    libvirtError: Unable to add bridge brqxxx-xx port tapxxx-xx: Device or
    resource busy

    This also requires a change in how the port admin_state_up is implemented
    by setting the tap device's link state instead of moving it in or out
    of the bridge.

    Conflicts:
     neutron/common/constants.py
     neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py
     neutron/tests/unit/plugins/ml2/drivers/linuxbridge/agent/test_linuxbridge_neutron_agent.py

    Co-Authored-By: Sean M. Collins <email address hidden>
    Co-Authored-By: Darragh O'Reilly <email address hidden>
    Co-Authored-By: Andreas Scheuring <email address hidden>
    Closes-Bug: #1312016
    (cherry picked from commit f42ea67995537c7fe3e36447489872b0dcb82dd9)

    ===

    Also squashed in the following follow up fix:

    lb: Correct String formatting to get rid of logged ValueError

    The following error is caused by a missing String formatting in the
    linuxbridge agent:
    "ValueError: unsupported format character 'a' (0x61) at index 90
    Logged from file linuxbridge_neutron_agent.py, line 447"

    In addition a duplicated word in the log text has been fixed.

    Change-Id: I587f1165fc7084dc9c4806149b65652f6e27b14e
    (cherry picked from commit 1f86d8687b2781f0c287ee656f3cbc65aaa4b5e4)

    ===

    Also squashed in:

    Only ensure admin state on ports that exist

    The linux bridge agent was calling ensure_port_admin state
    unconditionally on ports in treat_devices_added_or_updated.
    This would cause it to throw an error on interfaces that
    didn't exist so it would restart the entire processing loop.

    If another port was being updated in the same loop before this
    one, that port would experience a port status life-cycle of
    DOWN->BUILD->ACTIVE->BUILD->ACTIVE
                       ^ <--- Exception in unrelated port causes cycle
                              to start over again.

    This causes the bug below because the first active transition will
    cause Nova to boot the VM. At this point tempest tests expect the
    ports that belong to the VM to be in the ACTIVE state so it filters
    Neutron port list calls with "status=ACTIVE". Therefore tempest would
    not get any ports back and assume there was some...


tags: added: in-stable-liberty
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 2015.1.4

This issue was fixed in the openstack/neutron 2015.1.4 release.

Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 7.1.0

This issue was fixed in the openstack/neutron 7.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 2015.1.4

This issue was fixed in the openstack/neutron 2015.1.4 release.
