BadRequest: Multiple possible networks found, use a Network ID to be more specific.

Bug #1251448 reported by Joe Gordon
This bug affects 17 people
Affects: tempest
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned

Bug Description

The gate (Neutron-based jobs only) is periodically failing with the following error:

"BadRequest: Multiple possible networks found, use a Network ID to be more specific. "

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiIHBvc3NpYmxlIG5ldHdvcmtzIGZvdW5kLCB1c2UgYSBOZXR3b3JrIElEIHRvIGJlIG1vcmUgc3BlY2lmaWMuIChIVFRQIDQwMClcIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNDMyMDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxMzg0NDY2ODA0Mjg2LCJtb2RlIjoiIiwiYW5hbHl6ZV9maWVsZCI6IiJ9

query:

message:" possible networks found, use a Network ID to be more specific. (HTTP 400)" AND filename:"console.html"

Example: http://logs.openstack.org/75/54275/3/check/check-tempest-devstack-vm-neutron-pg/61a2974/console.html

Failure breakdown by job:
 check-tempest-devstack-vm-neutron-pg 34%
 check-tempest-devstack-vm-neutron 24%
 gate-tempest-devstack-vm-neutron 10%
 gate-tempest-devstack-vm-neutron-pg 5%

Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

I am not entirely sure about the root cause of this issue.
The manifestation is simple: there is a network that should not be there, probably because it was not deleted when another test completed.

I will perform some triaging and then add more comments.
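
For anyone trying to reproduce this locally, listing the demo tenant's networks makes leftovers easy to spot. A minimal python-neutronclient sketch (the credentials and Keystone endpoint are placeholders):

    from neutronclient.v2_0 import client

    # Placeholders for the devstack demo credentials and Keystone endpoint.
    neutron = client.Client(username='demo',
                            password='<password>',
                            tenant_name='demo',
                            auth_url='http://127.0.0.1:5000/v2.0')

    # Any network beyond the ones devstack creates by default is a leftover
    # from a test that failed to clean up after itself.
    for net in neutron.list_networks()['networks']:
        print('%s %s' % (net['id'], net['name']))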

Changed in neutron:
assignee: nobody → Salvatore Orlando (salvatore-orlando)
Revision history for this message
Salvatore Orlando (salvatore-orlando) wrote :

Setting this to High, as we're running at an average of 15 hits per day.

Changed in neutron:
importance: Undecided → High
Revision history for this message
Anita Kuno (anteaya) wrote :
Changed in neutron:
assignee: Salvatore Orlando (salvatore-orlando) → Maru Newby (maru)
Revision history for this message
Matt Riedemann (mriedem) wrote :

Are these jobs running with allow_tenant_isolation=false? I run Tempest with Open vSwitch and Neutron in-house, and most of the time when I've seen that error it's been because of tenant isolation.

Revision history for this message
Miguel Lavalle (minsel) wrote :

Marun,

Per our conversation earlier today, I did some digging into this bug. Here's a summary of what I found, so you can pick up from here tomorrow:

1) I worked on the execution whose console log is: http://logs.openstack.org/31/57331/2/check/check-tempest-devstack-vm-neutron/7dee1b8/console.html

2) You can access the whole set of log files at: http://logs.openstack.org/31/57331/2/check/check-tempest-devstack-vm-neutron/7dee1b8/

3) All the executions that I looked at exhibit the same pattern: tempest.scenario.test_network_basic_ops.TestNetworkBasicOps fails deleting a subnet because it still has active ports. After that, several test suites fail with "Multiple possible networks found, use a Network ID to be more specific". In the case of the execution I analyzed, the id of the subnet is cb9e145b-294c-44d8-a603-a653164c61f4, as can be seen below (from the console log):

======================================================================
2013-11-21 20:32:14.521 | FAIL: tearDownClass (tempest.scenario.test_network_basic_ops.TestNetworkBasicOps)
2013-11-21 20:32:14.521 | tearDownClass (tempest.scenario.test_network_basic_ops.TestNetworkBasicOps)
2013-11-21 20:32:14.522 | ----------------------------------------------------------------------
2013-11-21 20:32:14.522 | _StringException: Traceback (most recent call last):
2013-11-21 20:32:14.522 | File "tempest/scenario/manager.py", line 248, in tearDownClass
2013-11-21 20:32:14.522 | thing.delete()
2013-11-21 20:32:14.522 | File "tempest/api/network/common.py", line 73, in delete
2013-11-21 20:32:14.523 | self.client.delete_subnet(self.id)
2013-11-21 20:32:14.523 | File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 111, in with_params
2013-11-21 20:32:14.523 | ret = self.function(instance, *args, **kwargs)
2013-11-21 20:32:14.523 | File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 379, in delete_subnet
2013-11-21 20:32:14.523 | return self.delete(self.subnet_path % (subnet))
2013-11-21 20:32:14.523 | File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1232, in delete
2013-11-21 20:32:14.524 | headers=headers, params=params)
2013-11-21 20:32:14.524 | File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1221, in retry_request
2013-11-21 20:32:14.524 | headers=headers, params=params)
2013-11-21 20:32:14.524 | File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1164, in do_request
2013-11-21 20:32:14.524 | self._handle_fault_response(status_code, replybody)
2013-11-21 20:32:14.525 | File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 1134, in _handle_fault_response
2013-11-21 20:32:14.525 | exception_handler_v20(status_code, des_error_body)
2013-11-21 20:32:14.525 | File "/opt/stack/new/python-neutronclient/neutronclient/v2_0/client.py", line 96, in exception_handler_v20
2013-11-21 20:32:14.525 | message=msg)
2013-11-21 20:32:14.525 | NeutronClientException: 409-{u'NeutronError': {u'message': u'Unable to complete operation on subnet cb9e145b-294c-44d8-a603-a653164c61f4. One or m...


Revision history for this message
Miguel Lavalle (minsel) wrote :

marun,

There is a follow up question: why is test_network_basic_ops.TestNetworkBasicOps finding active ports in the subnet when it attempts to delete it? If it is deleting the subnet, it has obviously already deleted the router, so there shouldn't be any active ports. It seems to me we are facing a race condition. A few weeks ago, when I was testing the network isolation code, I found the same situation. So I added a clean-up ports method (https://gist.github.com/miguellavalle/7595562).

I think this race condition is the root cause of this bug, compounded by the disappearing tenant isolation described in my previous comment.

Cheers
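
For reference, a minimal sketch in the spirit of that gist (not the actual code behind the link): it deletes any leftover test-owned ports on a network before the subnet delete is attempted. The client calls are standard python-neutronclient v2.0 ones; the helper name is made up.

    def cleanup_ports(neutron, network_id):
        # Delete leftover ports on a network so its subnets can be removed.
        # Ports owned by Neutron services (router interfaces, DHCP ports,
        # gateways) are skipped because Neutron manages their lifecycle.
        for port in neutron.list_ports(network_id=network_id)['ports']:
            if port.get('device_owner', '').startswith('network:'):
                continue
            neutron.delete_port(port['id'])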

Revision history for this message
Miguel Lavalle (minsel) wrote :

Digging deeper in this bug:

1) I confirmed that tenant isolation is not enabled, as can be seen in the console log:

2013-11-21 20:10:55.923 | 2013-11-21 20:10:55 # Allows test cases to create/destroy tenants and users. This option
2013-11-21 20:10:55.929 | 2013-11-21 20:10:55 # enables isolated test cases and better parallel execution,
2013-11-21 20:10:55.944 | 2013-11-21 20:10:55 # but also requires that OpenStack Identity API admin credentials
2013-11-21 20:10:55.967 | 2013-11-21 20:10:55 # are known.
2013-11-21 20:10:55.974 | 2013-11-21 20:10:55 allow_tenant_isolation = False

2) The test uses the following piece of code to remove the router interface and then delete the subnet:

    def delete(self):
        for router_id in self._router_ids.copy():
            body = dict(subnet_id=self.id)
            self.client.remove_interface_router(router_id, body=body)
            self._router_ids.remove(router_id)
        self.client.delete_subnet(self.id)

3) In the q-svc log file the remove_interface_router operation seems to execute successfully in the following lines:

2013-11-21 20:28:07.908 5630 DEBUG routes.middleware [-] Matched PUT /routers/def040d0-31da-419a-96fa-12c851263710/remove_router_interface.json __call__ /usr/lib/python2.7/dist-packages/routes/middleware.py:100
2013-11-21 20:28:07.908 5630 DEBUG routes.middleware [-] Route path: '/routers/:(id)/remove_router_interface.:(format)', defaults: {'action': u'remove_router_interface', 'controller': <wsgify at 67971344 wrapping <function resource at 0x40d0488>>} __call__ /usr/lib/python2.7/dist-packages/routes/middleware.py:102
2013-11-21 20:28:07.908 5630 DEBUG routes.middleware [-] Match dict: {'action': u'remove_router_interface', 'controller': <wsgify at 67971344 wrapping <function resource at 0x40d0488>>, 'id': u'def040d0-31da-419a-96fa-12c851263710', 'format': u'json'} __call__ /usr/lib/python2.7/dist-packages/routes/middleware.py:103
2013-11-21 20:28:07.973 5630 DEBUG neutron.plugins.ml2.plugin [-] Deleting port ffd197fe-4267-446d-87a0-5374fe918e20 delete_port /opt/stack/new/neutron/neutron/plugins/ml2/plugin.py:630
2013-11-21 20:28:07.997 5630 DEBUG neutron.plugins.ml2.drivers.mech_agent [-] Unbinding port ffd197fe-4267-446d-87a0-5374fe918e20 on network 1997dcd4-63fe-4345-a9ae-d325877ddc65 unbind_port /opt/stack/new/neutron/neutron/plugins/ml2/drivers/mech_agent.py:90
2013-11-21 20:28:08.000 5630 DEBUG neutron.plugins.ml2.plugin [-] Calling base delete_port delete_port /opt/stack/new/neutron/neutron/plugins/ml2/plugin.py:647
2013-11-21 20:28:08.007 5630 DEBUG neutron.db.db_base_plugin_v2 [-] Delete allocated IP 10.100.0.1 (1997dcd4-63fe-4345-a9ae-d325877ddc65/cb9e145b-294c-44d8-a603-a653164c61f4) _delete_ip_allocation /opt/stack/new/neutron/neutron/db/db_base_plugin_v2.py:444
2013-11-21 20:28:08.009 5630 DEBUG neutron.db.db_base_plugin_v2 [-] 10.100.0.1 (cb9e145b-294c-44d8-a603-a653164c61f4) is not recycled _delete_port /opt/stack/new/neutron/neutron/db/db_base_plugin_v2.py:1449

4) A few lines below in the q-svc log file, the subnet delete operation can be seen failing:

2013-11-21 20:28:08.164 5630 DEBUG routes.middleware [-] Matched DELETE /subnets/cb9e145b-294c-44d8-a603-a653164c61f4.j...


tags: added: neutron-parallel
tags: removed: neutron-parallel
Revision history for this message
Miguel Lavalle (minsel) wrote :

Some more debugging log data from q-svc. These are the ports that the router had:

[
     {
          "status": "ACTIVE",
          "external_gateway_info": {
               "network_id": "470c942f-333f-4b3f-bdbe-07c4115e86aa",
               "enable_snat": true
          },
          "enable_snat": true,
          "name": "router-smoke--tempest-2074116590",
          "gw_port_id": "bb16dad6-efa2-437a-8964-75cb7e427d27",
          "admin_state_up": true,
          "tenant_id": "3599764d63444bfcab8a35613402cb49",
          "gw_port": {
               "status": "ACTIVE",
               "binding:host_id": "devstack-precise-check-rax-ord-714619.slave.openstack.org",
               "allowed_address_pairs": [],
               "extra_dhcp_opts": [],
               "device_owner": "network:router_gateway",
               "fixed_ips": [
                    {
                         "subnet_id": "04dc9fbb-4b2d-402d-b859-d39973b4f594",
                         "ip_address": "172.24.4.228"
                    }
               ],
               "id": "bb16dad6-efa2-437a-8964-75cb7e427d27",
               "security_groups": [],
               "device_id": "def040d0-31da-419a-96fa-12c851263710",
               "subnet": {
                    "cidr": "172.24.4.224/28",
                    "gateway_ip": "172.24.4.225",
                    "id": "04dc9fbb-4b2d-402d-b859-d39973b4f594"
               },
               "name": "",
               "admin_state_up": true,
               "network_id": "470c942f-333f-4b3f-bdbe-07c4115e86aa",
               "tenant_id": "",
               "binding:vif_type": "ovs",
               "binding:capabilities": {
                    "port_filter": true
               },
               "mac_address": "fa:16:3e:89:a1:9e"
          },
          "_interfaces": [
               {
                    "status": "ACTIVE",
                    "binding:host_id": "devstack-precise-check-rax-ord-714619.slave.openstack.org",
                    "allowed_address_pairs": [],
                    "extra_dhcp_opts": [],
                    "device_owner": "network:router_interface",
                    "fixed_ips": [
                         {
                              "subnet_id": "cb9e145b-294c-44d8-a603-a653164c61f4",
                              "ip_address": "10.100.0.1"
                         }
                    ],
                    "id": "ffd197fe-4267-446d-87a0-5374fe918e20",
                    "security_groups": [],
                    "device_id": "def040d0-31da-419a-96fa-12c851263710",
                    "subnet": {
                         "cidr": "10.100.0.0/24",
                         "gateway_ip": "10.100.0.1",
                         "id": "cb9e145b-294c-44d8-a603-a653164c61f4"
                    },
                    "name": "",
                    "admin_state_up": true,
                    "network_id": "1997dcd4-63fe-4345-a9ae-d325877ddc65",
                    "tenant_id": "3599764d63444bfcab8a35613402cb49",
                    "binding:vif_type": "ovs",
                    "binding:capabilities": {
                         "port_filter": true
                    }, ...


Revision history for this message
Jerry Zhao (zhaoxinyu) wrote :

It happens in every single run in my CI environment, unfortunately.

Revision history for this message
Anita Kuno (anteaya) wrote :

mlavalle feels this bug is related to https://bugs.launchpad.net/tempest/+bug/1259282 which has a patch up for review: https://review.openstack.org/#/c/61234/

Miguel Lavalle (minsel)
Changed in neutron:
assignee: Maru Newby (maru) → Miguel Lavalle (minsel)
Revision history for this message
Miguel Lavalle (minsel) wrote :

Since fix https://review.openstack.org/#/c/61234/ merged, this bug hasn't shown up in the last 5 days. However, it is likely to reappear from time to time, whenever a Tempest test in the gate fails to delete the networks it created. This happens because:

1) As of this writing (12/16/2013), Neutron Tempest tests in the gate run without tenant and network isolation. In other words, all the tests execute under the demo tenant created by DevStack.

2) DevStack creates a default network (named 'private') for the demo tenant.

3) If a test creates other networks and fails to execute its tearDownClass method, the newly created networks are left behind in the demo tenant.

4) As a consequence of the previous point, any other test that subsequently creates a server without explicitly specifying which networks it should be connected to gets the message: "multiple possible networks found, use a Network ID to be more specific. (HTTP 400)"

This bug will only completely go away when we enable Tempest tenant and network isolation for Neutron in the gate. In the meantime, my proposal is to lower the importance of this bug, and I commit to keeping an eye on it until network isolation is enabled.
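
To illustrate point 4, the ambiguity only arises when the server create request leaves the network choice to Nova; explicitly passing a network ID sidesteps it. A minimal python-novaclient sketch, where 'nova' is assumed to be an authenticated client and the IDs are placeholders:

    # 'nova' is an authenticated novaclient Client; the IDs are placeholders.
    server = nova.servers.create(
        name='smoke-server',
        image='<image-id>',
        flavor='<flavor-id>',
        # Pinning the server to a specific network means Nova never has to
        # guess among the tenant's networks, so the "Multiple possible
        # networks found" 400 cannot occur.
        nics=[{'net-id': '<network-id>'}],
    )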

Revision history for this message
Anita Kuno (anteaya) wrote :

I asked Sean Dague about lowering the importance and he said he would like to keep it at critical until the bug is at zero.

So do we have a blueprint or bug report for Tempest tenant and network isolation so that we can link them to this bug report?

Revision history for this message
Miguel Lavalle (minsel) wrote :

I developed and merged the code for network isolation back in September: https://review.openstack.org/#/c/45578/

This code is not executing in the trunk gate jobs because of the many Neutron bugs we've been wrestling into submission over the past few months. As soon as parallel testing is enabled, network isolation will also be enabled.

Revision history for this message
Mark McClain (markmcclain) wrote :
Changed in neutron:
status: New → Triaged
milestone: none → icehouse-rc1
tags: added: havana-backport-potential tempest
Changed in neutron:
milestone: icehouse-rc1 → none
no longer affects: neutron
Revision history for this message
Matthew Treinish (treinish) wrote :

So the solution to this is to enable tenant isolation on the Havana jobs, which isn't really an option: one of the big pushes this whole cycle has been to enable tenant isolation with Neutron stably. Those improvements have also fixed the unisolated case, which really isn't back-portable. So we're just going to have to live with the occasional failure on the stable Havana jobs.

Changed in tempest:
milestone: none → havana-stable
status: New → Won't Fix
Revision history for this message
Sergey Levchenko (slevchenko) wrote :

This error is relevant for any Tempest test executed in environments where shared networks are in use, not just for Neutron-related tests. As a workaround for this issue we have created patches for the JSON and XML servers_client.py, located in "./tempest/tempest/services/compute/xml" and "./tempest/tempest/services/compute/json" respectively.

These patches change the servers_client.py create_server() logic to select the first available network (if no network is specified in the function's arguments) instead of assuming that only one network is available to the tenant.
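
A rough sketch of that approach (not the actual patch; the networks_client and its list_networks() call stand in for whatever client the patched code uses):

    def pick_network_id(networks_client):
        # Return the first network visible to the tenant, or None.
        nets = networks_client.list_networks().get('networks', [])
        return nets[0]['id'] if nets else None

    def build_server_body(name, image_ref, flavor_ref, networks_client,
                          networks=None):
        # Build a Nova "create server" request body, filling in a network
        # when the caller did not specify one, so Nova never has to choose.
        if not networks:
            net_id = pick_network_id(networks_client)
            if net_id:
                networks = [{'uuid': net_id}]  # Nova API network format
        body = {'name': name, 'imageRef': image_ref, 'flavorRef': flavor_ref}
        if networks:
            body['networks'] = networks
        return {'server': body}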

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

I don't see this as a heat issue at all. Heat is deleting things and accepting Neutron's answer that they are in fact deleted. Reassigning to Neutron.

affects: heat → neutron
no longer affects: neutron
Revision history for this message
György Szombathelyi (gyurco) wrote :

What about skipping shared networks in tempest, when no fixed network is specified?
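
Along the lines of that suggestion, a small filter could prefer tenant-owned networks and fall back to shared ones only when nothing else exists (field names follow the Neutron API; the helper itself is hypothetical):

    def candidate_networks(networks, tenant_id):
        # Prefer networks owned by the tenant; treat shared networks only
        # as a fallback, so a cloud-wide shared network does not by itself
        # make the choice ambiguous.
        own = [n for n in networks
               if n.get('tenant_id') == tenant_id and not n.get('shared')]
        shared = [n for n in networks if n.get('shared')]
        return own or shared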
