test_verify_duplicate_network_nics[id-1678d144-ed74-43f8-8e57-ab10dbf9b3c2] race fails during port deletion on postgresql job

Bug #1691176 reported by Matt Riedemann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Matt Riedemann

Bug Description

Seen here:

http://logs.openstack.org/59/383859/22/experimental/gate-tempest-dsvm-neutron-pg-full-ubuntu-xenial/b2f29e9/console.html.gz#_2017-04-21_14_20_13_438646

NOTE: THIS DOES NOT APPEAR TO BE CAUSED BY POSTGRESQL AT ALL, IT'S A TIMING ISSUE

This test is creating two networks and a subnet on each network, and then booting a server with those two networks.

The test fails on teardown of the network because the dhcp port is already gone when the network delete request is being processed, which is racing with the asynchronous subnet delete on the same network (tempest cleans up in a LIFO order, so the subnet is created after the network and deleted before the network).

Notes from debugging the logs:

aee62d13-f5f2-4aec-8fec-abd56c99eda1 - tempest-ServersTestJSON-927337740 - 19.80.0.0/24

a144b4fb-5a53-4f52-baad-07811e528c81 - tempest-ServersTestJSON-1149271515 - 19.86.0.0/24

create server with these requested networks:

"networks": [{"uuid": "aee62d13-f5f2-4aec-8fec-abd56c99eda1"},
             {"uuid": "a144b4fb-5a53-4f52-baad-07811e528c81"},
             {"uuid": "aee62d13-f5f2-4aec-8fec-abd56c99eda1"}]

server IPs:

{"addresses": {"tempest-ServersTestJSON-927337740": [
   {"version": 4, "addr": "19.80.0.6"},
   {"version": 4, "addr": "19.80.0.3"}],
        "tempest-ServersTestJSON-1149271515": [
   {"version": 4, "addr": "19.86.0.6"}]}}

delete failure on the first network:

req: 2017-04-21 13:53:59,064 28262 404 DELETE https://15.184.65.212:9696/v2.0/networks/aee62d13-f5f2-4aec-8fec-abd56c99eda1

res:

{
   'status':'404',
   u'content-length':'132',
   u'server':'Apache/2.4.18 (Ubuntu)',
   'content-location':'https://15.184.65.212:9696/v2.0/networks/aee62d13-f5f2-4aec-8fec-abd56c99eda1',
   u'date':'Fri, 21 Apr 2017 13:53:59 GMT',
   u'content-type':'application/json',
   u'x-openstack-request-id':'req-8f94b291-ced4-4c88-9783-ceb4b04eb320',
   u'connection':'close'
}

{"NeutronError": {"message": "Port b48cc7a3-d81f-49f4-b87c-97958192897c could not be found.",
                  "type": "PortNotFound", "detail": ""}}

The problem port:

b48cc7a3-d81f-49f4-b87c-97958192897c

The port is created here:

2017-04-21 13:53:14.088 17266 Allocated IP 19.80.0.2 (aee62d13-f5f2-4aec-8fec-abd56c99eda1/2b0869e7-f15b-4030-a745-6a1782b57878/b48cc7a3-d81f-49f4-b87c-97958192897c)

[Port(admin_state_up=True,allowed_address_pairs=[],binding=PortBinding,binding_levels=[],
created_at=2017-04-21T13:53:13Z,data_plane_status=<?>,description='',
device_id='dhcpef44eb91-ca69-52e5-a9b5-bb0cff2edb74-aee62d13-f5f2-4aec-8fec-abd56c99eda1',
device_owner='network:dhcp',dhcp_options=[],distributed_binding=None,dns=None,fixed_ips=[IPAllocation],
id=b48cc7a3-d81f-49f4-b87c-97958192897c,mac_address=fa:16:3e:0d:e9:1d,name='',
network_id=aee62d13-f5f2-4aec-8fec-abd56c99eda1,project_id='0ca1e45b37a042b9ba3b88e442526972',
qos_policy_id=None,revision_number=6,security=PortSecurity(b48cc7a3-d81f-49f4-b87c-97958192897c),
security_group_ids=set([]),status='DOWN',updated_at=2017-04-21T13:53:14Z)]

Subnet is being deleted here:

2017-04-21 13:53:57,680 28262 INFO [tempest.lib.common.rest_client] Request (ServersTestJSON:_run_cleanups): 204 DELETE https://15.184.65.212:9696/v2.0/subnets/2b0869e7-f15b-4030-a745-6a1782b57878

Port is being deleted here:

http://logs.openstack.org/59/383859/22/experimental/gate-tempest-dsvm-neutron-pg-full-ubuntu-xenial/b2f29e9/logs/screen-q-svc.txt.gz#_2017-04-21_13_53_57_986

2017-04-21 13:53:57.986 17266 DEBUG neutron.plugins.ml2.plugin [req-893ff18c-5732-42e6-b7ef-bddf9c42d9e7 - -] Deleting port b48cc7a3-d81f-49f4-b87c-97958192897c _pre_delete_port /opt/stack/new/neutron/neutron/plugins/ml2/plugin.py:1410

network delete is trying to delete the port here:

http://logs.openstack.org/59/383859/22/experimental/gate-tempest-dsvm-neutron-pg-full-ubuntu-xenial/b2f29e9/logs/screen-q-svc.txt.gz#_2017-04-21_13_53_58_412

2017-04-21 13:53:58.412 17264 DEBUG neutron.plugins.ml2.plugin [req-8f94b291-ced4-4c88-9783-ceb4b04eb320 tempest-ServersTestJSON-1475565165 tempest-ServersTestJSON-1475565165] Deleting port b48cc7a3-d81f-49f4-b87c-97958192897c _pre_delete_port /opt/stack/new/neutron/neutron/plugins/ml2/plugin.py:1410

2017-04-21 13:53:58.988 17264 DEBUG neutron.plugins.ml2.plugin [req-8f94b291-ced4-4c88-9783-ceb4b04eb320 tempest-ServersTestJSON-1475565165 tempest-ServersTestJSON-1475565165] Calling delete_port for b48cc7a3-d81f-49f4-b87c-97958192897c owned by network:dhcp delete_port /opt/stack/new/neutron/neutron/plugins/ml2/plugin.py:1469

2017-04-21 13:53:59.061 17264 INFO neutron.pecan_wsgi.hooks.translation [req-8f94b291-ced4-4c88-9783-ceb4b04eb320 tempest-ServersTestJSON-1475565165 tempest-ServersTestJSON-1475565165] DELETE failed (client error): The resource could not be found.

Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

There are a number of failures related to this. It looks like running on Postgres introduces different timing dynamics and that leads to exposure of underlying races when cleaning up networks. For instance, in this run, I see the following sequence of events:

http://logs.openstack.org/periodic/periodic-tempest-dsvm-neutron-pg-full-ubuntu-xenial/bf54195/logs/screen-q-svc.txt.gz#_May_07_06_43_15_123097 - subnet deleted

http://logs.openstack.org/periodic/periodic-tempest-dsvm-neutron-pg-full-ubuntu-
xenial/bf54195/logs/screen-q-svc.txt.gz#_May_07_06_43_15_578966 - port deleted

http://logs.openstack.org/periodic/periodic-tempest-dsvm-neutron-pg-full-ubuntu-xenial/bf54195/logs/screen-q-svc.txt.gz#_May_07_06_43_15_907385 - network deleted

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/465125

Changed in neutron:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :
Changed in neutron:
assignee: Matt Riedemann (mriedem) → Kevin Benton (kevinbenton)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/465125
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=37b84c4c7375302e331fd4d779a732fa18cb09c8
Submitter: Jenkins
Branch: master

commit 37b84c4c7375302e331fd4d779a732fa18cb09c8
Author: Matt Riedemann <email address hidden>
Date: Tue May 16 13:35:52 2017 -0400

    Handle PortNotFound when deleting a network

    Tempest has some tests which fail during network cleanup
    because they are racing with the subnet delete which cleans
    up the neutron:dhcp owned port, which the network delete code
    also tries to delete.

    This change handles the PortNotFound for these types of auto-deleted
    ports so we don't fail to delete the network when the ports are
    already gone.

    Change-Id: I54ebf2be14e82f288a9bf177967f1136b72b289e
    Closes-Bug: #1691176

Changed in neutron:
status: In Progress → Fix Released
Changed in neutron:
assignee: Kevin Benton (kevinbenton) → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.0.0b2

This issue was fixed in the openstack/neutron 11.0.0.0b2 development milestone.

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.