test_verify_duplicate_network_nics[id-1678d144-ed74-43f8-8e57-ab10dbf9b3c2] race fails during port deletion on postgresql job
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron |
Fix Released
|
High
|
Matt Riedemann |
Bug Description
Seen here:
NOTE: THIS DOES NOT APPEAR TO BE CAUSED BY POSTGRESQL AT ALL, IT'S A TIMING ISSUE
This test is creating two networks and a subnet on each network, and then booting a server with those two networks.
The test fails on teardown of the network because the dhcp port is already gone when the network delete request is being processed, which is racing with the asynchronous subnet delete on the same network (tempest cleans up in a LIFO order, so the subnet is created after the network and deleted before the network).
Notes from debugging the logs:
aee62d13-
a144b4fb-
create server with these requested networks:
"networks": [{"uuid": "aee62d13-
server IPs:
{"addresses": {"tempest-
{"version": 4, "addr": "19.80.0.6"},
{"version": 4, "addr": "19.80.0.3"}],
{"version": 4, "addr": "19.86.0.6"}]}}
delete failure on the first network:
req: 2017-04-21 13:53:59,064 28262 404 DELETE https:/
res:
{
'status':'404',
u'content-
u'server'
'content-
u'date':'Fri, 21 Apr 2017 13:53:59 GMT',
u'content-
u'x-
u'connection
}
{"NeutronError": {"message": "Port b48cc7a3-
The problem port:
b48cc7a3-
The port is created here:
2017-04-21 13:53:14.088 17266 Allocated IP 19.80.0.2 (aee62d13-
[Port(admin_
created_
device_
device_
id=b48cc7a3-
network_
qos_policy_
security_
Subnet is being deleted here:
2017-04-21 13:53:57,680 28262 INFO [tempest.
Port is being deleted here:
2017-04-21 13:53:57.986 17266 DEBUG neutron.
network delete is trying to delete the port here:
2017-04-21 13:53:58.412 17264 DEBUG neutron.
2017-04-21 13:53:58.988 17264 DEBUG neutron.
2017-04-21 13:53:59.061 17264 INFO neutron.
Changed in neutron: | |
status: | New → Confirmed |
importance: | Undecided → High |
Changed in neutron: | |
assignee: | Matt Riedemann (mriedem) → Kevin Benton (kevinbenton) |
Changed in neutron: | |
assignee: | Kevin Benton (kevinbenton) → Matt Riedemann (mriedem) |
tags: | added: neutron-proactive-backport-potential |
tags: | removed: neutron-proactive-backport-potential |
There are a number of failures related to this. It looks like running on Postgres introduces different timing dynamics and that leads to exposure of underlying races when cleaning up networks. For instance, in this run, I see the following sequence of events:
http:// logs.openstack. org/periodic/ periodic- tempest- dsvm-neutron- pg-full- ubuntu- xenial/ bf54195/ logs/screen- q-svc.txt. gz#_May_ 07_06_43_ 15_123097 - subnet deleted
http:// logs.openstack. org/periodic/ periodic- tempest- dsvm-neutron- pg-full- ubuntu- bf54195/ logs/screen- q-svc.txt. gz#_May_ 07_06_43_ 15_578966 - port deleted
xenial/
http:// logs.openstack. org/periodic/ periodic- tempest- dsvm-neutron- pg-full- ubuntu- xenial/ bf54195/ logs/screen- q-svc.txt. gz#_May_ 07_06_43_ 15_907385 - network deleted