Race condition in subnet and segment delete: The segment is still bound with port(s)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Medium | Bence Romsics |
Bug Description
The HOT template below can expose a race condition that makes stack deletion fail. On the Neutron API this shows up as a segment delete failing with "The segment is still bound with port(s)". The reproduction uses a HOT template, but I don't think the problem is Heat-specific. I think it depends on API calls arriving in quick succession, which Heat does rather well.
Configuration:
ml2_conf.ini
[ml2]
mechanism_drivers = openvswitch,
tenant_
[ml2_type_vlan]
network_vlan_ranges = physnet0:
sriov_agent.ini
[sriov_nic]
physical_
ovs_agent.ini
[ovs]
bridge_mappings = physnet0:
dhcp_agent.ini
[DEFAULT]
interface_driver = openvswitch
Reproduction:
# make this config live
$ sudo systemctl restart devstack@neutron-*
# create stack
$ openstack stack create -t ~/bug-segment-
2020-05-14 14:23:23Z [s0]: CREATE_IN_PROGRESS Stack CREATE started
2020-05-14 14:23:24Z [s0.net0]: CREATE_IN_PROGRESS state changed
2020-05-14 14:23:24Z [s0.net0]: CREATE_COMPLETE state changed
2020-05-14 14:23:24Z [s0.segment0]: CREATE_IN_PROGRESS state changed
2020-05-14 14:23:25Z [s0.segment0]: CREATE_COMPLETE state changed
2020-05-14 14:23:25Z [s0.subnet0]: CREATE_IN_PROGRESS state changed
2020-05-14 14:23:26Z [s0.subnet0]: CREATE_COMPLETE state changed
2020-05-14 14:23:26Z [s0]: CREATE_COMPLETE Stack CREATE completed successfully
CREATE_COMPLETE
# wait until the dhcp port is created and it becomes ACTIVE
$ openstack stack resource show s0 net0 -f value -c physical_
+------
| ID | Name | MAC Address | Fixed IP Addresses | Status |
+------
| 8cf8f188-
+------
# the dhcp port is not created by heat of course
$ openstack stack resource list s0
+------
| resource_name | physical_
+------
| net0 | 80ad5941-
| subnet0 | 376d2581-
| segment0 | 641c8c60-
+------
# stack delete fails
$ openstack stack delete s0 --yes --wait
2020-05-14 14:37:10Z [s0]: DELETE_IN_PROGRESS Stack DELETE started
2020-05-14 14:37:10Z [s0.subnet0]: DELETE_IN_PROGRESS state changed
2020-05-14 14:37:11Z [s0.subnet0]: DELETE_COMPLETE state changed
2020-05-14 14:37:11Z [s0.segment0]: DELETE_IN_PROGRESS state changed
2020-05-14 14:37:11Z [s0.segment0]: DELETE_FAILED ConflictException: resources.segment0: ConflictException: 409: Client Error for url: http://
2020-05-14 14:37:11Z [s0]: DELETE_FAILED Resource DELETE failed: ConflictException: resources.segment0: ConflictException: 409: Client Error for url: http://
Stack s0 DELETE_FAILED
Unable to delete 1 of the 1 stacks.
# during that heat-engine logged this
$ sudo journalctl -u devstack@h-eng -f | egrep -w ERROR
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
máj 14 14:37:11 devstack1 heat-engine[12508]: ERROR heat.engine.
# a few seconds later a second delete succeeds
$ openstack stack delete s0 --yes --wait
2020-05-14 14:24:26Z [s0]: DELETE_IN_PROGRESS Stack DELETE started
I have an idea what the root cause is. I'll describe that in a comment.
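Since the reproduction shows that a second delete succeeds a few seconds after the first one fails, a caller can work around the symptom by retrying on the conflict. The following is only a minimal client-side sketch under stated assumptions, not the fix that was released in neutron: `ConflictError`, `delete_with_retry` and `fake_segment_delete` are hypothetical names made up for this illustration, and no real client library is called.

```python
import time


class ConflictError(Exception):
    """Hypothetical stand-in for an HTTP 409 raised by an API client."""


def delete_with_retry(delete_fn, attempts=5, delay=1.0, sleep=time.sleep):
    """Call delete_fn until it stops raising ConflictError or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return delete_fn()
        except ConflictError:
            if attempt == attempts:
                # Out of attempts: let the caller see the conflict.
                raise
            sleep(delay)


# Fake delete call, for illustration only: it fails twice, as if the DHCP
# port were still being removed in the background, and then succeeds.
calls = {"n": 0}


def fake_segment_delete():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConflictError("The segment is still bound with port(s)")
    return "deleted"


result = delete_with_retry(fake_segment_delete, delay=0.0)
print(result, "after", calls["n"], "attempts")
```

A retry only hides the race from the caller; it does not remove the window in which the segment delete can see a port that is about to go away.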
Changed in neutron:
assignee: Bence Romsics (bence-romsics) → Lajos Katona (lajos-katona)
Changed in neutron:
assignee: Lajos Katona (lajos-katona) → Bence Romsics (bence-romsics)
The HOT template:
heat_template_version: newton

description: >
  Reproduce a race condition bug: When deleting this template segment
  delete may fail because subnet delete triggers async deletion of
  dhcp ports, but heat may try to delete the segment earlier while
  it still has dhcp ports on it and that will fail.

parameters:
  physnet_sriov:
    type: string
    default: physnet1
  physnet_ovs:
    type: string
    default: physnet0
  vlan:
    type: number
    default: 200

resources:
  net0:
    type: OS::Neutron::Net
    properties:
      value_specs:
        provider:network_type: vlan
        provider:segmentation_id:
          get_param: vlan
        provider:physical_network:
          get_param: physnet_sriov

  segment0:
    type: OS::Neutron::Segment
    properties:
      network:
        get_resource: net0
      network_type: vlan
      physical_network:
        get_param: physnet_ovs
      segmentation_id:
        get_param: vlan

  subnet0:
    type: OS::Neutron::Subnet
    properties:
      enable_dhcp: true
      network_id:
        get_resource: net0
      cidr: 10.0.4.0/24
    depends_on:
      - segment0
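The mechanism named in the template description (subnet delete returns while the DHCP port is still being removed asynchronously, and the segment delete arrives before that cleanup finishes) can be illustrated with a toy model. This is a sketch only and not Neutron code: `ToyNetwork`, `SegmentInUse` and the 0.2 second delay are assumptions chosen for the illustration.

```python
import threading


class SegmentInUse(Exception):
    """Stand-in for the 409 'still bound with port(s)' error."""


class ToyNetwork:
    """Toy model of the race; it does not reflect Neutron internals."""

    def __init__(self, dhcp_cleanup_delay):
        self.dhcp_cleanup_delay = dhcp_cleanup_delay
        self.ports_on_segment = {"dhcp-port"}
        self._lock = threading.Lock()
        self._cleanup = None

    def delete_subnet(self):
        # Returns immediately; the DHCP port is removed later in the background.
        self._cleanup = threading.Timer(
            self.dhcp_cleanup_delay, self._remove_dhcp_port
        )
        self._cleanup.start()

    def _remove_dhcp_port(self):
        with self._lock:
            self.ports_on_segment.discard("dhcp-port")

    def delete_segment(self):
        # Refuses to delete while any port is still on the segment.
        with self._lock:
            if self.ports_on_segment:
                raise SegmentInUse("The segment is still bound with port(s)")
        return "deleted"

    def wait_for_cleanup(self):
        # Block until the background DHCP port removal has finished.
        self._cleanup.join()


net = ToyNetwork(dhcp_cleanup_delay=0.2)
net.delete_subnet()

# The segment delete follows the subnet delete without any pause.
try:
    first_attempt = net.delete_segment()
except SegmentInUse as exc:
    first_attempt = f"failed: {exc}"

# Once the background cleanup is done, the same call goes through.
net.wait_for_cleanup()
second_attempt = net.delete_segment()

print(first_attempt)
print(second_attempt)
```

In the toy model the outcome of the first attempt depends only on whether the caller is faster than the background cleanup, which matches the observation that the failure needs API calls in quick succession.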