Comment 3 for bug 1878632

Bence Romsics (bence-romsics) wrote:

Here is what I think the root cause is:

During stack deletion, neutron may delete dhcp ports in two ways:

* triggered by subnet deletion
* triggered by network deletion

AFAICT subnet deletion first leads to an rpc message to the dhcp agent, which deconfigures the deleted subnet and sends another rpc message back to the server, which finally deletes the dhcp port. Please see the message back to neutron-server here:

https://opendev.org/openstack/neutron/src/commit/cb55643a0695ebc5b41f50f6edb1546bcc676b71/neutron/api/rpc/handlers/dhcp_rpc.py#L239-L249
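
To make the ordering concrete, here is a minimal, deterministic toy model of that round trip. Everything in it (the `ToyServer` class, the message names, one dhcp port per subnet, and an explicit queue standing in for the message bus) is a hypothetical simplification that only mimics the sequence described above; it is not neutron's code.

```python
from collections import deque


class ToyServer:
    """Hypothetical in-memory stand-in for neutron-server plus one dhcp agent."""

    def __init__(self):
        self.subnets = set()
        self.ports = {}            # port_id -> subnet_id (simplified)
        self.rpc_queue = deque()   # stands in for the async message bus

    def delete_subnet(self, subnet_id):
        # The API call only removes the subnet and notifies the agent;
        # it does not wait for the dhcp port to be released.
        self.subnets.discard(subnet_id)
        self.rpc_queue.append(("subnet_deleted", subnet_id))

    def process_rpc(self):
        # Deliver queued messages in order, as the bus eventually would.
        while self.rpc_queue:
            msg, subnet_id = self.rpc_queue.popleft()
            if msg == "subnet_deleted":
                # dhcp agent: deconfigure the subnet, then call back.
                self.rpc_queue.append(("release_dhcp_port", subnet_id))
            elif msg == "release_dhcp_port":
                # server: only now is the dhcp port deleted.
                for port_id, owner in list(self.ports.items()):
                    if owner == subnet_id:
                        del self.ports[port_id]


server = ToyServer()
server.subnets.add("subnet-1")
server.ports["dhcp-port-1"] = "subnet-1"

server.delete_subnet("subnet-1")
print("subnet-1" in server.subnets, "dhcp-port-1" in server.ports)  # False True
server.process_rpc()
print("dhcp-port-1" in server.ports)  # False
```

The window between the two `print` calls is exactly where a client polling only for the subnet would wrongly conclude that cleanup is finished.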

This round trip is usually quite quick, but it does not, and cannot, happen synchronously within the subnet delete API transaction. Simply polling until the subnet is gone is therefore not enough. Moreover, unless a client makes assumptions about which ports neutron is going to auto-delete, it has no way to tell when these ports are gone.
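
For illustration, a client willing to make exactly that assumption (here: the auto-deleted ports are those whose `device_owner` is `network:dhcp`) could poll for them. A minimal sketch: `list_ports` is a hypothetical callable standing in for whatever API client is used (and is expected to already be scoped to the relevant network), and the timeout values are arbitrary.

```python
import time
from typing import Callable, Dict, List


def wait_for_dhcp_ports_gone(
    list_ports: Callable[[], List[Dict[str, str]]],
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll until no port with device_owner 'network:dhcp' is listed.

    Returns True once they are gone, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while True:
        ports = list_ports()
        if not any(p.get("device_owner") == "network:dhcp" for p in ports):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Fake client: the dhcp port is still listed on the first two polls.
responses = iter([
    [{"id": "p1", "device_owner": "network:dhcp"}],
    [{"id": "p1", "device_owner": "network:dhcp"}],
    [],
])
print(wait_for_dhcp_ports_gone(lambda: next(responses), interval=0.01))  # True
```

This only works because the client hard-codes knowledge of neutron's internal cleanup, which is the fragility described above.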

Another place where we auto-delete dhcp ports is in network delete:

https://opendev.org/openstack/neutron/src/commit/cb55643a0695ebc5b41f50f6edb1546bcc676b71/neutron/db/db_base_plugin_v2.py#L485-L495

This of course has to be synchronous, otherwise we'd be breaking the consistency of our object model (we can't have ports referring to a network that no longer exists).
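
A toy sketch of what "synchronous" means here: the dhcp ports and the network are removed in one step, after a check that no other port blocks the deletion. The data layout and the `NetworkInUse` exception are hypothetical, and treating only `network:dhcp` ports as auto-deletable is a simplification; see the linked code for the real logic.

```python
class NetworkInUse(Exception):
    """Hypothetical stand-in for the API's 'network in use' error."""


def delete_network(network_id, networks, ports):
    """Toy sketch: delete a network and its dhcp ports as one unit.

    `networks` is a set of network ids and `ports` maps
    port_id -> {"network_id": ..., "device_owner": ...}. Both are mutated
    in place, standing in for rows changed inside a single DB transaction.
    """
    on_net = [pid for pid, p in ports.items() if p["network_id"] == network_id]
    # Simplification: only dhcp ports are treated as auto-deletable.
    blocking = [pid for pid in on_net if ports[pid]["device_owner"] != "network:dhcp"]
    if blocking:
        # Check before mutating anything, so a failure leaves no partial delete.
        raise NetworkInUse(f"network {network_id} still has ports: {blocking}")
    for pid in on_net:
        del ports[pid]  # no port is left pointing at the deleted network
    networks.discard(network_id)


networks = {"net-1"}
ports = {"dhcp-1": {"network_id": "net-1", "device_owner": "network:dhcp"}}
delete_network("net-1", networks, ports)
print(networks, ports)  # set() {}
```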

But we don't do the same in segment delete:

https://opendev.org/openstack/neutron/src/commit/cb55643a0695ebc5b41f50f6edb1546bcc676b71/neutron/plugins/ml2/db.py#L327-L328

Please also note the 'depends_on' in the HOT template (i.e. the subnet depends_on the segment). Some may argue that the need for this is another bug, but I'm not addressing that here.

With this depends_on, heat deletes the subnet first and the segment second. But since the dhcp ports are usually deleted slightly later, segment deletion fails with "The segment is still bound with port(s)". A second try a few seconds later usually succeeds.
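
Until the server side changes, a caller can work around the race the same way the manual second try does: retry the segment delete after a short delay. A minimal sketch, assuming a hypothetical `delete_segment` callable that raises a hypothetical `SegmentInUse` exception while ports are still bound; the attempt count and delay are arbitrary.

```python
import time
from typing import Callable


class SegmentInUse(Exception):
    """Hypothetical stand-in for 'The segment is still bound with port(s)'."""


def delete_segment_with_retry(
    delete_segment: Callable[[], None],
    attempts: int = 5,
    delay: float = 2.0,
) -> int:
    """Call delete_segment until it succeeds; return the successful attempt number.

    Re-raises the last SegmentInUse if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            delete_segment()
            return attempt
        except SegmentInUse:
            if attempt == attempts:
                raise
            time.sleep(delay)  # give the async dhcp port release time to land


remaining_dhcp_ports = ["dhcp-1"]  # not yet released when heat first tries


def fake_delete():
    if remaining_dhcp_ports:
        remaining_dhcp_ports.pop()  # simulate the release landing meanwhile
        raise SegmentInUse("The segment is still bound with port(s)")


print(delete_segment_with_retry(fake_delete, delay=0.01))  # 2
```

Retrying only papers over the timing window; it does not remove it.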

Since the cause is timing dependent, the reproduction is not guaranteed to work every time, but in practice it does for me.

For a fix I think we should auto-delete dhcp ports in segment delete. I'll work on a patch.
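
A toy sketch of the idea (not the actual patch): segment delete treats dhcp ports bound to the segment the way network delete does, deleting them in the same step instead of refusing. The data layout, the `SegmentInUse` exception and the `auto_delete_dhcp` switch are hypothetical; `auto_delete_dhcp=False` mimics the current behaviour, where a dhcp port that has not yet been released blocks the delete.

```python
class SegmentInUse(Exception):
    """Hypothetical stand-in for 'The segment is still bound with port(s)'."""


def delete_segment(segment_id, segments, ports, auto_delete_dhcp=True):
    """Toy sketch of segment deletion with optional dhcp port auto-delete.

    `segments` is a set of segment ids and `ports` maps
    port_id -> {"segment_id": ..., "device_owner": ...} (simplified binding).
    """
    bound = [pid for pid, p in ports.items() if p["segment_id"] == segment_id]
    if auto_delete_dhcp:
        blocking = [pid for pid in bound if ports[pid]["device_owner"] != "network:dhcp"]
    else:
        blocking = bound  # current behaviour: any bound port blocks
    if blocking:
        raise SegmentInUse(f"segment {segment_id} is still bound with port(s): {blocking}")
    for pid in bound:
        del ports[pid]  # only dhcp ports can reach this point
    segments.discard(segment_id)


segments = {"seg-1"}
ports = {"dhcp-1": {"segment_id": "seg-1", "device_owner": "network:dhcp"}}
try:
    delete_segment("seg-1", segments, ports, auto_delete_dhcp=False)
except SegmentInUse as exc:
    print(exc)  # segment seg-1 is still bound with port(s): ['dhcp-1']
delete_segment("seg-1", segments, ports)
print(segments, ports)  # set() {}
```

With the auto-delete in place, the outcome no longer depends on whether the agent's release_dhcp_port round trip has finished.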