[OVN Octavia Provider] Unable to delete Load Balancer with PENDING_DELETE

Bug #1936959 reported by Brian Haley
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

While attempting to delete a Load Balancer, its provisioning status moves to PENDING_DELETE and stays there, preventing the deletion from completing.

The following traceback was found in the logs for that load balancer:

2021-07-17 13:49:26.131 19 INFO octavia.api.v2.controllers.load_balancer [req-b8b3cbd8-3014-4c45-9680-d4c67346ed1c - 1e38d4dfbfb7427787725df69fabc22b - default default] Sending delete Load Balancer 19d8e465-c704-40a9-b1fd-5b0824408e5d to provider ovn
2021-07-17 13:49:26.139 19 DEBUG ovn_octavia_provider.helper [-] Handling request lb_delete with info {'id': '19d8e465-c704-40a9-b1fd-5b0824408e5d', 'cascade': True} request_handler /usr/lib/python3.6/site-packages/ovn_octavia_provider/helper.py:303
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper [-] Exception occurred during deletion of loadbalancer: RuntimeError: dictionary changed size during iteration
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper Traceback (most recent call last):
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper File "/usr/lib/python3.6/site-packages/ovn_octavia_provider/helper.py", line 907, in lb_delete
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper status = self._lb_delete(loadbalancer, ovn_lb, status)
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper File "/usr/lib/python3.6/site-packages/ovn_octavia_provider/helper.py", line 960, in _lb_delete
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper for ls in self._find_lb_in_table(ovn_lb, 'Logical_Switch'):
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper File "/usr/lib/python3.6/site-packages/ovn_octavia_provider/helper.py", line 289, in _find_lb_in_table
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper return [item for item in self.ovn_nbdb_api.tables[table].rows.values()
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper File "/usr/lib/python3.6/site-packages/ovn_octavia_provider/helper.py", line 289, in <listcomp>
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper return [item for item in self.ovn_nbdb_api.tables[table].rows.values()
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper File "/usr/lib64/python3.6/_collections_abc.py", line 761, in __iter__
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper for key in self._mapping:
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper RuntimeError: dictionary changed size during iteration
2021-07-17 13:49:26.196 19 ERROR ovn_octavia_provider.helper
2021-07-17 13:49:26.446 13 DEBUG octavia.common.keystone [req-267feb7e-2235-43d9-bec8-88ff532b9019 - 1e38d4dfbfb7427787725df69fabc22b - default default] Request path is / and it does not require keystone authentication process_request /usr/lib/python3.6/site-packages/octavia/common/keystone.py:77
2021-07-17 13:49:26.554 19 DEBUG ovn_octavia_provider.helper [-] Updating status to octavia: {'loadbalancers': [{'id': '19d8e465-c704-40a9-b1fd-5b0824408e5d', 'provisioning_status': 'ERROR', 'operating_status': 'ERROR'}], 'listeners': [{'id': '0806594a-4ed7-4889-81fa-6fd8d02b0d80', 'provisioning_status': 'DELETED', 'operating_status': 'OFFLINE'}], 'pools': [{'id': 'b8a98db0-6d2e-4745-b533-d2eb3548d1b9', 'provisioning_status': 'DELETED'}], 'members': [{'id': '08464181-728b-425a-b690-d3eb656f7e0a', 'provisioning_status': 'DELETED'}]} _update_status_to_octavia /usr/lib/python3.6/site-packages/ovn_octavia_provider/helper.py:32

The problem is that iterating over rows.values() is inherently racy: when multiple threads are running, another thread can add or delete a row while the iteration is in progress, which eventually triggers this RuntimeError.
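
A minimal, self-contained sketch of this failure mode, assuming a plain dict stands in for the IDL table's `rows` (the class and function names below are hypothetical, not ovsdbapp APIs). The insert made by "another thread" is simulated deterministically with a callback invoked mid-iteration. The locked snapshot in `safe_find` is shown only as one generic mitigation, not as the upstream fix.

```python
import threading


class RowTable:
    """Hypothetical stand-in for an IDL table: a shared ``rows`` dict.

    This is not the ovsdbapp API; it only reproduces the access pattern.
    """

    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()

    def add(self, uuid, row):
        with self._lock:
            self.rows[uuid] = row

    def snapshot(self):
        # Copy the values while holding the lock, so no writer can
        # change the dict size while the copy is being made.
        with self._lock:
            return list(self.rows.values())


def racy_find(table, predicate, on_each=None):
    """Iterate the live dict, like the list comprehension in the traceback."""
    found = []
    for row in table.rows.values():
        if on_each:
            on_each()  # stands in for another thread adding a row
        if predicate(row):
            found.append(row)
    return found


def safe_find(table, predicate, on_each=None):
    """Iterate a private snapshot that concurrent writers cannot touch."""
    found = []
    for row in table.snapshot():
        if on_each:
            on_each()
        if predicate(row):
            found.append(row)
    return found


table = RowTable()
for i in range(3):
    table.add(f"ls-{i}", {"load_balancer": ["lb1"]})

extra = iter(range(100, 200))


def concurrent_add():
    n = next(extra)
    table.add(f"ls-{n}", {"load_balancer": []})


def has_lb(row):
    return "lb1" in row["load_balancer"]


try:
    racy_find(table, has_lb, on_each=concurrent_add)
except RuntimeError as exc:
    print(exc)  # dictionary changed size during iteration

print(len(safe_find(table, has_lb, on_each=concurrent_add)))  # 3
```

The racy version fails on the first `next()` after the dict grows; the snapshot version keeps iterating its own list even though the shared dict keeps growing.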

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-octavia-provider (master)
Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ovn-octavia-provider (master)

Reviewed: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/801517
Committed: https://opendev.org/openstack/ovn-octavia-provider/commit/fab03e7c6d3f61afab2fb71ef70efa603f393de2
Submitter: "Zuul (22348)"
Branch: master

commit fab03e7c6d3f61afab2fb71ef70efa603f393de2
Author: Brian Haley <email address hidden>
Date: Tue Jul 20 12:56:50 2021 -0400

    Fix race condition retrieving logical router rows

    Using rows.values() via the ovsdbapp API is inherently
    racy: if there are multiple threads, an add/delete can
    interfere, triggering a RuntimeError (dictionary changed
    size during iteration).

    In one case we now directly use lookup() as we were
    looking for a specific logical router.

    In the other two cases we create a new class to do the
    operation in a transaction, making it idempotent, since
    both need to iterate the returned list.

    Also changed the IDL notify code to use a frozen row to
    similarly avoid any possible race condition there.

    Change-Id: Id4e15867b61925fa157c9e81750d8c5f63ad48a5
    Closes-bug: #1936959
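
The commit message above moves the remaining iterations into an operation run inside a transaction. A rough, hypothetical analogue of that idea: a single owner thread applies every row update and also runs read commands taken from the same FIFO queue, so a read can never overlap an update. `RowOwner`, `ReadCommand` and their methods are illustrative names only, not the ovsdbapp or ovn-octavia-provider classes.

```python
import queue
import threading


class ReadCommand:
    """Hypothetical read-only command executed by the owner thread."""

    def __init__(self, func):
        self.func = func
        self.result = None
        self.error = None
        self.done = threading.Event()

    def run(self, rows):
        # Runs in the owner thread, so no update can interleave with it.
        try:
            self.result = self.func(rows)
        except Exception as exc:  # hand the error back to the caller
            self.error = exc
        finally:
            self.done.set()


class RowOwner:
    """Hypothetical single thread that owns ``rows`` and serializes access."""

    def __init__(self):
        self.rows = {}
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        while True:
            item = self._queue.get()
            if item is None:  # stop sentinel
                break
            kind, payload = item
            if kind == "update":
                uuid, row = payload
                if row is None:
                    self.rows.pop(uuid, None)  # a delete
                else:
                    self.rows[uuid] = row  # an insert or modify
            else:
                payload.run(self.rows)

    def update(self, uuid, row):
        self._queue.put(("update", (uuid, row)))

    def execute(self, func, timeout=5.0):
        cmd = ReadCommand(func)
        self._queue.put(("read", cmd))
        if not cmd.done.wait(timeout):
            raise TimeoutError("read command did not complete")
        if cmd.error is not None:
            raise cmd.error
        return cmd.result

    def stop(self):
        self._queue.put(None)
        self._thread.join()


def switches_with_lb(rows):
    # Build and return a new list so the caller never iterates ``rows``.
    return [uuid for uuid, row in rows.items() if "lb1" in row["load_balancer"]]


owner = RowOwner()
for i in range(1000):
    owner.update(f"ls-{i}", {"load_balancer": ["lb1"] if i % 2 else []})
print(len(owner.execute(switches_with_lb)))  # 500
owner.stop()
```

Because the queue is FIFO with a single consumer, the read sees all 1000 updates queued before it. Returning a new list from the command lets the caller iterate the result afterwards without racing the owner thread.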

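The same commit also switches the IDL notify code to a frozen row. A minimal sketch of what freezing means, assuming a hypothetical mutable `Row` object and a hypothetical `frozen_row()` helper (illustrative only, not taken from ovsdbapp): the handler receives an immutable snapshot, so later updates to the live row cannot change what it sees.

```python
from collections import namedtuple


class Row:
    """Hypothetical mutable row, standing in for a live IDL row object."""

    def __init__(self, uuid, **columns):
        self.uuid = uuid
        self.__dict__.update(columns)


def frozen_row(row, columns):
    """Return an immutable snapshot of ``row`` (illustrative helper only)."""
    FrozenRow = namedtuple("FrozenRow", ["uuid", *columns])
    values = {}
    for col in columns:
        value = getattr(row, col)
        # Copy list-valued columns into tuples so later in-place changes
        # to the live row cannot leak into the snapshot.
        values[col] = tuple(value) if isinstance(value, list) else value
    return FrozenRow(uuid=row.uuid, **values)


live = Row("1234", name="lr-1", load_balancer=["lb1"])
snap = frozen_row(live, ["name", "load_balancer"])

# Meanwhile the IDL thread keeps updating the live row.
live.load_balancer.append("lb2")
live.name = "renamed"

print(snap.name, snap.load_balancer)  # lr-1 ('lb1',)
```
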
Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovn-octavia-provider 1.1.1

This issue was fixed in the openstack/ovn-octavia-provider 1.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-octavia-provider (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/810488

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-octavia-provider (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/810489

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-octavia-provider (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/810570

Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Brian Haley (brian-haley) → nobody
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ovn-octavia-provider (stable/victoria)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/810489
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on ovn-octavia-provider (stable/ussuri)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/810570
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ovn-octavia-provider (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/827077

OpenStack Infra (hudson-openstack) wrote : Fix merged to ovn-octavia-provider (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/810488
Committed: https://opendev.org/openstack/ovn-octavia-provider/commit/44db6c64229569a71472255d35b805f0198f68b3
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 44db6c64229569a71472255d35b805f0198f68b3
Author: Brian Haley <email address hidden>
Date: Tue Jul 20 12:56:50 2021 -0400

    Fix race condition retrieving logical router rows

    Using rows.values() via the ovsdbapp API is inherently
    racy: if there are multiple threads, an add/delete can
    interfere, triggering a RuntimeError (dictionary changed
    size during iteration).

    In one case we now directly use lookup() as we were
    looking for a specific logical router.

    In the other two cases we create a new class to do the
    operation in a transaction, making it idempotent, since
    both need to iterate the returned list.

    Also changed the IDL notify code to use a frozen row to
    similarly avoid any possible race condition there.

    Change-Id: Id4e15867b61925fa157c9e81750d8c5f63ad48a5
    Closes-bug: #1936959
    (cherry picked from commit fab03e7c6d3f61afab2fb71ef70efa603f393de2)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to ovn-octavia-provider (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/827077
Committed: https://opendev.org/openstack/ovn-octavia-provider/commit/ae16e697cdbaf11af903f47ac0a5e51ce90cd6bb
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit ae16e697cdbaf11af903f47ac0a5e51ce90cd6bb
Author: Fernando Royo <email address hidden>
Date: Mon Jan 31 14:13:29 2022 +0100

    Fix race condition retrieving logical router rows

    Using rows.values() via the ovsdbapp API is inherently
    racy: if there are multiple threads, an add/delete can
    interfere, triggering a RuntimeError (dictionary changed
    size during iteration).

    In one case we now directly use lookup() as we were
    looking for a specific logical router.

    In the other two cases we create a new class to do the
    operation in a transaction, making it idempotent, since
    both need to iterate the returned list.

    Also changed the IDL notify code to use a frozen row to
    similarly avoid any possible race condition there.

    Closes-bug: #1936959
    (cherry picked from commit fab03e7c6d3f61afab2fb71ef70efa603f393de2)
    DependsOn: https://review.opendev.org/c/openstack/devstack/+/827155
    Change-Id: Icd4c83b4f63a90380e8532ec68120dc200f26062

tags: added: in-stable-ussuri
tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to ovn-octavia-provider (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/ovn-octavia-provider/+/810489
Committed: https://opendev.org/openstack/ovn-octavia-provider/commit/3a3a1a6624def05d5ceb14f3cc2f0892e9d1e832
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 3a3a1a6624def05d5ceb14f3cc2f0892e9d1e832
Author: Brian Haley <email address hidden>
Date: Tue Jul 20 12:56:50 2021 -0400

    Fix race condition retrieving logical router rows

    Using rows.values() via the ovsdbapp API is inherently
    racy: if there are multiple threads, an add/delete can
    interfere, triggering a RuntimeError (dictionary changed
    size during iteration).

    In one case we now directly use lookup() as we were
    looking for a specific logical router.

    In the other two cases we create a new class to do the
    operation in a transaction, making it idempotent, since
    both need to iterate the returned list.

    Also changed the IDL notify code to use a frozen row to
    similarly avoid any possible race condition there.

    Conflicts:
        ovn_octavia_provider/tests/unit/test_helper.py

    Change-Id: Id4e15867b61925fa157c9e81750d8c5f63ad48a5
    Closes-bug: #1936959
    (cherry picked from commit fab03e7c6d3f61afab2fb71ef70efa603f393de2)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovn-octavia-provider 1.0.1

This issue was fixed in the openstack/ovn-octavia-provider 1.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn train-eol

This issue was fixed in the openstack/networking-ovn train-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovn-octavia-provider ussuri-eol

This issue was fixed in the openstack/ovn-octavia-provider ussuri-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ovn-octavia-provider victoria-eom

This issue was fixed in the openstack/ovn-octavia-provider victoria-eom release.
