neutron with ovn returns Conflict on security group rules delete

Bug #1933638 reported by Gregory Thiemonge
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
neutron   Fix Released   Medium       Rodolfo Alonso

Bug Description

This issue was caught in an Octavia CI job (https://zuul.opendev.org/t/openstack/build/9cb24aa49cbb47e6abeb580e5d5ec6f0/logs)

During the deletion of a load balancer, Octavia deletes security group rules in neutron. Octavia appears to try to delete the same security group rule several times, and it sometimes receives a Conflict exception even though the exception message says that the security group rule doesn't exist:

Jun 22 12:36:00.226969 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 neutron-server[88368]: INFO neutron.wsgi [None req-e2baf119-3462-4af0-8b08-1e35cf0ba6d2 admin admin] 199.19.213.147,199.19.213.147 "DELETE /v2.0/security-group-rules/ec7d4cb6-a872-4709-854a-efaca7527822 HTTP/1.1" status: 204 len: 173 time: 0.0580175
Jun 22 12:36:00.228298 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 neutron-server[88367]: INFO neutron.api.v2.resource [None req-42ecd2c1-85d8-4ea6-b9ec-b98af458a8aa admin admin] delete failed (client error): The resource could not be found.
Jun 22 12:36:00.229361 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 neutron-server[88367]: INFO neutron.wsgi [None req-42ecd2c1-85d8-4ea6-b9ec-b98af458a8aa admin admin] 199.19.213.147,199.19.213.147 "DELETE /v2.0/security-group-rules/ec7d4cb6-a872-4709-854a-efaca7527822 HTTP/1.1" status: 404 len: 361 time: 0.0507255
Jun 22 12:36:00.230639 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 neutron-server[88368]: DEBUG neutron.api.rpc.handlers.resources_rpc [None req-e2baf119-3462-4af0-8b08-1e35cf0ba6d2 admin admin] Pushing event deleted for resources: {'SecurityGroupRule': ['ID=ec7d4cb6-a872-4709-854a-efaca7527822,revision_number=None']} {{(pid=88368) push /opt/stack/neutron/neutron/api/rpc/handlers/resources_rpc.py:237}}
Jun 22 12:36:00.230973 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 neutron-server[88367]: DEBUG neutron_lib.callbacks.manager [None req-92d0b84a-6fce-40e8-8c6c-f08c4c362481 admin admin] Callback neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver.OVNMechanismDriver._process_sg_rule_notification-750270 raised Security group rule ec7d4cb6-a872-4709-854a-efaca7527822 does not exist {{(pid=88367) _notify_loop /usr/local/lib/python3.8/dist-packages/neutron_lib/callbacks/manager.py:209}}
Jun 22 12:36:00.231248 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 neutron-server[88367]: DEBUG neutron_lib.callbacks.manager [None req-92d0b84a-6fce-40e8-8c6c-f08c4c362481 admin admin] Notify callbacks [] for security_group_rule, abort_delete {{(pid=88367) _notify_loop /usr/local/lib/python3.8/dist-packages/neutron_lib/callbacks/manager.py:192}}
Jun 22 12:36:00.231444 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 neutron-server[88368]: DEBUG oslo_concurrency.lockutils [None req-e2baf119-3462-4af0-8b08-1e35cf0ba6d2 admin admin] Lock "event-dispatch" released by "neutron.plugins.ml2.ovo_rpc._ObjectChangeHandler.dispatch_events" :: held 0.008s {{(pid=88368) inner /usr/local/lib/python3.8/dist-packages/oslo_concurrency/lockutils.py:367}}
Jun 22 12:36:00.231819 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 neutron-server[88367]: INFO neutron.api.v2.resource [None req-92d0b84a-6fce-40e8-8c6c-f08c4c362481 admin admin] delete failed (client error): There was a conflict when trying to complete your request.
Jun 22 12:36:00.232958 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 neutron-server[88367]: INFO neutron.wsgi [None req-92d0b84a-6fce-40e8-8c6c-f08c4c362481 admin admin] 199.19.213.147,199.19.213.147 "DELETE /v2.0/security-group-rules/ec7d4cb6-a872-4709-854a-efaca7527822 HTTP/1.1" status: 409 len: 588 time: 0.0547035

In the Octavia logs, we received:

Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker Traceback (most recent call last):
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/usr/local/lib/python3.8/dist-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker result = task.execute(**arguments)
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/opt/stack/octavia/octavia/controller/worker/v1/tasks/network_tasks.py", line 519, in execute
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker self.network_driver.update_vip(loadbalancer, for_delete=True)
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/opt/stack/octavia/octavia/network/drivers/neutron/allowed_address_pairs.py", line 622, in update_vip
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker self._update_security_group_rules(load_balancer,
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/opt/stack/octavia/octavia/network/drivers/neutron/allowed_address_pairs.py", line 207, in _update_security_group_rules
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker self.neutron_client.delete_security_group_rule(rule_id)
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/usr/local/lib/python3.8/dist-packages/neutronclient/v2_0/client.py", line 1043, in delete_security_group_rule
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker return self.delete(self.security_group_rule_path %
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/usr/local/lib/python3.8/dist-packages/neutronclient/v2_0/client.py", line 352, in delete
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker return self.retry_request("DELETE", action, body=body,
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/usr/local/lib/python3.8/dist-packages/neutronclient/v2_0/client.py", line 333, in retry_request
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker return self.do_request(method, action, body=body,
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/usr/local/lib/python3.8/dist-packages/neutronclient/v2_0/client.py", line 297, in do_request
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker self._handle_fault_response(status_code, replybody, resp)
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/usr/local/lib/python3.8/dist-packages/neutronclient/v2_0/client.py", line 272, in _handle_fault_response
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker exception_handler_v20(status_code, error_body)
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker File "/usr/local/lib/python3.8/dist-packages/neutronclient/v2_0/client.py", line 90, in exception_handler_v20
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker raise client_exc(message=error_message,
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker neutronclient.common.exceptions.Conflict: Security Group Rule ec7d4cb6-a872-4709-854a-efaca7527822 cannot perform before_delete due to Callback neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver.OVNMechanismDriver._process_sg_rule_notification-750270 failed with "Security group rule ec7d4cb6-a872-4709-854a-efaca7527822 does not exist".
Jun 22 12:36:00.263676 nested-virt-ubuntu-focal-vexxhost-ca-ymq-1-0025224760 octavia-worker[127003]: ERROR octavia.controller.worker.v1.controller_worker Neutron server returns request_ids: ['req-92d0b84a-6fce-40e8-8c6c-f08c4c362481']

I think we should have received a NotFound exception (which Octavia catches) instead of this Conflict exception.
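On the client side, catching NotFound amounts to treating "already deleted" as success while letting every other error propagate, which is why an unexpected Conflict surfaces as a task failure. The following is a minimal sketch of that pattern, not Octavia's code. The `NotFound` and `Conflict` classes are local stand-ins for neutronclient's exceptions of the same names, and `delete_rule_tolerant` and `make_fake_delete` are hypothetical helpers.

```python
class NotFound(Exception):
    """Local stand-in for neutronclient's NotFound (HTTP 404)."""


class Conflict(Exception):
    """Local stand-in for neutronclient's Conflict (HTTP 409)."""


def delete_rule_tolerant(delete_rule, rule_id):
    """Delete a rule, treating "already gone" as success.

    Returns True if this call deleted the rule, False if it was already gone.
    Any other error, including Conflict, propagates to the caller.
    """
    try:
        delete_rule(rule_id)
    except NotFound:
        return False
    return True


def make_fake_delete(existing):
    """Hypothetical fake API: raises NotFound for unknown rule IDs."""
    def delete_rule(rule_id):
        if rule_id not in existing:
            raise NotFound(f"Security group rule {rule_id} not found")
        existing.remove(rule_id)
    return delete_rule


delete = make_fake_delete({"ec7d4cb6"})
print(delete_rule_tolerant(delete, "ec7d4cb6"))  # True
print(delete_rule_tolerant(delete, "ec7d4cb6"))  # False
```

With this pattern a repeated delete is harmless only if the server answers 404. A 409 for a rule that no longer exists still escapes the `except NotFound` clause.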

This issue appeared after the switch to ML2/OVN.

Miguel Lavalle (minsel)
Changed in neutron:
importance: Undecided → Medium
status: New → Triaged
tags: added: ovn
summary: - neutronclient returns Conflict on security group rules delete
+ neutron with ovn returns Conflict on security group rules delete
Akihiro Motoki (amotoki) wrote :

As Miguel noted when triaging, this looks specific to the OVN mechanism driver. The error message returned to Octavia says:
---
Conflict: Security Group Rule ec7d4cb6-a872-4709-854a-efaca7527822 cannot perform
before_delete due to Callback neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver.OVNMechanismDriver._process_sg_rule_notification-750270
failed with "Security group rule ec7d4cb6-a872-4709-854a-efaca7527822 does not exist".
---

Looking at neutron/db/securitygroups_db.py, whenever a callback fails, ext_sg.SecurityGroupRuleInUse is raised to the API layer, so a 409 Conflict is returned regardless of why the callback failed.
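To make this mapping concrete, here is a minimal, self-contained Python model of the flow described above. It is not neutron's code. `SecurityGroupRuleNotFound`, `SecurityGroupRuleInUse`, `ovn_before_delete`, `delete_rule` and `api_delete` are hypothetical stand-ins for securitygroups_db.py, the neutron-lib callback manager and the OVN mechanism driver, and the rule "database" is a plain dict. The model assumes, as stated above, that any BEFORE_DELETE callback failure is converted into the "in use" error.

```python
class SecurityGroupRuleNotFound(Exception):
    """Hypothetical stand-in for the "rule not found" error (HTTP 404)."""
    http_status = 404


class SecurityGroupRuleInUse(Exception):
    """Hypothetical stand-in for ext_sg.SecurityGroupRuleInUse (HTTP 409)."""
    http_status = 409


def ovn_before_delete(rules, rule_id):
    """Pre-fix OVN-style callback: fails when the rule is already gone."""
    if rule_id not in rules:
        raise SecurityGroupRuleNotFound(
            f"Security group rule {rule_id} does not exist")


def delete_rule(rules, rule_id, callbacks):
    """Simplified delete: BEFORE_DELETE callbacks run before the DB lookup."""
    for callback in callbacks:
        try:
            callback(rules, rule_id)
        except Exception as exc:
            # Every callback failure is reported as "in use", whatever its cause.
            raise SecurityGroupRuleInUse(
                f"Security Group Rule {rule_id} cannot perform before_delete "
                f"due to callback failure: {exc}") from exc
    if rule_id not in rules:
        raise SecurityGroupRuleNotFound(rule_id)
    del rules[rule_id]


def api_delete(rules, rule_id, callbacks):
    """Map the outcome to the HTTP status returned by the API layer."""
    try:
        delete_rule(rules, rule_id, callbacks)
    except (SecurityGroupRuleNotFound, SecurityGroupRuleInUse) as exc:
        return exc.http_status
    return 204


rules = {"ec7d4cb6": {"direction": "ingress"}}
print(api_delete(rules, "ec7d4cb6", [ovn_before_delete]))  # 204
print(api_delete(rules, "ec7d4cb6", [ovn_before_delete]))  # 409, not 404
```

The model reproduces the 204-then-409 pair seen in the logs. It does not try to model the separate code path that produced the occasional 404.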

We need to investigate why the OVN mechanism driver BEFORE_DELETE callback fails due to "Security group rule ec7d4cb6-a872-4709-854a-efaca7527822 does not exist".

Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
Akihiro Motoki (amotoki) wrote :

Note from the neutron meeting on Jun 29 [1]: the Octavia CI deletes the same SG rule two or three times. The first request succeeds, the second gets a 409, and sometimes a third one gets a 404.

[1] https://meetings.opendev.org/meetings/networking/2021/networking.2021-06-29-14.00.log.html#l-103

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/798718

Changed in neutron:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/798718
Committed: https://opendev.org/openstack/neutron/commit/6a74cd76fd006d7c88575fc7ae39a3b499ac7d54
Submitter: "Zuul (22348)"
Branch: master

commit 6a74cd76fd006d7c88575fc7ae39a3b499ac7d54
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Jun 29 16:48:26 2021 +0000

    [OVN] Do not fail when processing SG rule deletion

    When a security group rule deletion command is issued, before executing
    the database operations, a "BEFORE_DELETE" event is raised.

    The OVN handler attending to this event should not fail if the security
    group rule does not exist; the database transaction [1] will, in case of
    not finding it, raise the correct exception and HTTP 404 error:

      Jun 29 16:58:28 dev20 neutron-server[8820]: INFO neutron.wsgi [None \
        req-1821ec9f-2439-420b-80eb-1138896de865 demo admin] 192.168.10.70 \
        "GET /v2.0/security-group-rules/missing_sg_rule_example HTTP/1.1" \
        status: 404 len: 348 time: 0.0352871

    [1] https://github.com/openstack/neutron/blob/6196c0873b4af9abd9055bf92478ff05ae090104/neutron/db/securitygroups_db.py#L858-L868

    Change-Id: I58f6e5b309e089f6681d2c4bbff4ff7fda96435f
    Closes-Bug: #1933638
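The idea behind this fix can be sketched with a small, self-contained model. The OVN-style BEFORE_DELETE handler tolerates a missing rule, and the database step raises the not-found error that becomes a 404. This is an illustration, not the code merged in change 798718. All names (`ovn_before_delete_fixed`, `api_delete` and the exception classes) are hypothetical, and the rule store is a plain dict.

```python
class SecurityGroupRuleNotFound(Exception):
    """Hypothetical stand-in for the "rule not found" error (HTTP 404)."""
    http_status = 404


class SecurityGroupRuleInUse(Exception):
    """Hypothetical stand-in for ext_sg.SecurityGroupRuleInUse (HTTP 409)."""
    http_status = 409


def ovn_before_delete_fixed(rules, rule_id):
    """Post-fix behaviour: a missing rule is not an error for this handler."""
    if rule_id not in rules:
        # Nothing to do here; the DB step will report the missing rule (404).
        return
    # A real handler would act on the OVN side here; this model does nothing.


def api_delete(rules, rule_id, callbacks):
    """Run BEFORE_DELETE callbacks, then the DB delete; return an HTTP status."""
    try:
        for callback in callbacks:
            try:
                callback(rules, rule_id)
            except Exception as exc:
                # Callback failures are still reported as "in use" (409).
                raise SecurityGroupRuleInUse(str(exc)) from exc
        if rule_id not in rules:
            raise SecurityGroupRuleNotFound(rule_id)
        del rules[rule_id]
    except (SecurityGroupRuleNotFound, SecurityGroupRuleInUse) as exc:
        return exc.http_status
    return 204


rules = {"ec7d4cb6": {"direction": "ingress"}}
print(api_delete(rules, "ec7d4cb6", [ovn_before_delete_fixed]))  # 204
print(api_delete(rules, "ec7d4cb6", [ovn_before_delete_fixed]))  # 404
```

The mapping of callback failures to 409 is left unchanged. Only the handler stops treating an already-deleted rule as a failure, so a repeated delete now reaches the database lookup and gets the 404 that clients such as Octavia already handle.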

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/799209

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/799210

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/799372

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/799209
Committed: https://opendev.org/openstack/neutron/commit/58e9da409d157a66b9cd9920a37a97dc4bd64264
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 58e9da409d157a66b9cd9920a37a97dc4bd64264
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Jun 29 16:48:26 2021 +0000

    [OVN] Do not fail when processing SG rule deletion

    Conflicts:
          neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py

    Change-Id: I58f6e5b309e089f6681d2c4bbff4ff7fda96435f
    Closes-Bug: #1933638
    (cherry picked from commit 6a74cd76fd006d7c88575fc7ae39a3b499ac7d54)

tags: added: in-stable-wallaby
tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/799210
Committed: https://opendev.org/openstack/neutron/commit/32c6a39a8cf35c75800c35ac5fe6b15ca99e3c34
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 32c6a39a8cf35c75800c35ac5fe6b15ca99e3c34
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Jun 29 16:48:26 2021 +0000

    [OVN] Do not fail when processing SG rule deletion

    Conflicts:
          neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py

    Change-Id: I58f6e5b309e089f6681d2c4bbff4ff7fda96435f
    Closes-Bug: #1933638
    (cherry picked from commit 6a74cd76fd006d7c88575fc7ae39a3b499ac7d54)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/799372
Committed: https://opendev.org/openstack/neutron/commit/84ed85c7f1b7ea267521d2165ef3dd9b8423ec09
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 84ed85c7f1b7ea267521d2165ef3dd9b8423ec09
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Jun 29 16:48:26 2021 +0000

    [OVN] Do not fail when processing SG rule deletion

    Conflicts:
          neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py

    Change-Id: I58f6e5b309e089f6681d2c4bbff4ff7fda96435f
    Closes-Bug: #1933638
    (cherry picked from commit 6a74cd76fd006d7c88575fc7ae39a3b499ac7d54)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.4.0

This issue was fixed in the openstack/neutron 16.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.2.0

This issue was fixed in the openstack/neutron 17.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.1.0

This issue was fixed in the openstack/neutron 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.0.0.0rc1

This issue was fixed in the openstack/neutron 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn train-eol

This issue was fixed in the openstack/networking-ovn train-eol release.
