Neutron server log file full of tracebacks in neutron-tempest-plugin-ovn job

Bug #2019449 reported by Miguel Lavalle
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Miguel Lavalle

Bug Description

Since we merged https://review.opendev.org/c/openstack/neutron/+/876716, the Neutron server log in the neutron-tempest-plugin-ovn job is full of two types of tracebacks. The first type is

May 02 11:29:20.790754 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager [None req-f256cbd3-ea20-49e8-a9a0-0299fc7f34f3 tempest-SecGroupRulesQuotaTest-217177324 tempest-SecGroupRulesQuotaTest-217177324-project-member] Error during notification for neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver.OVNMechanismDriver._process_sg_rule_after_del_notification-316503 security_group_rule, after_delete: KeyError: 'rule'
May 02 11:29:20.790754 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager Traceback (most recent call last):
May 02 11:29:20.790754 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/neutron_lib/callbacks/manager.py", line 181, in _notify_loop
May 02 11:29:20.790754 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager callback(resource, event, trigger, payload=payload)
May 02 11:29:20.790754 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 463, in _process_sg_rule_after_del_notification
May 02 11:29:20.790754 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager sg_rule = payload.metadata['rule']
May 02 11:29:20.790754 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager KeyError: 'rule'
May 02 11:29:20.790754 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager

This happens because:

1) The patch mentioned above now publishes AFTER_DELETE events for security group rules that include the rule deleted: https://review.opendev.org/c/openstack/neutron/+/876716/5/neutron/db/securitygroups_db.py#293. The same patch added an event handler that expects to find the deleted security group rule in the payload: https://review.opendev.org/c/openstack/neutron/+/876716/5/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#459

2) However, previous to this patch, we were already publishing AFTER_DELETE events for security group rules that don't include the security group rule: https://github.com/openstack/neutron/blob/47d4ec4e99d5aae62656c88206eb6a77f70d4a8b/neutron/db/securitygroups_db.py#L901-L908. As a consequence, the new handler added by https://review.opendev.org/c/openstack/neutron/+/876716 produces the KeyError exception shown above when trying to access the security group rule in the payload when processing the preexisting event.

The second type of traceback is:

May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager [None req-4f41bf6e-d3c6-42a8-a987-b062a9a6b375 admin admin] Error during notification for neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver.OVNMechanismDriver._process_sg_rule_after_del_notification-316503 security_group_rule, after_delete: neutron.extensions.securitygroup.SecurityGroupNotFound: Security group ae8107d1-0399-4491-8535-4aee2c9de5d0 does not exist
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager Traceback (most recent call last):
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/neutron_lib/callbacks/manager.py", line 181, in _notify_loop
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager callback(resource, event, trigger, payload=payload)
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 464, in _process_sg_rule_after_del_notification
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager self._ovn_client.delete_security_group_rule(context, sg_rule)
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py", line 2413, in delete_security_group_rule
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager self._process_security_group_rule(rule, is_add_acl=False)
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py", line 2401, in _process_security_group_rule
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager ovn_acl.update_acls_for_security_group(
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/common/ovn/acl.py", line 294, in update_acls_for_security_group
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager sg = plugin.get_security_group(admin_context, security_group_id)
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/neutron_lib/db/api.py", line 226, in wrapped
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager return f_with_retry(*args, **kwargs,
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/neutron_lib/db/api.py", line 142, in wrapped
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager setattr(e, '_RETRY_EXCEEDED', True)
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager self.force_reraise()
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager raise self.value
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/neutron_lib/db/api.py", line 138, in wrapped
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager return f(*args, **kwargs)
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/oslo_db/api.py", line 154, in wrapper
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager ectxt.value = e.inner_exc
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager self.force_reraise()
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager raise self.value
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/oslo_db/api.py", line 142, in wrapper
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager return f(*args, **kwargs)
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/neutron_lib/db/api.py", line 190, in wrapped
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager context_reference.session.rollback()
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager self.force_reraise()
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager raise self.value
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/usr/local/lib/python3.8/dist-packages/neutron_lib/db/api.py", line 184, in wrapped
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager return f(*dup_args, **dup_kwargs)
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/db/securitygroups_db.py", line 209, in get_security_group
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager ret = self._make_security_group_dict(self._get_security_group(
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager File "/opt/stack/neutron/neutron/db/securitygroups_db.py", line 226, in _get_security_group
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager raise ext_sg.SecurityGroupNotFound(id=id)
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager neutron.extensions.securitygroup.SecurityGroupNotFound: Security group ae8107d1-0399-4491-8535-4aee2c9de5d0 does not exist
May 02 11:29:23.889104 np0033912622 neutron-server[100796]: ERROR neutron_lib.callbacks.manager

This happens because the default security groups that we create for every project include two rules that point to the security group they belong to as the remote security group. As a consequence, when the event handler added by https://review.opendev.org/c/openstack/neutron/+/876716 tries to delete these type of rules, it doesn't find their security group, because it was deleted right before the event being handled was published: https://review.opendev.org/c/openstack/neutron/+/876716/5/neutron/db/securitygroups_db.py#281.

For now, I have pushed a test patch that fixes both tracebacks: https://review.opendev.org/c/openstack/neutron/+/883047. I will report the result of zuul in a follow up comment.

However, in revisiting https://review.opendev.org/c/openstack/neutron/+/876716 and the associated bug https://bugs.launchpad.net/neutron/+bug/2008712, a few questions came to my mind:

1) In https://bugs.launchpad.net/neutron/+bug/2008712, it is reported that the bug affected the OVS mechanism driver and not OVN's. Why did we end up fixing OVN's? In fact, PS1 attempted to fix the bug for OVS here https://review.opendev.org/c/openstack/neutron/+/876716/1/neutron/api/rpc/handlers/securitygroups_rpc.py. Why was this approach abandoned?

2) It seems to me that in https://review.opendev.org/c/openstack/neutron/+/876716 we violate the semantics of the AFTER_DELETE event for security group rules. We delete a security group and yet we publish a security group rule deletion event. Even more, the event carries in its payload the deleted rule, when this type of event normally doesn't carry the deleted resource, because it was already deleted. In fact, https://review.opendev.org/c/openstack/neutron/+/876716/1/neutron/api/rpc/handlers/securitygroups_rpc.py tried to fix the bug processing a security group AFTER_DELETE event, which seems logically consistent to me. Again, why did we abandon this approach?

Revision history for this message
Miguel Lavalle (minsel) wrote :

Since all the recent executions in https://zuul.opendev.org/t/openstack/builds?job_name=neutron-tempest-plugin-ovn&project=openstack%2Fneutron&branch=master&pipeline=gate&result=SUCCESS&skip=0 contain the problem described in this bug, I'm marking it as confirmed

Changed in neutron:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Miguel Lavalle (minsel)
Revision history for this message
Miguel Lavalle (minsel) wrote (last edit ):

Yes, https://review.opendev.org/c/openstack/neutron/+/883047 remobe both types of tracebacks as is

Revision history for this message
Lajos Katona (lajos-katona) wrote :

Hmmm, this is my leftover, thanks for the bug

Revision history for this message
Lajos Katona (lajos-katona) wrote :

I try to answer the questions.

1.) Why I changed the OVN s-g code? When we ended in the final version of the OVS fix, the idea was to make it work as the OVS version works. see my comment: https://review.opendev.org/c/openstack/neutron/+/876716/comment/6f29dbf1_f568eb4c/

2.) True, I have to check why PS1 was not ok.

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
yatin (yatinkarel) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/883582
Committed: https://opendev.org/openstack/neutron/commit/ebc0658d5566ce527cb1104968d247db10edf3db
Submitter: "Zuul (22348)"
Branch: master

commit ebc0658d5566ce527cb1104968d247db10edf3db
Author: Brian Haley <email address hidden>
Date: Fri May 19 14:45:53 2023 +0000

    Revert "Delete sg rule which remote is the deleted sg"

    This reverts commit 63584957203ec9f5ba165177978213c3909f81f0.

    Reason for revert: This is generating a lot of
    "SecurityGroupNotFound" errors in neutron-server.log in
    the tempest-integrated-networking job.

    Closes-Bug: #2019449
    Related-Bug: #2008712
    Change-Id: I077fe87435f61bd29d5c1efc979c2adebca26181

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/883927

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/884205

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/884206

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/884207

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/883927
Committed: https://opendev.org/openstack/neutron/commit/d3fb57617d56acb312cef18d32881dd608967313
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit d3fb57617d56acb312cef18d32881dd608967313
Author: Brian Haley <email address hidden>
Date: Fri May 19 14:45:53 2023 +0000

    Revert "Delete sg rule which remote is the deleted sg"

    This reverts commit 63584957203ec9f5ba165177978213c3909f81f0.

    Reason for revert: This is generating a lot of
    "SecurityGroupNotFound" errors in neutron-server.log in
    the tempest-integrated-networking job.

    Closes-Bug: #2019449
    Related-Bug: #2008712
    Change-Id: I077fe87435f61bd29d5c1efc979c2adebca26181
    (cherry picked from commit ebc0658d5566ce527cb1104968d247db10edf3db)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/884205
Committed: https://opendev.org/openstack/neutron/commit/8bf6f7f03c04d90d6953d9cd1a286d82a01fc040
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 8bf6f7f03c04d90d6953d9cd1a286d82a01fc040
Author: Brian Haley <email address hidden>
Date: Fri May 19 14:45:53 2023 +0000

    Revert "Delete sg rule which remote is the deleted sg"

    This reverts commit 63584957203ec9f5ba165177978213c3909f81f0.

    Reason for revert: This is generating a lot of
    "SecurityGroupNotFound" errors in neutron-server.log in
    the tempest-integrated-networking job.

    Closes-Bug: #2019449
    Related-Bug: #2008712
    Change-Id: I077fe87435f61bd29d5c1efc979c2adebca26181
    (cherry picked from commit ebc0658d5566ce527cb1104968d247db10edf3db)
    (cherry picked from commit d3fb57617d56acb312cef18d32881dd608967313)

tags: added: in-stable-zed
tags: added: in-stable-yoga
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/884206
Committed: https://opendev.org/openstack/neutron/commit/8d30f9223a56f34822494dc45c4beb3f0fcf16c4
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 8d30f9223a56f34822494dc45c4beb3f0fcf16c4
Author: Brian Haley <email address hidden>
Date: Fri May 19 14:45:53 2023 +0000

    Revert "Delete sg rule which remote is the deleted sg"

    This reverts commit 63584957203ec9f5ba165177978213c3909f81f0.

    Reason for revert: This is generating a lot of
    "SecurityGroupNotFound" errors in neutron-server.log in
    the tempest-integrated-networking job.

    Closes-Bug: #2019449
    Related-Bug: #2008712
    Change-Id: I077fe87435f61bd29d5c1efc979c2adebca26181
    (cherry picked from commit ebc0658d5566ce527cb1104968d247db10edf3db)
    (cherry picked from commit d3fb57617d56acb312cef18d32881dd608967313)

tags: added: in-stable-xena
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/884207
Committed: https://opendev.org/openstack/neutron/commit/20172ca6dc5dac5792cb4dca710dc078cc061d08
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 20172ca6dc5dac5792cb4dca710dc078cc061d08
Author: Brian Haley <email address hidden>
Date: Fri May 19 14:45:53 2023 +0000

    Revert "Delete sg rule which remote is the deleted sg"

    This reverts commit 63584957203ec9f5ba165177978213c3909f81f0.

    Reason for revert: This is generating a lot of
    "SecurityGroupNotFound" errors in neutron-server.log in
    the tempest-integrated-networking job.

    Closes-Bug: #2019449
    Related-Bug: #2008712
    Change-Id: I077fe87435f61bd29d5c1efc979c2adebca26181
    (cherry picked from commit ebc0658d5566ce527cb1104968d247db10edf3db)
    (cherry picked from commit d3fb57617d56acb312cef18d32881dd608967313)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/884505

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.0.1

This issue was fixed in the openstack/neutron 22.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.1.1

This issue was fixed in the openstack/neutron 21.1.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.3.1

This issue was fixed in the openstack/neutron 20.3.1 release.

Changed in neutron:
status: Fix Released → In Progress
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/887692
Committed: https://opendev.org/openstack/neutron/commit/9e79bd7ff4be716ce7b1d5b14fee1212b42dfde1
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 9e79bd7ff4be716ce7b1d5b14fee1212b42dfde1
Author: elajkat <email address hidden>
Date: Fri May 26 12:04:49 2023 +0200

    Delete sg rule which remote is the deleted sg

    Based on bug #2008712 if we have a security-group which
    is the remote group of a 2nd security-group, the backend
    never deletes the rule of the 2nd group which
    remote_group_id is the original security-group.
    By AFTER_DELETE event for each rule that has the
    security_group_id as remote_group_id, we can make the
    mech drivers do their work and delete these rules in the
    backend.

    One version of this fix was merged:
    https://review.opendev.org/q/I207ecf7954b06507e03cb16b502ceb6e2807e0e7
    and reverted due to #2019449:
    https://review.opendev.org/q/I077fe87435f61bd29d5c1efc979c2adebca26181

    This patch is based on
    https://review.opendev.org/c/openstack/neutron/+/876716/1

    Conflicts in (due to notify to publish changes):
    * neutron/db/securitygroups_db.py
    * neutron/tests/unit/db/test_securitygroups_db.py

    Closes-Bug: #2008712
    Related-Bug: #2019449
    Change-Id: I9e8ddfa26c5402fefd573b0e2ea5f3a57983ca35
    (cherry picked from commit 67a0b0728788207cee27adb586880fabb8da6f25)

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.0.0.0b3

This issue was fixed in the openstack/neutron 23.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron xena-eom

This issue was fixed in the openstack/neutron xena-eom release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.