Missing ip rule causes FIP removal to fail

Bug #2030804 reported by Adam Oswick
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Brian Haley

Bug Description

Summary
-------
If the ip rule associated with a FIP is lost or deleted, the Neutron L3 agent raises an error when it tries to remove the rule, which causes the entire FIP removal process to fail.

High level description
----------------------
Rather than erroring if an ip rule that should exist is no longer present, https://opendev.org/openstack/neutron/src/commit/c453813d0664259c4da0d132f224be2eebe70072/neutron/agent/l3/dvr_local_router.py#L216-L227 should handle this gracefully with a warning.
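As a rough illustration of the behaviour requested above (not the code at the linked lines), the sketch below treats a missing rule as a warning and lets the caller carry on. The `NetlinkError` class here is a local stand-in that models only an errno-style `code` attribute, and `remove_fip_rule` and `delete_rule` are hypothetical names chosen for this example.

```python
import errno
import logging

LOG = logging.getLogger(__name__)


class NetlinkError(Exception):
    """Local stand-in for a netlink error; only the errno-style code is modelled."""

    def __init__(self, code, msg=None):
        super().__init__(code, msg)
        self.code = code


def remove_fip_rule(delete_rule, fixed_ip, table=16):
    """Delete the ip rule for fixed_ip, warning instead of failing if it is gone.

    delete_rule is a hypothetical callable taking (fixed_ip, table).
    Returns True if a rule was deleted, False if it was already missing.
    """
    try:
        delete_rule(fixed_ip, table)
    except NetlinkError as exc:
        if exc.code != errno.ENOENT:
            raise  # anything other than "no such entry" is still an error
        LOG.warning("ip rule for %s (table %s) not found, continuing",
                    fixed_ip, table)
        return False
    return True
```

With this shape, the rest of the FIP teardown can proceed whether or not the rule was still present, while unrelated failures such as permission errors still surface.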

Pre-conditions
--------------
- Neutron DVR mode is enabled
- Subnets are created and attached to a router with an external gateway
- A VM is created on one of the aforementioned subnets and a FIP is associated with it

Step-by-step reproduction steps
-------------------------------
- Within the qrouter network namespace, run 'ip rule del from $FIXED_IP lookup 16'
- Disassociate the FIP from the VM and monitor Neutron L3 agent logs for errors
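For scripting the first reproduction step, a small helper along these lines could build the command. It assumes the router namespace is named `qrouter-<router id>` and that the rule is selected with `from <fixed IP>`; the function name and the sample values are placeholders, not taken from the report.

```python
import shlex

FIP_TABLE = 16  # routing table number from the report's 'lookup 16'


def rule_delete_command(router_id, fixed_ip, table=FIP_TABLE):
    """Build the argv that deletes the FIP's ip rule inside the router namespace."""
    namespace = f"qrouter-{router_id}"  # assumed namespace naming
    return ["ip", "netns", "exec", namespace,
            "ip", "rule", "del", "from", fixed_ip, "lookup", str(table)]


if __name__ == "__main__":
    # Placeholder values; substitute the real router UUID and the VM's fixed IP.
    argv = rule_delete_command("ROUTER_UUID", "10.0.0.5")
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run` as root on the compute host, after which the FIP can be disassociated as described in the second step.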

Expected output
---------------
Neutron L3 agent logs that the ip rule didn't exist and then continues as normal.

Actual output
-------------
Neutron L3 agent throws a "pyroute2.netlink.exceptions.NetlinkError: (2, 'No such file or directory')" exception and does not complete FIP removal from the host.

Version
-------
- OpenStack Zed

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/890827

Changed in neutron:
status: New → In Progress

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/891236

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Adam <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/890827
Reason: Abandoned as Brian suggested an alternative on https://review.opendev.org/c/openstack/neutron/+/891236

Changed in neutron:
importance: Undecided → Medium
assignee: nobody → Brian Haley (brian-haley)

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

Why do the "reproduction steps" include a manual operation on the namespace? The question here is how this IP rule was deleted before the FIP was disassociated. Do you know the root cause? When did you find this problem? Can we reproduce this issue without manual steps?

Regards.

Adam Oswick (adamoswick) wrote :

Hi Rodolfo,

| Why do the "reproduction steps" include a manual operation on the namespace?

I've been unable to work out why the ip rule disappears (or isn't created in the first place). Deleting the ip rule therefore allows us to simulate the result even if we don't know the original root cause.

| Do you know the root reason?

Not at the moment.

| When did you find this problem?

We've been seeing this problem occasionally since we first built our clouds (Yoga iirc).

| Can we reproduce this issue without manual steps?

Not at the moment. We just have to wait for it to occur in our environments.

While obviously it would be good to find the root cause of the issue here (why is the ip rule deleted or missing in the first place), my thought was that as ip rules are not directly controlled by Neutron, it is better for Neutron to more gracefully handle scenarios where they have changed outside of its control.

At some point, I can have another go at trying to find the root cause of this issue. However, even with that identified and resolved, I still feel like it would be a good idea for Neutron to handle these missing ip rules (and other resources) more gracefully.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/891236
Committed: https://opendev.org/openstack/neutron/commit/16875b5f92731a9cf2d7e819d406bfcc442339f3
Submitter: "Zuul (22348)"
Branch: master

commit 16875b5f92731a9cf2d7e819d406bfcc442339f3
Author: Brian Haley <email address hidden>
Date: Fri Aug 11 17:05:49 2023 -0400

    Catch non-existent entry failures better in ip_lib

    The privileged/agent/linux/ip_lib.py code was not always
    catching "entry does not exist" type errors when deleting
    entries, and most of the callers were not catching it either,
    which could lead to random failures.

    Add code in the IP route, rule and bridge fdb code to catch
    these errors and not raise on them, other exceptions will
    still be raised.

    Also fixed delete_neigh_entry() to not raise when the
    given namespace does not exist to make it like all the
    other calls in the file.

    Added or modified functional tests for above cases.

    Change-Id: I083649ab1b9a9057ee276a7f3ba069eb667db870
    Closes-bug: #2030804
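The pattern this commit message describes, swallowing only "entry does not exist" errors on delete and re-raising everything else, might look roughly like the sketch below. It is not the merged code: `OSError` stands in for the netlink exception type, the set of errno values is an assumption, and `ignore_missing_entry` and `delete_entry` are hypothetical names.

```python
import errno
import functools

# Assumption: these errno values are treated as "entry does not exist".
# ENOENT matches the error in this report; ESRCH is what the kernel
# typically returns when deleting a route that is not present.
NOT_FOUND_CODES = frozenset({errno.ENOENT, errno.ESRCH})


def ignore_missing_entry(func):
    """Make a delete-style function a no-op when the entry is already gone."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except OSError as exc:
            if exc.errno in NOT_FOUND_CODES:
                return None  # already absent, nothing to do
            raise  # other failures are still reported to the caller
    return wrapper


@ignore_missing_entry
def delete_entry(table, key):
    """Hypothetical delete: raises OSError(ENOENT) if key is not in table."""
    if key not in table:
        raise OSError(errno.ENOENT, "No such file or directory")
    del table[key]
```

Putting the check in one place makes deletes idempotent for every caller, which fits the commit's observation that most callers were not catching the error themselves.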

Changed in neutron:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/901062

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/901063

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/901073

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/901074

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/901063
Committed: https://opendev.org/openstack/neutron/commit/87f7b9a46ca4cdb4dd09c1dd90b365689eaa49cb
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 87f7b9a46ca4cdb4dd09c1dd90b365689eaa49cb
Author: Brian Haley <email address hidden>
Date: Fri Aug 11 17:05:49 2023 -0400

    Catch non-existent entry failures better in ip_lib

    The privileged/agent/linux/ip_lib.py code was not always
    catching "entry does not exist" type errors when deleting
    entries, and most of the callers were not catching it either,
    which could lead to random failures.

    Add code in the IP route, rule and bridge fdb code to catch
    these errors and not raise on them, other exceptions will
    still be raised.

    Also fixed delete_neigh_entry() to not raise when the
    given namespace does not exist to make it like all the
    other calls in the file.

    Added or modified functional tests for above cases.

    Change-Id: I083649ab1b9a9057ee276a7f3ba069eb667db870
    Closes-bug: #2030804
    (cherry picked from commit 16875b5f92731a9cf2d7e819d406bfcc442339f3)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/901073
Committed: https://opendev.org/openstack/neutron/commit/61ad633a331bf72e4081e0a12abb4463478efa7c
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 61ad633a331bf72e4081e0a12abb4463478efa7c
Author: Brian Haley <email address hidden>
Date: Fri Aug 11 17:05:49 2023 -0400

    Catch non-existent entry failures better in ip_lib

    The privileged/agent/linux/ip_lib.py code was not always
    catching "entry does not exist" type errors when deleting
    entries, and most of the callers were not catching it either,
    which could lead to random failures.

    Add code in the IP route, rule and bridge fdb code to catch
    these errors and not raise on them, other exceptions will
    still be raised.

    Also fixed delete_neigh_entry() to not raise when the
    given namespace does not exist to make it like all the
    other calls in the file.

    Added or modified functional tests for above cases.

    Conflicts:
      neutron/privileged/agent/linux/ip_lib.py
      neutron/tests/unit/privileged/agent/linux/test_ip_lib.py

    Change-Id: I083649ab1b9a9057ee276a7f3ba069eb667db870
    Closes-bug: #2030804
    (cherry picked from commit 16875b5f92731a9cf2d7e819d406bfcc442339f3)

tags: added: in-stable-zed

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/901062
Committed: https://opendev.org/openstack/neutron/commit/9196b612de6010e5cadbcc68c4791855ad144885
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 9196b612de6010e5cadbcc68c4791855ad144885
Author: Brian Haley <email address hidden>
Date: Fri Aug 11 17:05:49 2023 -0400

    Catch non-existent entry failures better in ip_lib

    The privileged/agent/linux/ip_lib.py code was not always
    catching "entry does not exist" type errors when deleting
    entries, and most of the callers were not catching it either,
    which could lead to random failures.

    Add code in the IP route, rule and bridge fdb code to catch
    these errors and not raise on them, other exceptions will
    still be raised.

    Also fixed delete_neigh_entry() to not raise when the
    given namespace does not exist to make it like all the
    other calls in the file.

    Added or modified functional tests for above cases.

    Change-Id: I083649ab1b9a9057ee276a7f3ba069eb667db870
    Closes-bug: #2030804
    (cherry picked from commit 16875b5f92731a9cf2d7e819d406bfcc442339f3)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/901074
Committed: https://opendev.org/openstack/neutron/commit/2f7ecb95139b75653fdb198c7047c46afc191bf0
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 2f7ecb95139b75653fdb198c7047c46afc191bf0
Author: Brian Haley <email address hidden>
Date: Fri Aug 11 17:05:49 2023 -0400

    Catch non-existent entry failures better in ip_lib

    The privileged/agent/linux/ip_lib.py code was not always
    catching "entry does not exist" type errors when deleting
    entries, and most of the callers were not catching it either,
    which could lead to random failures.

    Add code in the IP route, rule and bridge fdb code to catch
    these errors and not raise on them, other exceptions will
    still be raised.

    Also fixed delete_neigh_entry() to not raise when the
    given namespace does not exist to make it like all the
    other calls in the file.

    Added or modified functional tests for above cases.

    Conflicts:
      neutron/privileged/agent/linux/ip_lib.py

    Change-Id: I083649ab1b9a9057ee276a7f3ba069eb667db870
    Closes-bug: #2030804
    (cherry picked from commit 16875b5f92731a9cf2d7e819d406bfcc442339f3)

tags: added: in-stable-yoga

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 24.0.0.0b1

This issue was fixed in the openstack/neutron 24.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron yoga-eom

This issue was fixed in the openstack/neutron yoga-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.2.1

This issue was fixed in the openstack/neutron 21.2.1 release.
