[ovn] Using ovsdb-client for MAC_Binding could theoretically block indefinitely

Bug #1948891 reported by Terry Wilson
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

To save a large amount of memory by not monitoring the MAC_Binding table, we recently switched to using ovsdb-client to delete FIP MAC_Binding entries. ovsdb-client opens a new connection every time it is called, and there are some cases where it can block the worker longer than we would expect.

1) If ovsdb-server is busy and the connection takes a long time, ovsdb-client will wait up to 2 minutes before giving up. (tested by just setting iptables to drop packets to ovsdb-server port)

2) If the connection to ovsdb-server is established but the server never responds (maybe a cable is cut), ovsdb-client will wait forever for the response. (tested by just having nc listen on the ovsdb-server port)
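
The second case can be reproduced without nc. The following is a minimal sketch, not the test that was actually run: it opens a listening socket on a loopback port that never replies, then shows that a client only gets control back because it set a read timeout. The function name and the JSON payload are illustrative assumptions.

```python
import socket


def client_times_out_on_silent_server(read_timeout_secs=0.5):
    """Connect to a listener that never answers; report whether the read timed out."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 lets the OS pick a free port. listen() without accept() is
        # enough for the TCP handshake to complete, much like a bare nc listener.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        port = server.getsockname()[1]

        with socket.create_connection(("127.0.0.1", port),
                                      timeout=read_timeout_secs) as client:
            # Illustrative request only; the listener ignores whatever is sent.
            client.sendall(b'{"method": "echo", "params": [], "id": 0}')
            try:
                client.recv(4096)
            except socket.timeout:
                # Without the timeout this recv() would block indefinitely.
                return True
    return False
```

Without the `timeout` argument the `recv()` call would never return, which is the behaviour described for ovsdb-client in case 2.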

A quick workaround would be to pass --timeout to ovsdb-client, possibly set to the ovsdb_connection_timeout value. There is also a daemon mode for ovsdb-client.
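
As a rough sketch of the --timeout workaround (not the code that was merged in neutron), the command line could be built as below. The helper names, the connection string, and the extra subprocess-level timeout are assumptions made for illustration; `--timeout` and the `transact` command are existing ovsdb-client features.

```python
import json
import subprocess


def build_mac_binding_delete_cmd(sb_connection, ip_address, timeout_secs):
    """Build the ovsdb-client argv that deletes MAC_Binding rows for one IP."""
    transaction = json.dumps([
        "OVN_Southbound",
        {"op": "delete",
         "table": "MAC_Binding",
         "where": [["ip", "==", ip_address]]},
    ])
    # --timeout caps how long ovsdb-client may run, so a stalled connection
    # or a silent server cannot hold the worker forever.
    return ["ovsdb-client", "--timeout=%d" % timeout_secs,
            "transact", sb_connection, transaction]


def delete_mac_binding(sb_connection, ip_address, timeout_secs):
    """Run the command; the subprocess timeout is a second, hypothetical safety net."""
    cmd = build_mac_binding_delete_cmd(sb_connection, ip_address, timeout_secs)
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout_secs + 5, check=False)
```

A caller would pass the southbound connection string and the configured timeout, for example `delete_mac_binding("tcp:127.0.0.1:6642", "172.24.4.10", 30)`, where both the address and the 30 seconds are placeholder values.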

I also have a patch [1] waiting in python-ovs that will allow us to use the existing Idl connection to craft arbitrary transact operations like the ones we pass to ovsdb-client. This will eliminate the overhead of making a new connection with each FIP delete.

[1] https://patchwork.ozlabs<email address hidden>/

tags: added: ovn
Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/816698

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/816698
Committed: https://opendev.org/openstack/neutron/commit/7874c576013928c036dca4d9c0a38e5b8ae06bb4
Submitter: "Zuul (22348)"
Branch: master

commit 7874c576013928c036dca4d9c0a38e5b8ae06bb4
Author: Daniel Alvarez Sanchez <email address hidden>
Date: Thu Nov 4 14:34:32 2021 +0100

    [ovn] Add timeout option to ovsdb-client command

    Today, we invoke ovsdb-client to cleanup the MAC_Binding entries
    without specifying any timeout. This can lead to workers blocking
    forever if there's an issue with the connection to the server.

    This patch is adding a timeout parameter to the command line to
    prevent this condition.

    Closes-Bug: #1948891
    Related-Bug: #1946318

    Signed-off-by: Daniel Alvarez Sanchez <email address hidden>
    Change-Id: Id393cbec31dd64a795e85d756b7b843c9dfc59f3

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/818796

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/818799

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/818800

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/818801

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/818796
Committed: https://opendev.org/openstack/neutron/commit/6b270bd6f86784e16c9d79339d6726b74a75ca70
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 6b270bd6f86784e16c9d79339d6726b74a75ca70
Author: Daniel Alvarez Sanchez <email address hidden>
Date: Thu Nov 4 14:34:32 2021 +0100

    [ovn] Add timeout option to ovsdb-client command

    Today, we invoke ovsdb-client to cleanup the MAC_Binding entries
    without specifying any timeout. This can lead to workers blocking
    forever if there's an issue with the connection to the server.

    This patch is adding a timeout parameter to the command line to
    prevent this condition.

    Closes-Bug: #1948891
    Related-Bug: #1946318

    Signed-off-by: Daniel Alvarez Sanchez <email address hidden>
    Change-Id: Id393cbec31dd64a795e85d756b7b843c9dfc59f3
    (cherry picked from commit 7874c576013928c036dca4d9c0a38e5b8ae06bb4)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/818800
Committed: https://opendev.org/openstack/neutron/commit/37333d3788dc0ab3ba510d9c655b9e8d1fc1a7ff
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 37333d3788dc0ab3ba510d9c655b9e8d1fc1a7ff
Author: Daniel Alvarez Sanchez <email address hidden>
Date: Thu Nov 4 14:34:32 2021 +0100

    [ovn] Add timeout option to ovsdb-client command

    Today, we invoke ovsdb-client to cleanup the MAC_Binding entries
    without specifying any timeout. This can lead to workers blocking
    forever if there's an issue with the connection to the server.

    This patch is adding a timeout parameter to the command line to
    prevent this condition.

    Closes-Bug: #1948891
    Related-Bug: #1946318

    Conflicts:
      neutron/tests/unit/plugins/ml2/drivers/ovn/mech_driver/test_mech_driver.py

    Signed-off-by: Daniel Alvarez Sanchez <email address hidden>
    Change-Id: Id393cbec31dd64a795e85d756b7b843c9dfc59f3
    (cherry picked from commit 7874c576013928c036dca4d9c0a38e5b8ae06bb4)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/818799
Committed: https://opendev.org/openstack/neutron/commit/f37e0be349f3492672badd43af05b5952610c40d
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit f37e0be349f3492672badd43af05b5952610c40d
Author: Daniel Alvarez Sanchez <email address hidden>
Date: Thu Nov 4 14:34:32 2021 +0100

    [ovn] Add timeout option to ovsdb-client command

    Today, we invoke ovsdb-client to cleanup the MAC_Binding entries
    without specifying any timeout. This can lead to workers blocking
    forever if there's an issue with the connection to the server.

    This patch is adding a timeout parameter to the command line to
    prevent this condition.

    Closes-Bug: #1948891
    Related-Bug: #1946318

    Signed-off-by: Daniel Alvarez Sanchez <email address hidden>
    Change-Id: Id393cbec31dd64a795e85d756b7b843c9dfc59f3
    (cherry picked from commit 7874c576013928c036dca4d9c0a38e5b8ae06bb4)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/818801
Committed: https://opendev.org/openstack/neutron/commit/15e2da82c27f64b0fabcd6b4d8b33be9da769ac9
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 15e2da82c27f64b0fabcd6b4d8b33be9da769ac9
Author: Daniel Alvarez Sanchez <email address hidden>
Date: Thu Nov 4 14:34:32 2021 +0100

    [ovn] Add timeout option to ovsdb-client command

    Today, we invoke ovsdb-client to cleanup the MAC_Binding entries
    without specifying any timeout. This can lead to workers blocking
    forever if there's an issue with the connection to the server.

    This patch is adding a timeout parameter to the command line to
    prevent this condition.

    Closes-Bug: #1948891
    Related-Bug: #1946318

    Conflicts:
        neutron/tests/unit/plugins/ml2/drivers/ovn/mech_driver/test_mech_driver.py

    Signed-off-by: Daniel Alvarez Sanchez <email address hidden>
    Change-Id: Id393cbec31dd64a795e85d756b7b843c9dfc59f3
    (cherry picked from commit 7874c576013928c036dca4d9c0a38e5b8ae06bb4)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.1.0

This issue was fixed in the openstack/neutron 19.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.3.0

This issue was fixed in the openstack/neutron 17.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.2.0

This issue was fixed in the openstack/neutron 18.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.0.0.0rc1

This issue was fixed in the openstack/neutron 20.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn train-eol

This issue was fixed in the openstack/networking-ovn train-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ussuri-eol

This issue was fixed in the openstack/neutron ussuri-eol release.
