[ovn] ml2/ovn may time out connecting to ovsdb server and stays dead in the water

Bug #1926653 reported by Flavio Fernandes
This bug affects 1 person
Affects   Status         Importance   Assigned to        Milestone
neutron   Fix Released   High         Flavio Fernandes

Bug Description

Right now, the IDL connections between ml2/ovn and the OVN DBs are not
resilient enough when connecting. It doesn't make sense to give up on
connecting, since ml2/ovn is useless without that access.

If ovsdb-server is slow and takes longer than the timeout (in seconds)
to respond, having everything reconnect after partial downloads and
start over is not going to make things better. That is particularly
likely to happen when the OVN DB is very large.
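
The reasoning above can be illustrated with a toy model. This is only a sketch under stated assumptions, not neutron or ovsdbapp code: the function names and the one-unit-per-second download model are hypothetical. It shows that when the initial download needs longer than the timeout, dropping the connection and starting over never finishes, however many attempts are made.

```python
def sync_with_restarts(download_seconds, timeout, max_attempts):
    """Model a client that drops the connection and starts over on timeout.

    Hypothetical model: the download advances one unit per second.
    Returns the 1-based attempt on which the download completes, or None
    if no attempt within max_attempts can finish.
    """
    for attempt in range(1, max_attempts + 1):
        progress = 0  # a reconnect discards whatever was downloaded so far
        while progress < download_seconds and progress < timeout:
            progress += 1
        if progress >= download_seconds:
            return attempt
    return None


def sync_without_timeout(download_seconds):
    """Model a client that keeps waiting on the same connection.

    Returns the number of seconds until the full copy has arrived.
    """
    return download_seconds
```

In this model a download that needs 25 seconds never completes with a 10 second timeout, while a client that simply keeps waiting is done after 25 seconds.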

This work is also tracked under Bugzilla:
https://bugzilla.redhat.com/1955271

Changed in neutron:
assignee: nobody → Flavio Fernandes (ffernand)
Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/788535
Committed: https://opendev.org/openstack/neutron/commit/9c3c718e244c60effdbdfb62628456bc9a6a5add
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 9c3c718e244c60effdbdfb62628456bc9a6a5add
Author: Terry Wilson <email address hidden>
Date: Wed Apr 21 16:35:22 2021 +0000

    Don't ever give up trying to connect to OVN DBs

    It doesn't really make sense to give up connecting to the OVN DBs
    since we can't do anything without them. If ovsdb-server is slow
    and takes more than timeout seconds, everything reconnecting after
    partial downloads and starting over is not going to make things
    better.

    We can change the behavior in ovsdbapp, but doing it without
    making non-backportable API changes isn't easily doable.
    The plan would be, after merging this back through stable, to
    modify ovsdbapp to allow setting different connection and txn
    timeouts and being able to disable the timeout in wait_for_change
    with a value of -1. Then in the main branch of neutron we can
    use that going forward.

    Closes-Bug: #1926653

    Change-Id: Ia9e23113fdeebf0b99085da200c3d61b71567d36
    (cherry picked from commit 39ccc0d6d6995b8d50baecc523dd30be034669a9)
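
The plan described in the commit message, a wait that can have its timeout disabled with a value of -1, can be sketched as follows. This is a minimal illustration, not the ovsdbapp implementation: `get_seqno`, `poll`, `clock` and `NO_TIMEOUT` are hypothetical stand-ins for the IDL's change sequence number, one pass of its event loop, a time source and the sentinel value.

```python
import time

NO_TIMEOUT = -1  # assumption: sentinel meaning "wait indefinitely"


def wait_for_change(get_seqno, poll, timeout, seqno=None,
                    clock=time.monotonic):
    """Block until get_seqno() returns something other than seqno.

    get_seqno and poll are hypothetical callables standing in for an
    IDL's change sequence number and one iteration of its event loop.
    Raises TimeoutError once `timeout` seconds have passed, unless
    timeout is NO_TIMEOUT, in which case it keeps waiting.
    """
    if seqno is None:
        seqno = get_seqno()
    deadline = None if timeout == NO_TIMEOUT else clock() + timeout
    while get_seqno() == seqno:
        if deadline is not None and clock() >= deadline:
            raise TimeoutError("no change after %s seconds" % timeout)
        poll()  # let the connection make progress, then check again
    return get_seqno()
```

With a finite timeout the caller has to decide what to do on `TimeoutError`; with `NO_TIMEOUT` the same connection is kept and the partial download is not thrown away.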

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/788534
Committed: https://opendev.org/openstack/neutron/commit/ddf20886f86b3b9414e8a2a1144b0442b6ab9000
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit ddf20886f86b3b9414e8a2a1144b0442b6ab9000
Author: Terry Wilson <email address hidden>
Date: Wed Apr 21 16:35:22 2021 +0000

    Don't ever give up trying to connect to OVN DBs

    It doesn't really make sense to give up connecting to the OVN DBs
    since we can't do anything without them. If ovsdb-server is slow
    and takes more than timeout seconds, everything reconnecting after
    partial downloads and starting over is not going to make things
    better.

    We can change the behavior in ovsdbapp, but doing it without
    making non-backportable API changes isn't easily doable.
    The plan would be, after merging this back through stable, to
    modify ovsdbapp to allow setting different connection and txn
    timeouts and being able to disable the timeout in wait_for_change
    with a value of -1. Then in the main branch of neutron we can
    use that going forward.

    Closes-Bug: #1926653

    Change-Id: Ia9e23113fdeebf0b99085da200c3d61b71567d36
    (cherry picked from commit 39ccc0d6d6995b8d50baecc523dd30be034669a9)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/788596
Committed: https://opendev.org/openstack/neutron/commit/f7292de52ebc4aa2673187bc645e448b72b33e6a
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit f7292de52ebc4aa2673187bc645e448b72b33e6a
Author: Terry Wilson <email address hidden>
Date: Wed Apr 21 16:35:22 2021 +0000

    Don't ever give up trying to connect to OVN DBs

    It doesn't really make sense to give up connecting to the OVN DBs
    since we can't do anything without them. If ovsdb-server is slow
    and takes more than timeout seconds, everything reconnecting after
    partial downloads and starting over is not going to make things
    better.

    We can change the behavior in ovsdbapp, but doing it without
    making non-backportable API changes isn't easily doable.
    The plan would be, after merging this back through stable, to
    modify ovsdbapp to allow setting different connection and txn
    timeouts and being able to disable the timeout in wait_for_change
    with a value of -1. Then in the main branch of neutron we can
    use that going forward.

    Closes-Bug: #1926653

    Conflicts:
    neutron/tests/functional/plugins/ml2/drivers/ovn/mech_driver/ovsdb/test_impl_idl.py

    Change-Id: Ia9e23113fdeebf0b99085da200c3d61b71567d36
    (cherry picked from commit 39ccc0d6d6995b8d50baecc523dd30be034669a9)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.3.2

This issue was fixed in the openstack/neutron 16.3.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.1.2

This issue was fixed in the openstack/neutron 17.1.2 release.

Changed in neutron:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.1.0

This issue was fixed in the openstack/neutron 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn train-eol

This issue was fixed in the openstack/networking-ovn train-eol release.
