neutron-server and metadata agent not reconnecting to OVSDB on ovsdb-server failover

Bug #1772656 reported by Daniel Alvarez
This bug affects 1 person
Affects: networking-ovn
Status: Fix Released
Importance: High
Assigned to: Daniel Alvarez

Bug Description

When ovsdb-server fails over and is promoted on another controller, the clients (in this case neutron-server and the metadata agent) should restart their connection to the VIP. However, the disconnection is not detected, and we stop receiving notifications from OVSDB.

For neutron-server, when an API request comes in and a worker tries to execute a transaction, the transaction fails after ovsdb_connection_timeout seconds, and only then does that worker reconnect.
In the case of the metadata agent, there is no event that triggers a reconnection. For example, we won't detect that a new VIF has been plugged into our chassis, so metadata won't be provisioned on that compute node; i.e., we can't boot VMs after a failover.
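To illustrate why the two clients behave differently, here is a hypothetical Python model (none of these classes are networking-ovn or ovsdbapp code). It assumes a failover leaves the old session silent rather than closed: a request-driven client such as a neutron-server worker only notices when its own transaction times out, while a passive listener such as the metadata agent never gets an error to react to.

```python
class Connection:
    """Hypothetical model of one session to the OVSDB VIP."""

    def __init__(self) -> None:
        self.peer_alive = True

    def transact(self, timeout: float) -> str:
        if not self.peer_alive:
            # The request goes out but no reply ever comes back.
            raise TimeoutError(f"no reply within {timeout}s")
        return "committed"


def fail_over(conn: Connection) -> None:
    """The VIP moves to another controller; the old session goes silent."""
    conn.peer_alive = False


class RequestDrivenClient:
    """Like a neutron-server worker: reconnects only when a request fails."""

    def __init__(self, connection_timeout: float) -> None:
        self.timeout = connection_timeout
        self.conn = Connection()
        self.reconnects = 0

    def handle_api_request(self) -> str:
        try:
            return self.conn.transact(self.timeout)
        except TimeoutError:
            # This request fails, but the failure prompts a reconnection,
            # so the worker's next request reaches the promoted server.
            self.conn = Connection()
            self.reconnects += 1
            return "failed"


class PassiveListener:
    """Like the metadata agent: waits for update notifications, never writes."""

    def __init__(self) -> None:
        self.conn = Connection()
        self.reconnects = 0

    def poll_updates(self) -> list:
        # A silent dead session yields no data and no error, so the
        # listener has no reason to reconnect.
        return []
```

After `fail_over()` on both connections, the worker's first `handle_api_request()` returns `"failed"` and its second returns `"committed"`, while the listener keeps returning `[]` with `reconnects == 0`.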

A workaround we've found so far is to set ovsdb_probe_interval so that the clients send probes to ovsdb-server and detect the disconnection. The proper fix would be for the OVS Python IDL to detect the socket disconnection itself (ovn-controller detects the disconnection even with no probes at all).
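The probe workaround can be sketched as a small inactivity timer. This is a hypothetical model loosely following the behavior documented for OVS's Python `ovs.reconnect` module (if nothing is received for one probe interval, send an echo request; if nothing arrives for another interval, drop the connection). The `ProbeMonitor` class and its method names are illustrative, not the real API, and time is passed in explicitly in milliseconds so it can be simulated.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    ACTIVE = "active"        # heard from the server recently
    IDLE = "idle"            # probe sent, waiting for any reply
    RECONNECT = "reconnect"  # peer presumed dead; client should reconnect


class ProbeMonitor:
    """Hypothetical inactivity-probe timer (illustrative, not ovs.reconnect)."""

    def __init__(self, probe_interval_ms: int, now_ms: int = 0) -> None:
        self.interval = probe_interval_ms  # 0 means probing is disabled
        self.state = State.ACTIVE
        self.last_activity = now_ms
        self.idle_since: Optional[int] = None
        self.probes_sent = 0

    def received(self, now_ms: int) -> None:
        """Any message from the server (update, reply, echo) proves liveness."""
        self.last_activity = now_ms
        if self.state is State.IDLE:
            self.state = State.ACTIVE
            self.idle_since = None

    def tick(self, now_ms: int) -> State:
        """Advance the timer to now_ms and return the resulting state."""
        if self.interval == 0 or self.state is State.RECONNECT:
            return self.state  # disabled, or already gave up on the peer
        if self.state is State.ACTIVE:
            if now_ms - self.last_activity >= self.interval:
                self.probes_sent += 1  # stands in for sending an OVSDB echo
                self.state = State.IDLE
                self.idle_since = now_ms
        elif now_ms - self.idle_since >= self.interval:
            # A whole interval passed after the probe with no reply.
            self.state = State.RECONNECT
        return self.state
```

For a silent server and a 60000 ms interval, `tick(60_000)` sends the probe and `tick(120_000)` returns `State.RECONNECT`. With an interval of 0 the monitor stays `ACTIVE` forever, which is the undetected-disconnection case described above.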

Changed in networking-ovn:
assignee: nobody → Daniel Alvarez (dalvarezs)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.openstack.org/569977

Changed in networking-ovn:
status: New → In Progress
Changed in networking-ovn:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/569984

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.openstack.org/569977
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=90c2a1c26f5ec276a1154648538efd373e458afa
Submitter: Zuul
Branch: master

commit 90c2a1c26f5ec276a1154648538efd373e458afa
Author: Daniel Alvarez <email address hidden>
Date: Tue May 22 14:49:52 2018 +0200

    Set ovsdb probe interval to 1 minute on the client side

    When a failover of ovsdb-server occurs, neither neutron-server nor
    ovn-metadata-agent detects the disconnection. Even though this should
    be fixed in OVS Python IDL, this patch is adding a default probe
    interval from the client side to 60 seconds. This way, when the
    connection to the VIP is lost, clients will reconnect and the service
    will be back up again.

    Change-Id: Idec3344136a85d73969ed8e11d358b66c14d9a4d
    Closes-Bug: #1772656
    Signed-off-by: Daniel Alvarez <email address hidden>

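To see what the 60-second client-side default means in practice, this sketch reads the setting and estimates how long a client can stay attached to a dead session. It assumes the option is `ovsdb_probe_interval` in the `[ovn]` section of the plugin's INI file, expressed in milliseconds, with 0 disabling probes and non-zero values raised to at least 1000 ms as in OVS's `ovs.reconnect`. It also assumes the probe-then-wait pattern, so detection takes up to two intervals. Both helper functions are hypothetical.

```python
import configparser
from typing import Optional

# Assumed default after the fix: 1 minute, expressed in milliseconds.
DEFAULT_PROBE_INTERVAL_MS = 60_000


def effective_probe_interval_ms(ini_text: str) -> int:
    """Hypothetical helper: read [ovn] ovsdb_probe_interval, apply limits."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback covers both a missing [ovn] section and a missing option.
    value = parser.getint("ovn", "ovsdb_probe_interval",
                          fallback=DEFAULT_PROBE_INTERVAL_MS)
    if value <= 0:
        return 0  # treated as "probing disabled"
    return max(1000, value)  # assumed 1-second floor, as in ovs.reconnect


def worst_case_detection_s(interval_ms: int) -> Optional[float]:
    """One idle interval before probing plus one interval waiting for a reply."""
    if interval_ms == 0:
        return None  # without probes a silent dead peer is never detected
    return 2 * interval_ms / 1000
```

Under those assumptions the default gives `worst_case_detection_s(60_000) == 120.0`, i.e. up to about two minutes before the client abandons the dead VIP session and reconnects.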
Changed in networking-ovn:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (stable/queens)

Change abandoned by Daniel Alvarez (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/569984

tags: added: networking-ovn-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/571802

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.0.0.0b2

This issue was fixed in the openstack/networking-ovn 5.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/queens)

Reviewed: https://review.openstack.org/569984
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=7a262b6a455c3060f1e54077e8ce21cb4c724413
Submitter: Zuul
Branch: stable/queens

commit 7a262b6a455c3060f1e54077e8ce21cb4c724413
Author: Daniel Alvarez <email address hidden>
Date: Tue May 22 14:49:52 2018 +0200

    Set ovsdb probe interval to 1 minute on the client side

    When a failover of ovsdb-server occurs, neither neutron-server nor
    ovn-metadata-agent detects the disconnection. Even though this should
    be fixed in OVS Python IDL, this patch is adding a default probe
    interval from the client side to 60 seconds. This way, when the
    connection to the VIP is lost, clients will reconnect and the service
    will be back up again.

    Change-Id: Idec3344136a85d73969ed8e11d358b66c14d9a4d
    Closes-Bug: #1772656
    Signed-off-by: Daniel Alvarez <email address hidden>
    (cherry picked from commit 90c2a1c26f5ec276a1154648538efd373e458afa)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 4.0.3

This issue was fixed in the openstack/networking-ovn 4.0.3 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (stable/pike)

Change abandoned by Terry Wilson (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/571802
Reason: pike was so long ago
