Failing over OVN dbs can cause original controller to permanently lose connection

Bug #1930926 reported by Terry Wilson
This bug affects 2 people
Affects   Status         Importance   Assigned to    Milestone
neutron   Fix Released   Medium       Terry Wilson

Bug Description

When failing over OVN DB servers from one server to another, the server that originally hosted the VIP doesn't notice that the connection is gone, and it doesn't reconnect until the neutron API service is restarted.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/794892

Changed in neutron:
status: New → In Progress
Hongbin Lu (hongbin.lu)
tags: added: ovn
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
assignee: nobody → Terry Wilson (otherwiseguy)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/794892
Committed: https://opendev.org/openstack/neutron/commit/65cce351d74a9a637fdb2a9d5e0e63445dda9ea9
Submitter: "Zuul (22348)"
Branch: master

commit 65cce351d74a9a637fdb2a9d5e0e63445dda9ea9
Author: Terry Wilson <email address hidden>
Date: Fri Jun 4 19:47:36 2021 +0000

    Use TCP keepalives for ovsdb connections

    When failing over OVN DB servers from one server to another, the
    server which originally hosted the VIP doesn't notice the connection
    is gone and doesn't reconnect. Ultimately, this is something that
    needs to be fixed in python-ovs, but setting the SO_KEEPALIVE socket
    option avoids the issue. This also has the benefit that the client
    doesn't need to send 'echo' requests, which can time out on an
    overloaded ovsdb-server, which causes a disconnection which then
    adds even more load on the ovsdb-server as it has to send the entire
    db contents over the wire after the connection.

    Closes-Bug: #1930926
    Change-Id: Ie0205785cab307c132fbe409588739685cade7c0
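
For readers unfamiliar with the socket option named in the commit message, the following is a minimal sketch of enabling TCP keepalives on a plain Python socket. It is not the code from the neutron change: the helper name `enable_keepalive` and the idle, interval, and count values are assumptions chosen for illustration, and the per-probe tuning options are applied only where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an already-created TCP socket.

    Hypothetical helper for illustration; the timing values are assumptions.
    """
    # SO_KEEPALIVE asks the kernel to probe an idle peer, so a dead
    # connection is eventually reported to the application as an error.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific; set only those that exist.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keepalive is enabled on this socket.
    print(bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

With keepalives enabled, the kernel rather than the application probes the peer, which is why the commit message notes that the client no longer needs to send 'echo' requests to detect a dead connection.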

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/795472

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/795613

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/795614

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/795472
Committed: https://opendev.org/openstack/neutron/commit/f1cd2a1cb8993fce20464f474c0b9048600e1eac
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit f1cd2a1cb8993fce20464f474c0b9048600e1eac
Author: Terry Wilson <email address hidden>
Date: Fri Jun 4 19:47:36 2021 +0000

    Use TCP keepalives for ovsdb connections

    When failing over OVN DB servers from one server to another, the
    server which originally hosted the VIP doesn't notice the connection
    is gone and doesn't reconnect. Ultimately, this is something that
    needs to be fixed in python-ovs, but setting the SO_KEEPALIVE socket
    option avoids the issue. This also has the benefit that the client
    doesn't need to send 'echo' requests, which can time out on an
    overloaded ovsdb-server, which causes a disconnection which then
    adds even more load on the ovsdb-server as it has to send the entire
    db contents over the wire after the connection.

    Closes-Bug: #1930926
    Change-Id: Ie0205785cab307c132fbe409588739685cade7c0
    (cherry picked from commit 65cce351d74a9a637fdb2a9d5e0e63445dda9ea9)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/795613
Committed: https://opendev.org/openstack/neutron/commit/3fb7d0b34aef0d99d06ca86324c76c4e2a793b26
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 3fb7d0b34aef0d99d06ca86324c76c4e2a793b26
Author: Terry Wilson <email address hidden>
Date: Fri Jun 4 19:47:36 2021 +0000

    Use TCP keepalives for ovsdb connections

    When failing over OVN DB servers from one server to another, the
    server which originally hosted the VIP doesn't notice the connection
    is gone and doesn't reconnect. Ultimately, this is something that
    needs to be fixed in python-ovs, but setting the SO_KEEPALIVE socket
    option avoids the issue. This also has the benefit that the client
    doesn't need to send 'echo' requests, which can time out on an
    overloaded ovsdb-server, which causes a disconnection which then
    adds even more load on the ovsdb-server as it has to send the entire
    db contents over the wire after the connection.

    Closes-Bug: #1930926
    Change-Id: Ie0205785cab307c132fbe409588739685cade7c0
    (cherry picked from commit 65cce351d74a9a637fdb2a9d5e0e63445dda9ea9)

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/795614
Committed: https://opendev.org/openstack/neutron/commit/073d6d7eeaf1b4ae0bd0c326956888eaaed168f7
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 073d6d7eeaf1b4ae0bd0c326956888eaaed168f7
Author: Terry Wilson <email address hidden>
Date: Fri Jun 4 19:47:36 2021 +0000

    Use TCP keepalives for ovsdb connections

    When failing over OVN DB servers from one server to another, the
    server which originally hosted the VIP doesn't notice the connection
    is gone and doesn't reconnect. Ultimately, this is something that
    needs to be fixed in python-ovs, but setting the SO_KEEPALIVE socket
    option avoids the issue. This also has the benefit that the client
    doesn't need to send 'echo' requests, which can time out on an
    overloaded ovsdb-server, which causes a disconnection which then
    adds even more load on the ovsdb-server as it has to send the entire
    db contents over the wire after the connection.

    Conflicts: neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py

    Closes-Bug: #1930926
    Change-Id: Ie0205785cab307c132fbe409588739685cade7c0
    (cherry picked from commit 65cce351d74a9a637fdb2a9d5e0e63445dda9ea9)

tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.4.0

This issue was fixed in the openstack/neutron 16.4.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.2.0

This issue was fixed in the openstack/neutron 17.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.1.0

This issue was fixed in the openstack/neutron 18.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.0.0.0rc1

This issue was fixed in the openstack/neutron 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn train-eol

This issue was fixed in the openstack/networking-ovn train-eol release.
