[SRU] neutron ovsdbapp ssl connection stuck in OSError error loop

Bug #2018405 reported by Edward Hope-Morley
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
openvswitch (Ubuntu)  Invalid       Undecided   Unassigned
  Jammy               Fix Released  Undecided   Unassigned
  Kinetic             Invalid       Undecided   Unassigned
  Lunar               Invalid       Undecided   Unassigned
  Mantic              Invalid       Undecided   Unassigned

Bug Description

[Impact]
Terminated or closed SSL connections between OpenStack agents and the ovn-central databases are not handled properly, which leaves the agents stuck in a reconnect error loop. This patch fixes that.

[Test Plan]
* deploy OpenStack with OVN
* abruptly terminate connections between neutron-server and/or neutron-ovn-metadata-agent and the ovn-central NB/SB database servers.
* ensure that connections are re-established and the errors below do not appear.
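
The last step of the test plan could be checked with a small script along these lines. This is only a sketch: the function name count_ssl_errors is hypothetical, the two patterns are copied from the log excerpts in this report, and how the log lines are obtained (file path, journal, etc.) is left to the tester.

```python
import re

# Patterns copied from the error messages quoted in this bug report.
ERROR_PATTERNS = (
    re.compile(r"OSError: \[Errno 107\] Transport endpoint is not connected"),
    re.compile(r"ssl\.SSLZeroReturnError"),
)


def count_ssl_errors(lines):
    """Count log lines that match one of the known error messages.

    This counts matching lines, not distinct events: a single logged
    traceback can contribute more than one matching line.
    """
    return sum(
        1 for line in lines
        if any(pattern.search(line) for pattern in ERROR_PATTERNS)
    )
```

After terminating the connections and waiting for the agents to reconnect, a count of zero over the log lines written since the disconnect would suggest the loop is gone.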

[Regression Potential]
No regressions are anticipated as a result of using this patch.

---------------------------------------------------------------------

We are running OpenStack Yoga on both Ubuntu Focal and Ubuntu Jammy, and both environments are experiencing the following errors in neutron-server and neutron-ovn-metadata-agent:

2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection [req-302f6be6-5b72-49b4-9132-d9ec39370fd5 - - - - -] [Errno 107] Transport endpoint is not connected: OSError: [Errno 107] Transport endpoint is not connected
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection Traceback (most recent call last):
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 108, in run
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection self.idl.run()
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection File "/usr/lib/python3/dist-packages/ovs/db/idl.py", line 433, in run
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection self._session.run()
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection File "/usr/lib/python3/dist-packages/ovs/jsonrpc.py", line 519, in run
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection error = self.stream.connect()
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection File "/usr/lib/python3/dist-packages/ovs/stream.py", line 824, in connect
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection self.socket.do_handshake()
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection File "/usr/lib/python3/dist-packages/eventlet/green/ssl.py", line 311, in do_handshake
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection return self._call_trampolining(
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection File "/usr/lib/python3/dist-packages/eventlet/green/ssl.py", line 157, in _call_trampolining
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection return func(*a, **kw)
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection File "/usr/lib/python3.10/ssl.py", line 1337, in do_handshake
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection self._check_connected()
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection File "/usr/lib/python3.10/ssl.py", line 1119, in _check_connected
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection self.getpeername()
2023-05-02 00:01:05.026 2146189 ERROR ovsdbapp.backend.ovs_idl.connection OSError: [Errno 107] Transport endpoint is not connected

This reconnect error repeats indefinitely after an initial SSL disconnect such as:

2023-04-12 19:15:40.281 2146189 ERROR ovsdbapp.backend.ovs_idl.transaction [req-7d895085-c4e1-43b9-a36f-808e86ef5caa f3797dd627d24377a7d0a1330595aed9 e4b04ca58d734ed0aa29e306adad4f79 - 93f9c955065f4cb38b9a8f1b98eedb92 93f9c955065f4cb38b9a8f1b98eedb92] Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 131, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 114, in do_commit
    self.api.idl.run()
  File "/usr/lib/python3/dist-packages/ovs/db/idl.py", line 433, in run
    self._session.run()
  File "/usr/lib/python3/dist-packages/ovs/jsonrpc.py", line 519, in run
    error = self.stream.connect()
  File "/usr/lib/python3/dist-packages/ovs/stream.py", line 824, in connect
    self.socket.do_handshake()
  File "/usr/lib/python3/dist-packages/eventlet/green/ssl.py", line 311, in do_handshake
    return self._call_trampolining(
  File "/usr/lib/python3/dist-packages/eventlet/green/ssl.py", line 157, in _call_trampolining
    return func(*a, **kw)
  File "/usr/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLZeroReturnError: TLS/SSL connection has been closed (EOF) (_ssl.c:997)
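
In both tracebacks the exception escapes from do_handshake() in the stream's connect path instead of being reported as a connection error. As an illustration only, and not the actual upstream change, the general idea of turning handshake exceptions into an errno-style result could look like the sketch below. The function name handshake_status and the choice of return codes are assumptions made for this example.

```python
import errno
import ssl


def handshake_status(sock):
    """Run sock.do_handshake() and return an errno-style status.

    Returns 0 on success, errno.EAGAIN if the handshake should be
    retried later, or another errno value meaning the connection is
    dead and should be closed and re-established from scratch.
    """
    try:
        sock.do_handshake()
    except (ssl.SSLWantReadError, ssl.SSLWantWriteError):
        # Non-blocking handshake is not finished yet; retry later.
        return errno.EAGAIN
    except ssl.SSLZeroReturnError:
        # Peer closed the TLS session (the initial disconnect above).
        return errno.ECONNRESET
    except ssl.SSLError:
        # Any other TLS failure; EPROTO is an arbitrary choice here.
        return errno.EPROTO
    except OSError as e:
        # For example [Errno 107] ENOTCONN from the repeating traceback.
        return e.errno or errno.EIO
    return 0
```

With a result like this, the caller can treat any status other than 0 or EAGAIN as a failed connection and open a new socket, rather than calling do_handshake() again on one that is no longer connected.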

Versions:
  focal:
    ii python3-openvswitch 2.17.2-0ubuntu0.22.04.2~cloud0 all Python 3 bindings for Open vSwitch
    ii python3-ovsdbapp 1.1.0-0ubuntu2 all library for creating OVSDB applications - Python 3.x
  jammy:
    ii python3-openvswitch 2.17.3-0ubuntu0.22.04.2 all Python 3 bindings for Open vSwitch
    ii python3-ovsdbapp 1.15.1-0ubuntu2 all library for creating OVSDB applications - Python 3.x

Tags: patch
Edward Hope-Morley (hopem) wrote :

This looks to have been recently fixed in https://github.com/openvswitch/ovs/commit/b456b1a02f629c2438ef2c3f247f35c8712f12c6 so we need to get that backported to 2.17.x.

Edward Hope-Morley (hopem) wrote :
summary: - neutron ovsdbapp ssl connection stuck in OSError error loop
+ [SRU] neutron ovsdbapp ssl connection stuck in OSError error loop
description: updated
Edward Hope-Morley (hopem) wrote :

The Jammy SRU is based on https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/2003060, which has yet to land.

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "lp2018405-mantic.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Lucas Kanashiro (lucaskanashiro) wrote :

Thanks for the debdiffs Edward. The proposed changes look good in general, I'd like to ask you to add some DEP-3 headers [1] to your patches, such as Origin (link to the upstream commits) and Bug-Ubuntu (link to this bug). Could you please add them?

And a nitpick, could you please start your changelog entry with uppercase? Instead of "backport" use "Backport". This is just a very minor detail :)

[1] https://dep-team.pages.debian.net/deps/dep3/

Lucas Kanashiro (lucaskanashiro) wrote :

Moreover, I think you can try to improve your "Where problems could occur"/"Regression potential" section in the bug description. For instance, the patch adds more exceptions to be handled; another problem (not the one initially considered) might raise the same exception and be treated in an unexpected way. I am pretty sure upstream thought about that before merging the changes, but this section exists exactly to elaborate on all of that, so that if a regression happens we at least have a rough idea of what happened. We wouldn't release an update if we thought it could regress users, so this is more of an exercise in what could hypothetically happen.

Edward Hope-Morley (hopem) wrote :

As I understand it, there is an upcoming 2.17.7 release of openvswitch that will contain this fix as well as https://github.com/openvswitch/ovs/commit/111c7be3193e15e2acf8af8ceb74a1177a95806d, which we are also waiting for (although the two are unrelated). Therefore we are OK to wait for that point release so that we only need to upgrade once.

Sergio Durigan Junior (sergiodj) wrote :

Hi Edward,

Thanks for the reply. IIUC you won't be needing a sponsorship for this bug anymore, correct? If that's indeed the case, could you please unsubscribe ~ubuntu-sponsors from it? Otherwise, Lucas' comments/request still apply. Thanks!

Michael Hudson-Doyle (mwhudson) wrote :

I unsubscribed ~ubuntu-sponsors.

Edward Hope-Morley (hopem) wrote :

Once 2.17.7 is released we will have this fix in Ubuntu - https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/2025323

Mauricio Faria de Oliveira (mfo) wrote :

Bug 2025323 is Fix Released on Jammy.
Updating this bug accordingly, per comment #13.

Changed in openvswitch (Ubuntu Jammy):
status: New → Fix Released
Changed in openvswitch (Ubuntu Kinetic):
status: New → Invalid
Changed in openvswitch (Ubuntu Lunar):
status: New → Invalid
Changed in openvswitch (Ubuntu Mantic):
status: New → Invalid