OpenSSL.SSL.SysCallError: (111, 'ECONNREFUSED') and Connection thread stops
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
Ussuri | Fix Released | Undecided | Unassigned |
Victoria | Fix Released | Undecided | Unassigned |
ovsdbapp | Fix Released | Undecided | Unassigned |
python-ovsdbapp (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Released | Undecided | Unassigned |
Groovy | Fix Released | Undecided | Unassigned |
Hirsute | Fix Released | Undecided | Unassigned |
Bug Description
If ovsdb-server is down for a while and we are connecting via SSL, python-ovs will raise OpenSSL.SSL.SysCallError instead of just returning an error type. If this goes on for a bit, the Connection thread will exit and cannot be recovered without restarting neutron-server.
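The difference between raising and returning an error type can be sketched as follows. This is only an illustration, not python-ovs code: `FakeSysCallError` is a hypothetical stand-in for `OpenSSL.SSL.SysCallError`, and `connect_raw` and `connect_checked` are made-up names.

```python
import errno


class FakeSysCallError(Exception):
    """Hypothetical stand-in for OpenSSL.SSL.SysCallError (args: errno, name)."""


def connect_raw(server_up):
    """Pretend SSL connect that raises when the server is down."""
    if not server_up:
        raise FakeSysCallError(errno.ECONNREFUSED, "ECONNREFUSED")
    return 0


def connect_checked(server_up):
    """Return 0 on success, or an errno value on failure, instead of raising."""
    try:
        return connect_raw(server_up)
    except FakeSysCallError as exc:
        # The first argument carries the errno, as in (111, 'ECONNREFUSED').
        return exc.args[0]
```

A caller that only expects error codes keeps working with `connect_checked`, whereas an uncaught exception from `connect_raw` would propagate up and end the calling thread.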
+++++++
SRU:
[Impact]
Any intermittent connection issue between neutron-server and ovsdb nb/sb left neutron-server unable to handle further ovsdb transactions, because exceptions raised during reconnection were not handled properly. This in turn causes failures in post-commit updates of resources and leaves the neutron and OVN databases inconsistent.
This fix catches the exceptions and retries the connection to ovsdb.
[Test plan]
* Deploy bionic-ussuri with neutron-server and ovn-central as HA using juju charms.
* Launch a few instances and check that they are in the active state
* Simulate network communication issues by modifying the iptables rules for ports 6641 6643 6644 16642
- On ovn-central/0, drop packets from ovn-central/2 and neutron-server/2
- On ovn-central/1, drop packets from ovn-central/2 and neutron-server/2
- On ovn-central/2, drop packets from ovn-central/0, ovn-central/1, neutron-server/0, neutron-server/1
DROP_PKTS_
DROP_PKTS_
for ip in $DROP_PKTS_
for ip in $DROP_PKTS_
* After a minute, remove the newly added REJECT rules.
* Launch around 5 new VMs (5 so that some post-creation updates land on neutron-server/2) and look for Timeout exceptions on neutron-server/2
If there are any Timeout exceptions, the neutron-server ovsdb connections are stale and are no longer handling ovsdb transactions.
If there are no Timeout exceptions and port status updates arrive from ovsdb, neutron-server has reconnected successfully and is handling updates again.
[Where problems could occur]
The fix passed the upstream zuul gates (tempest tests etc.) and the patch only adds reconnection retries to ovsdbapp. With the fix, a reconnection is attempted every 4 minutes (3 min connection timeout + 1 min sleep) until the connection succeeds. I don't see how any regression could result from this change.
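The retry cadence described above (3 min connection timeout plus 1 min sleep) might look roughly like the sketch below. It is not the ovsdbapp patch: `connect_with_retry`, its parameters, and the injectable `sleep` are assumptions made for illustration.

```python
import time


def connect_with_retry(connect, timeout=180, pause=60, sleep=time.sleep):
    """Call connect(timeout) until it succeeds; return the number of attempts.

    Hypothetical helper: each failed attempt is followed by a pause, so a
    failing cycle lasts at most timeout + pause seconds (180 + 60 = 240).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            connect(timeout)
            return attempts
        except Exception:
            # Swallow the error and wait before the next attempt.
            sleep(pause)
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting for minutes.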
description: | updated |
tags: | added: sts |
Changed in python-ovsdbapp (Ubuntu Hirsute): | |
status: | New → Fix Released |
description: | updated |
Reviewed: https://review.opendev.org/752092
Committed: https://git.openstack.org/cgit/openstack/ovsdbapp/commit/?id=83cf7aa6c81f1b2341b2bba1fe156047fa5d29f6
Submitter: Zuul
Branch: master
commit 83cf7aa6c81f1b2341b2bba1fe156047fa5d29f6
Author: Terry Wilson <email address hidden>
Date: Tue Sep 15 13:42:08 2020 -0500
Don't give up when an Exception happens in idl.run
It's possible that idl.run() could have a bug where it raises an
Exception for an extended period of time while ovsdb-server is
down, but recover once ovsdb-server comes back up. Specifically,
python-ovs currently doesn't properly catch an exception when the
socket type is 'ssl' that it catches for other protocols.
Change-Id: Ia068650d2db3d5d8642771a6df5a260d692aea20
Closes-Bug: #1895727
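
The idea in the commit message, a connection loop that keeps going when `idl.run()` raises, can be sketched as below. This is an illustrative approximation and not the actual ovsdbapp change: `FlakyIdl` and `connection_loop` are hypothetical names, and the iteration cap exists only so the example terminates.

```python
import logging
import threading

LOG = logging.getLogger(__name__)


class FlakyIdl:
    """Hypothetical IDL whose run() raises while the server is 'down'."""

    def __init__(self, failures):
        self.failures = failures
        self.successful_runs = 0

    def run(self):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("ovsdb-server is down")
        self.successful_runs += 1


def connection_loop(idl, stop_event, max_iterations):
    """Run the IDL repeatedly; log and continue instead of exiting on errors."""
    for _ in range(max_iterations):
        if stop_event.is_set():
            break
        try:
            idl.run()
        except Exception:
            # Without this handler the thread would die on the first error.
            LOG.exception("idl.run() failed; retrying")
            continue


if __name__ == "__main__":
    idl = FlakyIdl(failures=3)
    thread = threading.Thread(
        target=connection_loop, args=(idl, threading.Event(), 5))
    thread.start()
    thread.join()
    print(idl.successful_runs)
```

Here the thread survives the failing iterations and resumes normal work once `run()` stops raising, which is the behaviour the bug report asks for.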