Neutron API is not available when a specific ovn-central unit is down
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Neutron API Charm | New | Undecided | Unassigned |
Bug Description
The Neutron API is not available when a specific ovn-central unit is down. Any network-related CLI command fails as follows:
```
$ openstack network list
HttpException: 503: Server Error for url: https:/
```
Affected OpenStack release: Focal-Yoga.
The problem is not reproducible on Focal-Ussuri.
Steps to reproduce:
1. Find the first IP address listed in ovn_nb_connection; this identifies the ovn-central unit that you will shut down in the next step. Important: it must be the first IP address in the list.
```
$ juju ssh neutron-api/0 sudo grep ovn_nb_connection /etc/neutron/
ovn_nb_connection = ssl:172.
```
In the above example, the first IP is 172.27.81.187.
2. Shut down the ovn-central unit that holds the IP from the previous step.
3. Restart the neutron-server service on the neutron-api unit:
```
juju ssh neutron-api/0 'sudo systemctl restart neutron-server'
```
4. Access the Neutron API and confirm that it is no longer available. This is not the expected outcome; the Neutron API should remain available.
```
$ openstack network list
HttpException: 503: Server Error for url: https:/
```
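Step 1 depends on reading the first endpoint out of the comma-separated ovn_nb_connection value. As a rough sketch, the extraction could be scripted as below. The helper name `first_ovn_ip`, the `192.0.2.x` addresses, and port 6641 are illustrative placeholders, not values from the affected deployment, and the sketch assumes each entry has the form `proto:IP:port`.

```shell
# first_ovn_ip VALUE: print the IP of the first endpoint in a
# comma-separated connection string such as "ssl:IP:port,ssl:IP:port".
# Hypothetical helper; assumes each entry looks like "proto:IP:port".
first_ovn_ip() {
    local first="${1%%,*}"        # keep only the text before the first comma
    first="${first#*:}"           # drop the leading "ssl:" (or "tcp:") prefix
    printf '%s\n' "${first%:*}"   # drop the trailing ":port"
}

# Placeholder addresses from the documentation range, not real unit IPs.
first_ovn_ip 'ssl:192.0.2.11:6641,ssl:192.0.2.12:6641,ssl:192.0.2.13:6641'
# prints 192.0.2.11
```

The function takes only the value to the right of the `=` sign, so the `ovn_nb_connection = ` prefix needs to be stripped from the grep output before calling it.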
To validate that the problem only shows up when the first of the ovn-central units listed in ml2_conf.ini is down, run the following steps:
1. Move the IP address of the downed ovn-central unit to the end of the list in ml2_conf.ini on the neutron-api unit, for example:
BEFORE:
ovn_nb_connection = ssl:172.
[...]
ovn_sb_connection = ssl:172.
AFTER:
ovn_nb_connection = ssl:172.
[...]
ovn_sb_connection = ssl:172.
2. Restart the neutron-server service on the neutron-api unit. The problem should go away and the Neutron API should be available again.
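The reordering used in this validation can be sketched as a small shell function that moves the downed unit's endpoint to the end of a connection string. The name `move_endpoint_last`, the `192.0.2.x` addresses, and the port are hypothetical placeholders. The sketch only transforms a string; it does not edit ml2_conf.ini, and a charm-managed file may be rewritten by the charm later.

```shell
# move_endpoint_last LIST IP: print LIST with the endpoint(s) for IP moved
# to the end, preserving the relative order of the other endpoints.
# Hypothetical helper; assumes entries of the form "proto:IP:port".
move_endpoint_last() {
    local list="$1" ip="$2" kept="" moved="" ep
    local IFS=','                 # split the list on commas only
    for ep in $list; do
        case "$ep" in
            *:"$ip":*) moved="${moved:+$moved,}$ep" ;;  # endpoint of the downed unit
            *)         kept="${kept:+$kept,}$ep" ;;     # every other endpoint
        esac
    done
    # Join the two groups, adding a comma only when both are non-empty.
    printf '%s\n' "${kept:+$kept${moved:+,}}$moved"
}

# Placeholder addresses from the documentation range, not real unit IPs.
move_endpoint_last 'ssl:192.0.2.11:6641,ssl:192.0.2.12:6641,ssl:192.0.2.13:6641' 192.0.2.11
# prints ssl:192.0.2.12:6641,ssl:192.0.2.13:6641,ssl:192.0.2.11:6641
```

The same function can be applied to the ovn_sb_connection value, since both settings use the same comma-separated layout.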
Please see the attached neutron-server.log from the neutron-api unit. Look for the message "Unrecoverable error: please check log for details.: ValueError: non-zero flags not allowed in calls to send() on <class 'eventlet.
I'm subscribing field-critical because this problem was encountered on a customer deployment. The customer shuts down a random control node (one of three) and expects the OpenStack API to remain available. The issue is blocking the handover of the cloud.