Failover of loadbalancer fails when Amphora master is missing

Bug #1899964 reported by Hemanth Nakkina on 2020-10-15
This bug affects 1 person
Affects                   Importance   Assigned to
Ubuntu Cloud Archive      (status tracked in Victoria)
  Train                   High         Unassigned
  Ussuri                  High         Unassigned
  Victoria                Undecided    Unassigned
octavia (Ubuntu)          Undecided    Unassigned
  Focal                   High         Unassigned
  Groovy                  Undecided    Unassigned

Bug Description

[Impact]
(from storyboard description) Currently, if the taskflow process is interrupted during a create/update/failover (the node is rebooted or the service is restarted), the loadbalancer will get stuck in a PENDING state.
Taskflow provides a persistence module which allows flow state to be saved for recovery: https://docs.openstack.org/taskflow/latest/user/persistence.html
Otherwise, partially created/updated/deleted resources should be moved to the ERROR state when the service is up again (as is done in Cinder).

[Test Case]

* deploy OpenStack with Octavia and 2 compute hosts e.g. ./generate-bundle.sh --use-stable-charms --release train --octavia --num-compute 3
* juju config octavia loadbalancer-topology=ACTIVE_STANDBY
* create ubuntu vm and install apache2 (so that it listens on port 80)
* create loadbalancer with vm as member and floating ip for LB vip
* test connection with: nc -vz LB_FIP 80
* openstack loadbalancer amphora list
* get amphora master vm uuid: openstack loadbalancer amphora show -c compute_id -f value <master>
* openstack server show -c "OS-EXT-SRV-ATTR:host" -f value <master uuid>
* poweroff compute host from previous step
* openstack loadbalancer failover LB_UUID
* wait a few seconds
* openstack loadbalancer amphora list
* Wait until you have one BACKUP and one MASTER
* Test connection with: nc -vz LB_FIP 80
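
The two waiting steps above could be scripted instead of polled by hand. The following is only a sketch: `roles_ready` and `wait_for_roles` are hypothetical helper names, the defaults of 60 attempts and a 5 second delay are arbitrary, and it assumes the `role` column of `openstack loadbalancer amphora list` reports MASTER and BACKUP once the failover has completed.

```shell
# Hypothetical helper: reads one amphora role per line on stdin and succeeds
# only when there is exactly one MASTER and exactly one BACKUP.
roles_ready() {
    roles=$(cat)
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    masters=$(printf '%s\n' "$roles" | grep -c '^MASTER$' || true)
    backups=$(printf '%s\n' "$roles" | grep -c '^BACKUP$' || true)
    [ "$masters" -eq 1 ] && [ "$backups" -eq 1 ]
}

# Hypothetical helper: polls the amphora list of one load balancer until
# roles_ready succeeds, or gives up after the given number of attempts.
# Usage: wait_for_roles LB_UUID [ATTEMPTS] [DELAY_SECONDS]
wait_for_roles() {
    lb_uuid=$1
    attempts=${2:-60}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if openstack loadbalancer amphora list --loadbalancer "$lb_uuid" \
                -f value -c role | roles_ready; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

With these defined, `wait_for_roles LB_UUID && nc -vz LB_FIP 80` would cover the last three steps of the test case.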

[Regression Potential]
New failovers have been proven to work properly. The fix will also resolve existing failed failovers, but for those the LB state must first be set from PENDING_UPDATE to ERROR in the database before a new failover is triggered.
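
For reference, the database change mentioned above might look like the sketch below. The bug report does not give the statement, so everything here is an assumption: the table name `load_balancer`, the column name `provisioning_status`, and the helper name `lb_error_sql` (hypothetical). The helper only prints the SQL so that it can be reviewed before anything is run against the Octavia database.

```shell
# Hypothetical helper: prints (does not execute) an UPDATE statement that
# moves one load balancer from PENDING_UPDATE to ERROR.
# ASSUMPTION: table "load_balancer" and column "provisioning_status".
lb_error_sql() {
    lb_uuid=$1
    # Accept only hex digits and hyphens, and reject an empty id, so that
    # nothing unexpected ends up inside the SQL string.
    case $lb_uuid in
        '' | *[!0-9a-fA-F-]*)
            echo "invalid load balancer id" >&2
            return 1
            ;;
    esac
    printf "UPDATE load_balancer SET provisioning_status = 'ERROR' WHERE id = '%s' AND provisioning_status = 'PENDING_UPDATE';\n" "$lb_uuid"
}
```

The printed statement could then be run with a MySQL client on the database host, followed by `openstack loadbalancer failover LB_UUID`. The extra `provisioning_status = 'PENDING_UPDATE'` condition keeps the statement from touching a load balancer that is in any other state.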

------------------------------------------------------------------------

Tried to fail over a loadbalancer whose amphora master entries are missing.
The loadbalancer went into the ERROR state.

OpenStack version: Train

The fix is available in upstream as part of the Octavia Failover refactor patches in Train
https://review.opendev.org/#/q/status:merged+project:openstack/octavia+branch:stable/train+topic:failover-refactor

Verified with the upstream patches and it worked.

Changed in octavia (Ubuntu):
status: New → Fix Released
Edward Hope-Morley (hopem) wrote :
Nicolas Bock (nicolasbock) wrote :

Verified with the following steps:

1. Create Train deployment with Octavia

./generate-bundle.sh --defaults --use-stable-charms --release train --octavia --num-compute 2

2. Configure Octavia

juju config octavia loadbalancer-topology=ACTIVE_STANDBY
juju config octavia spare-pool-size=2

3. Create Cirros VMs

./tools/instance_launch.sh 2 cirros

4. Create fake webserver on VMs (See https://code.launchpad.net/~nicolasbock/stsstack-bundles/+git/stsstack-bundles/+merge/392344)

./tools/run_fake_webserver.sh

5. Create load balancer

./tools/create_octavia_lb.sh

6. Test load balancer with

curl LB_FIP

7. Shut down a nova-compute that is hosting one of the amphorae. This
   will break the load balancer due to https://storyboard.openstack.org/#!/story/2003084

8. Install SRU in octavia unit

9. Check load balancer with

openstack loadbalancer list

10. Verify operation with step 6.
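
The final two checks (load balancer status, then the curl test) could be combined into one pass/fail command. This is a sketch only: `lb_recovered` is a hypothetical helper name, the 5 second curl timeout is arbitrary, and it assumes that a recovered load balancer reports a `provisioning_status` of ACTIVE.

```shell
# Hypothetical helper: succeeds only if the load balancer reports ACTIVE and
# an HTTP request to its floating IP succeeds.
# Usage: lb_recovered LB_NAME_OR_UUID LB_FIP
lb_recovered() {
    lb=$1
    fip=$2
    status=$(openstack loadbalancer show -f value -c provisioning_status "$lb") || return 1
    [ "$status" = "ACTIVE" ] || return 1
    # --fail turns HTTP error responses into a non-zero exit code.
    curl --silent --fail --max-time 5 --output /dev/null "http://$fip/"
}
```

For example, `lb_recovered LB_UUID LB_FIP && echo recovered` prints "recovered" only when both checks pass.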

description: updated
description: updated
tags: added: sts-sru-needed
Corey Bryant (corey.bryant) wrote :

I'd like to see if we can pick these up in an upstream stable release so I've proposed new upstream releases at https://review.opendev.org/#/c/758606/.

Corey Bryant (corey.bryant) wrote :

octavia 6.1.0 and 5.0.3 will be included in the following SRUs for Ubuntu:
https://pad.lv/1900477
https://pad.lv/1900476

Changed in octavia (Ubuntu Focal):
status: New → Triaged
importance: Undecided → High
Changed in octavia (Ubuntu Focal):
status: Triaged → Fix Released
Corey Bryant (corey.bryant) wrote :

Marked as fix released since the point releases are released.
