Can't create load balancer; stuck in "Pending create"

Bug #1914179 reported by Przemyslaw Hausman
This bug affects 2 people
Affects: OpenStack Octavia Charm
Status: New
Importance: Undecided
Assigned to: Chris MacNaughton

Bug Description

Fresh OpenStack deployment: Focal+Ussuri+OVS
Disaggregated architecture
DVR, no neutron-gateway nodes
Juju version: 2.8.7-bionic-amd64

On a fresh OpenStack deployment, I can't create a load balancer.

The load balancer is stuck in "PENDING_CREATE" and eventually errors out. The amphora VMs are up and active but are reported as "BOOTING" in the `openstack loadbalancer amphora list` output. Only two out of three octavia units have IP addresses assigned to o-hm0. The third unit does have the o-hm0 interface, but no IP address is assigned to it.
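
For anyone trying to confirm the same symptom, here is a minimal sketch that picks out the amphorae still reported as BOOTING. It assumes the listing was captured with `openstack loadbalancer amphora list -f json` and that each row carries `id` and `status` keys; the sample rows and the function name are hypothetical, not taken from this deployment.

```python
import json


def booting_amphorae(amphora_list_json):
    """Return the IDs of amphorae whose status is still BOOTING."""
    rows = json.loads(amphora_list_json)
    return [row["id"] for row in rows if row.get("status") == "BOOTING"]


# Hypothetical sample in the assumed shape of the JSON listing;
# real rows contain more columns and real UUIDs.
SAMPLE = json.dumps([
    {"id": "amp-1", "status": "BOOTING", "role": None},
    {"id": "amp-2", "status": "ALLOCATED", "role": "MASTER"},
])

if __name__ == "__main__":
    for amphora_id in booting_amphorae(SAMPLE):
        print(amphora_id)
```

An amphora that stays in this list long after its VM is active would match what is described above.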

The Octavia management network, security groups, image, flavor, etc. were created as described in the attached octavia-post-deployment-configuration.txt.

I tried an amphora image generated by the octavia-diskimage-retrofit tool as well as the one downloaded from http://tarballs.openstack.org/octavia/test-images/test-only-amphora-x64-haproxy-ubuntu-bionic.qcow2. Both images give the same result.

Note that the bundle defines the following options for Octavia:
- loadbalancer-topology: ACTIVE_STANDBY
- spare-pool-size: 2

When I set topology to SINGLE and reset pool size to 0, and redeployed the bundle, I was able to create a load balancer.

Bundle, juju status and juju-crashdump: https://docs.google.com/document/d/1r6wkRch_s9PSqR2sTFkbiyTbaMW9xxTd-8HBMAw1wSg/edit?usp=sharing

We hit this bug yesterday in two separate deployments.

Przemyslaw Hausman (phausman) wrote :

Subscribing field-critical. This is blocking two customer deployments.

Billy Olsen (billy-olsen) wrote :

Removing the field-critical designation from this bug. The functionality is more than 3 months old, so it doesn't qualify for the field SLA. Additionally, I believe workarounds were provided: my understanding is that restarting one of the octavia units (or possibly even performing a down/up on one of the interfaces) works around the issue and allows it to be resolved.

The bug is still being looked at, just not under the field SLA.

Changed in charm-octavia:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
Przemyslaw Hausman (phausman) wrote :

Restarting the octavia leader unit (which is missing the IP on o-hm0) does in fact work around this issue. Thank you!

Also, when I deployed the -next version of the octavia charm, the issue was not observed.
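
To decide which unit needs the restart, a rough check like the one below may help. It is a sketch that assumes the output of `ip -j addr show` was collected on each octavia unit, and it treats an interface with only a link-local address as "missing the IP". That scope rule, the function name and the sample addresses are assumptions for illustration.

```python
import json


def has_global_address(ip_json, ifname="o-hm0"):
    """Return True if `ifname` has at least one global-scope address."""
    for iface in json.loads(ip_json):
        if iface.get("ifname") != ifname:
            continue
        addresses = iface.get("addr_info", [])
        # Link-local addresses are ignored (assumption, see lead-in).
        return any(addr.get("scope") == "global" for addr in addresses)
    # The interface was not found in the output at all.
    return False


# Hypothetical samples shaped like `ip -j addr show` output.
HEALTHY = json.dumps([{"ifname": "o-hm0", "addr_info": [
    {"family": "inet6", "local": "fc00::10", "scope": "global"},
    {"family": "inet6", "local": "fe80::1", "scope": "link"},
]}])
BROKEN = json.dumps([{"ifname": "o-hm0", "addr_info": [
    {"family": "inet6", "local": "fe80::2", "scope": "link"},
]}])

if __name__ == "__main__":
    print("healthy unit:", has_global_address(HEALTHY))
    print("broken unit:", has_global_address(BROKEN))
```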

Bas de Bruijne (basdbruijne) wrote :

I think we are seeing something similar with SQA on Yoga with OVS:
https://solutions.qa.canonical.com/testruns/testRun/e5641713-02c8-4b32-bfec-28b81ad2e1a4

We can see in the tempest results that the load balancers are stuck with status Pending create:
https://oil-jenkins.canonical.com/artifacts/e5641713-02c8-4b32-bfec-28b81ad2e1a4/generated/generated/openstack/tempest_result.html

In the octavia health-manager logs we see this message repeated many times:
```
octavia/octavia-health-manager.log:2022-06-01 07:39:53.489 331227 WARNING octavia.controller.healthmanager.health_manager [-] Load balancer d2f20a0a-4bfd-4035-8dd8-cd1b4e089f66 is in immutable state PENDING_CREATE. Skipping failover.
```
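
To gauge how often this warning repeats and for which load balancers, a small script along these lines could summarise the log. It is only a sketch: the regular expression is derived from the single line quoted above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the warning quoted above; other Octavia
# releases may word the message differently.
PATTERN = re.compile(
    r"Load balancer (?P<lb_id>[0-9a-f-]{36}) is in immutable state "
    r"(?P<state>[A-Z_]+)\. Skipping failover\."
)


def count_skipped_failovers(lines):
    """Count 'Skipping failover' warnings per (load balancer ID, state)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("lb_id"), match.group("state"))] += 1
    return counts


SAMPLE_LINE = (
    "octavia/octavia-health-manager.log:2022-06-01 07:39:53.489 331227 "
    "WARNING octavia.controller.healthmanager.health_manager [-] "
    "Load balancer d2f20a0a-4bfd-4035-8dd8-cd1b4e089f66 is in immutable "
    "state PENDING_CREATE. Skipping failover."
)

if __name__ == "__main__":
    for (lb_id, state), n in count_skipped_failovers([SAMPLE_LINE]).items():
        print(lb_id, state, n)
```

In practice the lines would come from the health-manager log file, for example by passing an open file object to the function.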

Link to crashdumps for this run:
https://oil-jenkins.canonical.com/artifacts/e5641713-02c8-4b32-bfec-28b81ad2e1a4/index.html
