Openstack integrator not successful in resizing load balancers to match current k8s workers

Bug #1897795 reported by Paul Goins
This bug affects 1 person
Affects                      Status       Importance   Assigned to   Milestone
Openstack Integrator Charm   Incomplete   Undecided    Unassigned

Bug Description

On a customer cloud, we're seeing a problem where the openstack-cloud-controller-manager pods appear to be trying to resize load balancers to target the current set of running K8s workers, but are somehow failing to do so.

Here is a concrete example, with customer-identifying information removed:

$ kubectl get events -n customer-namespace |grep UpdateLoadBalancerFailed | tail -n1
3m8s Warning UpdateLoadBalancerFailed service/customer-service Error updating load balancer with new hosts map[juju-123456-customer-k8s-1:{} juju-123456-customer-k8s-2:{} juju-123456-customer-k8s-3:{} juju-123456-customer-k8s-4:{} juju-123456-customer-k8s-5:{} juju-123456-customer-k8s-6:{} juju-123456-customer-k8s-7:{} juju-123456-customer-k8s-8:{} juju-123456-customer-k8s-9:{} juju-123456-customer-k8s-10:{} juju-123456-customer-k8s-11:{} juju-123456-customer-k8s-12:{}]: failed to find object

Running "kubectl describe" on the service shows that it is of type LoadBalancer, with something like this under Events:

Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning UpdateLoadBalancerFailed 3m33s (x4686 over 5d19h) service-controller Error updating load balancer with new hosts map[juju-123456-customer-k8s-1:{} juju-123456-customer-k8s-2:{} juju-123456-customer-k8s-3:{} juju-123456-customer-k8s-4:{} juju-123456-customer-k8s-5:{} juju-123456-customer-k8s-6:{} juju-123456-customer-k8s-7:{} juju-123456-customer-k8s-8:{} juju-123456-customer-k8s-9:{} juju-123456-customer-k8s-10:{} juju-123456-customer-k8s-11:{} juju-123456-customer-k8s-12:{}]: failed to find object
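
When reading events like the one above, it can help to pull the host names out of the map[...] portion so they can be compared with the members actually attached to the load balancer. The following is a minimal sketch, not part of the charm or the cloud provider. The function name hosts_from_event is made up for illustration, and it assumes the message format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the charm): read event or log lines on stdin
# and print the host names found in "new hosts map[...]", one per line.
hosts_from_event() {
  sed -n 's/.*new hosts map\[\([^]]*\)].*/\1/p' \
    | tr ' ' '\n' \
    | sed 's/:{}$//' \
    | sort -u
}

# Example usage, reusing the command from the report:
#   kubectl get events -n customer-namespace \
#     | grep UpdateLoadBalancerFailed | tail -n1 | hosts_from_event
```

The output is sorted and de-duplicated so that it can be diffed against another sorted list, such as the member names of the corresponding pool.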

It appears that one of the openstack-cloud-controller-manager pods is generating this message.

"kubectl logs" from the pod in question look something like this:

E0929 19:41:26.551170 1 service_controller.go:667] failed to update load balancer hosts for service customer-namespace/interruption-service: failed to find object
I0929 19:41:26.551337 1 event.go:258] Event(v1.ObjectReference{Kind:"Service", Namespace:"customer-namespace", Name:"interruption-service", UID:"deadbeef-dead-beef-dead-beefdeadbeef", APIVersion:"v1", ResourceVersion:"87136827", FieldPath:""}): type: 'Warning' reason: 'UpdateLoadBalancerFailed' Error updating load balancer with new hosts map[juju-123456-customer-k8s-1:{} juju-123456-customer-k8s-2:{} juju-123456-customer-k8s-3:{} juju-123456-customer-k8s-4:{} juju-123456-customer-k8s-5:{} juju-123456-customer-k8s-6:{} juju-123456-customer-k8s-7:{} juju-123456-customer-k8s-8:{} juju-123456-customer-k8s-9:{} juju-123456-customer-k8s-10:{} juju-123456-customer-k8s-11:{} juju-123456-customer-k8s-12:{}]: failed to find object

"kubectl version" reports the following:

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-26T20:32:49Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.14", GitCommit:"d2a081c8e14e21e28fe5bdfa38a817ef9c0bb8e3", GitTreeState:"clean", BuildDate:"2020-08-26T20:35:13Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

The customer is running OpenStack Rocky with Octavia.

The K8s model is running cs:~containers/openstack-integrator-81.

George Kraft (cynerva) wrote :

> failed to find object

Looks like this error is defined here: https://github.com/kubernetes/cloud-provider-openstack/blob/v1.15.0/pkg/cloudprovider/providers/openstack/openstack.go#L67

I suspect it is being raised somewhere in here: https://github.com/kubernetes/cloud-provider-openstack/blob/v1.15.0/pkg/cloudprovider/providers/openstack/openstack_loadbalancer.go

Looking through the code for where that error can be raised, I have the following questions:

1. Can you confirm that the LoadBalancer exists in OpenStack? It should have a name like kube_service_kubernetes-********************************_<namespace>_<service>
2. Does the OpenStack LB have provisioning_status=ACTIVE?
3. Does the OpenStack LB have a floating IP?
4. Does the OpenStack LB have attached listeners? Do the listeners each have a single pool?
5. Does the OpenStack LB have members attached?
6. Are all master/worker instances located in the same OpenStack region?
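
For questions 1 to 5, something along these lines could be used to collect the answers. This is only a sketch: it assumes the openstack CLI with the Octavia plugin (python-octaviaclient) is installed and credentials are already sourced, and the function names are hypothetical helpers, not anything provided by the charm.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the openstack CLI; nothing runs until called.

# Q1: find load balancers whose name ends in _<namespace>_<service>.
lb_find() {
  openstack loadbalancer list -f value -c name | grep -- "_${1}_${2}\$"
}

# Q2: print the provisioning_status of a load balancer (name or ID).
lb_status() {
  openstack loadbalancer show "$1" -f value -c provisioning_status
}

# Q3: list floating IPs associated with the load balancer's VIP port.
lb_floating_ip() {
  local port
  port=$(openstack loadbalancer show "$1" -f value -c vip_port_id)
  openstack floating ip list --port "$port"
}

# Q4: list the listeners and pool IDs attached to the load balancer.
lb_listeners() {
  openstack loadbalancer listener list --loadbalancer "$1"
}
lb_pools() {
  openstack loadbalancer pool list --loadbalancer "$1" -f value -c id
}

# Q5: list the members of every pool on the load balancer.
lb_members() {
  local pool
  for pool in $(lb_pools "$1"); do
    openstack loadbalancer member list "$pool"
  done
}
```

For example, "lb_find customer-namespace customer-service" followed by "lb_status <name>" on the result would answer the first two questions. Question 6 is not covered here, since it depends on how the instances were deployed.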

Changed in charm-openstack-integrator:
status: New → Incomplete
Paul Goins (vultaire) wrote :

Unfortunately, while this issue had been affecting this customer's cloud for the last several days, it appears that the integrator has since started working again. I do not know the root cause.

My original kubectl command, "kubectl get events -n customer-namespace |grep UpdateLoadBalancerFailed", now comes back clean. Doing a describe on the previously affected LBs no longer shows the errors either.

I'll keep an eye out for a recurrence, but for now, things seem OK.

Paul Goins (vultaire) wrote :

One last update here: I manually fixed a bunch of load balancers a day or two ago on this cloud, so I think most of the recoveries were basically no-ops since the LBs already contained the intended targets. However, while performing that maintenance I did observe several load balancers apparently self-adjust before I could perform their updates. So, I'm wondering if something had things "backed up" or timing out for some reason...

Regardless, I will keep an eye out for a recurrence.
