Openstack Integrator Charm

Openstack integrator not successful in resizing load balancers to match current k8s workers

Bug #1897795 reported by Paul Goins on 2020-09-29

Affects                      Status      Importance  Assigned to  Milestone
Openstack Integrator Charm   Incomplete  Undecided   Unassigned

Bug Description

On a customer cloud, we're seeing a problem where the openstack-cloud-controller-manager pods appear to be trying to resize load balancers to target the current set of running K8s workers, but are somehow failing to do so.

Here is a concrete example, with customer-identifying information removed:

$ kubectl get events -n customer-namespace |grep UpdateLoadBalancerFailed | tail -n1
3m8s Warning UpdateLoadBalancerFailed service/customer-service Error updating load balancer with new hosts map[juju-123456-customer-k8s-1:{} juju-123456-customer-k8s-2:{} juju-123456-customer-k8s-3:{} juju-123456-customer-k8s-4:{} juju-123456-customer-k8s-5:{} juju-123456-customer-k8s-6:{} juju-123456-customer-k8s-7:{} juju-123456-customer-k8s-8:{} juju-123456-customer-k8s-9:{} juju-123456-customer-k8s-10:{} juju-123456-customer-k8s-11:{} juju-123456-customer-k8s-12:{}]: failed to find object

A "kubectl describe <service>" shows it is of type LoadBalancer, with something like this under Events:

Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning UpdateLoadBalancerFailed 3m33s (x4686 over 5d19h) service-controller Error updating load balancer with new hosts map[juju-123456-customer-k8s-1:{} juju-123456-customer-k8s-2:{} juju-123456-customer-k8s-3:{} juju-123456-customer-k8s-4:{} juju-123456-customer-k8s-5:{} juju-123456-customer-k8s-6:{} juju-123456-customer-k8s-7:{} juju-123456-customer-k8s-8:{} juju-123456-customer-k8s-9:{} juju-123456-customer-k8s-10:{} juju-123456-customer-k8s-11:{} juju-123456-customer-k8s-12:{}]: failed to find object

It appears that one of the openstack-cloud-controller-manager pods is generating this message.

The "kubectl logs" output from the pod in question looks something like this:

E0929 19:41:26.551170 1 service_controller.go:667] failed to update load balancer hosts for service customer-namespace/interruption-service: failed to find object
I0929 19:41:26.551337 1 event.go:258] Event(v1.ObjectReference{Kind:"Service", Namespace:"customer-namespace", Name:"interruption-service", UID:"deadbeef-dead-beef-dead-beefdeadbeef", APIVersion:"v1", ResourceVersion:"87136827", FieldPath:""}): type: 'Warning' reason: 'UpdateLoadBalancerFailed' Error updating load balancer with new hosts map[juju-123456-customer-k8s-1:{} juju-123456-customer-k8s-2:{} juju-123456-customer-k8s-3:{} juju-123456-customer-k8s-4:{} juju-123456-customer-k8s-5:{} juju-123456-customer-k8s-6:{} juju-123456-customer-k8s-7:{} juju-123456-customer-k8s-8:{} juju-123456-customer-k8s-9:{} juju-123456-customer-k8s-10:{} juju-123456-customer-k8s-11:{} juju-123456-customer-k8s-12:{}]: failed to find object
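
To compare the hosts named in such an event against what the load balancer actually contains, the host list can be pulled out of the message text. The following is a minimal sketch that assumes the message keeps the "new hosts map[...]" shape shown above; extract_lb_hosts is a hypothetical helper name, not part of kubectl.

```shell
# Sketch: read UpdateLoadBalancerFailed message text on stdin and print
# one host name per line. Assumes the "new hosts map[name:{} name:{}]" format.
# extract_lb_hosts is a hypothetical helper, not a kubectl feature.
extract_lb_hosts() {
  # 1. keep only the text between "map[" and the closing "]"
  # 2. split the space-separated entries onto separate lines
  # 3. strip the trailing ":{}" from each entry
  sed -n 's/.*new hosts map\[\([^]]*\)\].*/\1/p' | tr ' ' '\n' | sed 's/:{}$//'
}
```

For example: kubectl get events -n customer-namespace | grep UpdateLoadBalancerFailed | tail -n1 | extract_lb_hosts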

"kubectl version" reports the following:

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.8", GitCommit:"9f2892aab98fe339f3bd70e3c470144299398ace", GitTreeState:"clean", BuildDate:"2020-08-26T20:32:49Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.14", GitCommit:"d2a081c8e14e21e28fe5bdfa38a817ef9c0bb8e3", GitTreeState:"clean", BuildDate:"2020-08-26T20:35:13Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

The customer is running OpenStack Rocky with Octavia.

The K8s model is running cs:~containers/openstack-integrator-81.

George Kraft (cynerva) wrote on 2020-09-30:

> failed to find object

Looks like this error is defined here: https://github.com/kubernetes/cloud-provider-openstack/blob/v1.15.0/pkg/cloudprovider/providers/openstack/openstack.go#L67

I suspect it is being raised somewhere in here: https://github.com/kubernetes/cloud-provider-openstack/blob/v1.15.0/pkg/cloudprovider/providers/openstack/openstack_loadbalancer.go

Looking through the code for where that error can be raised, I have the following questions:

1. Can you confirm that the LoadBalancer exists in OpenStack? It should have a name like kube_service_kubernetes-********************************_<namespace>_<service>
2. Does the OpenStack LB have provisioning_status=ACTIVE?
3. Does the OpenStack LB have a floating IP?
4. Does the OpenStack LB have attached listeners? Do the listeners each have a single pool?
5. Does the OpenStack LB have members attached?
6. Are all master/worker instances located in the same OpenStack region?
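
Questions 2 through 5 above could be checked with the OpenStack CLI roughly as follows. This is a sketch only: it assumes the Octavia client plugin (python-octaviaclient) is installed and cloud credentials are loaded, the function names are hypothetical, and filter flag availability may vary with the client version.

```shell
# Sketch only: thin wrappers around the OpenStack CLI with the Octavia plugin.
# All function names are hypothetical. Pass the load balancer name or ID as $1.

# Question 2: print the provisioning status (ACTIVE is the healthy value).
lb_provisioning_status() {
  openstack loadbalancer show "$1" -f value -c provisioning_status
}

# Question 3: list floating IPs associated with the load balancer's VIP port.
lb_floating_ips() {
  openstack floating ip list --port \
    "$(openstack loadbalancer show "$1" -f value -c vip_port_id)"
}

# Question 4: list the listeners and the pools attached to the load balancer.
lb_listeners() { openstack loadbalancer listener list --loadbalancer "$1"; }
lb_pools()     { openstack loadbalancer pool list --loadbalancer "$1"; }

# Question 5: list the members of one pool (pass the pool name or ID as $1).
pool_members() { openstack loadbalancer member list "$1"; }
```

For example, "lb_provisioning_status <lb-name>" followed by "lb_pools <lb-name>" and then "pool_members <pool-id>" for each pool shows whether the member list matches the current set of workers.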

Changed in charm-openstack-integrator:
status:	New → Incomplete

Paul Goins (vultaire) wrote on 2020-09-30:

Unfortunately, while this issue had been affecting this customer's cloud for the last several days, the integrator now appears to be working again. I do not know the root cause.

My original kubectl command, "kubectl get events -n customer-namespace |grep UpdateLoadBalancerFailed", now comes back clean. Doing a describe on the previously affected LBs no longer shows the errors either.

I'll keep an eye out for a recurrence, but for now, things seem OK.

Paul Goins (vultaire) wrote on 2020-09-30:

One last update here: I manually fixed a bunch of load balancers a day or two ago on this cloud, so I think most of the recoveries were basically no-ops since the LBs already contained the intended targets. However, while performing that maintenance I did observe several load balancers apparently self-adjust before I could perform their updates. So, I'm wondering if something had things "backed up" or timing out for some reason...

Regardless, I will keep an eye out for a recurrence.
