[1.21] kubernetes-master units on OpenStack using the integrator charm stuck waiting for system pods due to an unsupported loadbalancer type

Bug #1926651 reported by Michael Skalka
Affects: Openstack Integrator Charm
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Bundle: https://pastebin.canonical.com/p/cmH39yjnHz/
Status: https://pastebin.canonical.com/p/7pFZxX6yPw/
Crashdump attached below.

Using the linked bundle, a 1.21 Kubernetes cluster was deployed on a Bionic Ussuri OpenStack with OVS networking. The integrator charm is configured with the correct network IDs (post-deployment) and has been granted trust in Juju. The relations to the kubernetes-master and worker units are the same as we would have used in 1.20.
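
For reference, the post-deployment configuration was along these lines. The config option names shown here (subnet-id, floating-network-id) are my recollection of the integrator charm's options rather than a paste from the environment, and the values are placeholders:

# option names assumed from the integrator charm's config; UUIDs are placeholders
$ juju config openstack-integrator subnet-id=<internal-subnet-uuid> floating-network-id=<external-network-uuid>
$ juju trust openstack-integrator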

In this deployment the kubernetes-master units are waiting on 5 kube-system pods to start, 3 of which are in CrashLoopBackOff:

$ kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-f59cdf866-nx4rd 0/1 Pending 0 80m
coredns-6f867cd986-sxj2r 0/1 Pending 0 83m
csi-cinder-controllerplugin-0 0/6 Pending 0 83m
csi-cinder-nodeplugin-844qc 3/3 Running 0 81m
csi-cinder-nodeplugin-bfhmw 3/3 Running 0 81m
csi-cinder-nodeplugin-p4ns9 3/3 Running 0 81m
kube-state-metrics-7799879d89-mgslw 0/1 Pending 0 83m
metrics-server-v0.3.6-7d66499544-dv27f 0/2 Pending 0 83m
openstack-cloud-controller-manager-9r855 0/1 CrashLoopBackOff 28 81m
openstack-cloud-controller-manager-gbftr 0/1 CrashLoopBackOff 25 81m
openstack-cloud-controller-manager-x65k5 0/1 CrashLoopBackOff 24 81m

Checking the logs of the openstack-cloud-controller-manager pods, we can see that they are attempting to use haproxy as the LB type:

$ kubectl logs -n kube-system openstack-cloud-controller-manager-9r855

...
I0429 19:09:01.931787 1 serving.go:331] Generated self-signed cert in-memory
W0429 19:09:03.021162 1 client_config.go:614] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0429 19:09:03.027716 1 openstack.go:302] failed to read config: Unsupported LoadBalancer Provider: haproxy
F0429 19:09:03.027761 1 main.go:107] Cloud provider could not be initialized: could not init cloud provider "openstack": Unsupported LoadBalancer Provider: haproxy
...
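
That haproxy value presumably comes from the cloud config handed to the controller manager. One way to check is to dump the rendered cloud.conf and look at the lb-provider setting under [LoadBalancer]; note that the secret name and key below are assumptions based on the upstream openstack-cloud-controller-manager manifests, not something verified on this deployment:

# secret/key names assumed, not verified on this deployment
$ kubectl -n kube-system get secret cloud-config -o jsonpath='{.data.cloud\.conf}' | base64 -d
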
My first theory was that the integrator/master loadbalancer relation was missing; however, upon adding it the charm goes into an error state (the relation command is sketched after the status output below):

$ juju status openstack-integrator
...
openstack-integrator/0* error idle 20 10.244.32.247 hook failed: "loadbalancer-relation-joined"
  canonical-livepatch/6 active idle 10.244.32.247 Running kernel 5.4.0-72.80-generic, patchState: nothing-to-apply (source version/commit 9eb41f7)
  filebeat/6 active idle 10.244.32.247 Filebeat ready.
  ntp/6 active idle 10.244.32.247 123/udp chrony: Ready
  telegraf/6 active idle 10.244.32.247 9103/tcp Monitoring openstack-integrator/0 (source version/commit dec0633)
...
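
For completeness, the relation that was added was along these lines; the integrator-side endpoint name matches the failing hook above, while the kubernetes-master endpoint name is an assumption on my part:

# kubernetes-master endpoint name assumed
$ juju add-relation openstack-integrator:loadbalancer kubernetes-master:loadbalancer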

And in the charm logs:

$ less /var/log/juju/unit-openstack-integrator-0.log
...
2021-04-29 19:12:02 WARNING loadbalancer-relation-joined Failed to discover available identity versions when contacting https://keystone.production.solutionsqa:5000/v3. Attempting to parse version from URL.
2021-04-29 19:12:02 WARNING loadbalancer-relation-joined SSL exception connecting to https://keystone.production.solutionsqa:5000/v3/auth/tokens: HTTPSConnectionPool(host='keystone.production.solutionsqa', port=5000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError("unable to load trusted certificates: Error([('system library', 'fopen',
 'Permission denied'), ('BIO routines', 'BIO_new_file', 'system lib'), ('x509 certificate routines', 'X509_load_cert_crl_file', 'system lib')],)",),))
2021-04-29 19:12:03 DEBUG jujuc server.go:211 running hook tool "juju-log" for openstack-integrator/0-loadbalancer-relation-joined-1186859103304395474
2021-04-29 19:12:03 ERROR juju-log loadbalancer:93: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-openstack-integrator-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-openstack-integrator-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-openstack-integrator-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-openstack-integrator-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-openstack-integrator-0/charm/reactive/openstack.py", line 127, in create_or_update_loadbalancers
    lb = layer.openstack.manage_loadbalancer(request.application_name,
  File "lib/charms/layer/openstack.py", line 151, in manage_loadbalancer
    subnet = config['lb-subnet'] or _default_subnet(members)
  File "lib/charms/layer/openstack.py", line 350, in _default_subnet
    for subnet_info in _openstack('subnet', 'list'):
  File "lib/charms/layer/openstack.py", line 312, in _openstack
    output = _run_with_creds('openstack', *args, '--format=yaml')
  File "lib/charms/layer/openstack.py", line 303, in _run_with_creds
    result = subprocess.run(args,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('openstack', 'subnet', 'list', '--format=yaml')' returned non-zero exit status 1.
...

The charm is attempting to list subnets against the Neutron endpoint, but the openstack CLI call fails with the certificate error shown above.
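
The failing call can be reproduced by hand on the integrator unit. The credential setup below is an assumption (the charm exports OS_* variables for the client internally); the command itself is taken straight from the traceback:

$ juju ssh openstack-integrator/0
# export the same OS_AUTH_URL/OS_USERNAME/... credentials the charm uses (setup assumed)
$ openstack subnet list --format=yaml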

At this point I'm stumped.

Revision history for this message
Michael Skalka (mskalka) wrote :

Tagging this with field-critical. Without this functionality we cannot validate Kubernetes on OpenStack, either for our stable testing or for any Kubernetes or OpenStack release testing.

Revision history for this message
Michael Skalka (mskalka) wrote :

Removing crit. This is a duplicate of bug #1922720; after reinstalling the openstackclients snap in `--devmode` on the integrator unit, the pods eventually came alive.
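
For anyone hitting the same thing, the workaround looked roughly like this; the snap commands are my reconstruction rather than a paste from the unit:

$ juju ssh openstack-integrator/0
# reinstall the client snap without strict confinement (reconstructed, not pasted)
$ sudo snap remove openstackclients
$ sudo snap install --devmode openstackclients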
