/root/.kube/config file only refers to one kubernetes-master IP causing API timeouts between kubelet and offline clustered k8s-master

Bug #1959720 reported by Drew Freiberger
Affects                         Status        Importance  Assigned to  Milestone
Kubernetes Control Plane Charm  Invalid       High        Unassigned
Kubernetes Worker Charm         Invalid       High        Unassigned
Openstack Integrator Charm      Fix Released  High        Unassigned

Bug Description

I am running Kubernetes 1.19 with these charms:
cs:~containers/kubernetes-master-1034
cs:~containers/kubernetes-worker-788

When one of three deployed kubernetes-master units is offline, there is a chance that some kubelet processes, as well as the node checks on those same nodes, will disconnect from the Kubernetes cluster and appear NotReady.

When investigating this further, I found that /root/.kube/config only lists one of the kubernetes-master unit IPs.
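
For illustration, a minimal kubeconfig excerpt of the shape seen here (the cluster name and CA field are placeholders, and 192.168.200.139 is simply one of the kubernetes-master IPs from this cluster, not the exact file contents):

apiVersion: v1
kind: Config
clusters:
- name: juju-cluster          # placeholder name
  cluster:
    certificate-authority-data: <redacted>
    # A single kubernetes-master unit IP; if that unit goes offline,
    # kubelet and kubectl on this node time out.
    server: https://192.168.200.139:6443
    # Desired instead: server: https://<loadbalancer-vip>:6443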

It would be beneficial for this connectivity between kubelet and the controllers to make use of the loadbalancer provided by the relation to openstack-integrator or aws-integrator, to ensure that kubelet can always communicate with a live kubernetes-master.

To replicate: deploy 3 kubernetes-master units and several kubernetes-worker units, add the relation to openstack-integrator for loadbalancing kubernetes-master, and then download your kube config from the master to your local client terminal. Update the IP address in the .kube/config on your client to point to the IP address of the k8s-master loadbalancer. Now shut down one of the VMs running kubernetes-master; you should see one or more of your kubernetes nodes enter a NotReady state, with "snap logs kubelet" showing connection timeouts for API queries to the dead kubernetes-master IP.

Revision history for this message
Drew Freiberger (afreiberger) wrote :

Subscribing field-high as this causes kubelet/pod outages/failover when a single kubernetes-master in a cluster fails.

George Kraft (cynerva)
Changed in charm-kubernetes-master:
importance: Undecided → High
Changed in charm-kubernetes-worker:
importance: Undecided → High
Revision history for this message
George Kraft (cynerva) wrote :

Please attach or otherwise send us output of `juju status --format yaml`. We need details about the cluster. Kubelets and other clients should be using loadbalanced apiserver IPs if the cluster is properly configured.

Have you established a relation between openstack-integrator and kubernetes-master:loadbalancer, as described in our docs[1]? This would be in addition to the usual relation between openstack-integrator and kubernetes-master:openstack.

[1]: https://ubuntu.com/kubernetes/docs/openstack-integration#api-server-load-balancer

Changed in charm-kubernetes-master:
status: New → Incomplete
Changed in charm-kubernetes-worker:
status: New → Incomplete
Revision history for this message
Drew Freiberger (afreiberger) wrote :

After deeper inspection, it appears that kubernetes-master provides the local IP address of each unit in the kube-api-endpoint relation data to kubernetes-worker in a specific failure scenario with the new <cloud>-integrator:loadbalancer pattern (instead of kubeapi-load-balancer).

When using the openstack-integrator:loadbalancer relation to kubernetes-master:loadbalancer to provide the ingress IP for the API (instead of kubeapi-load-balancer), openstack-integrator uses OpenStack loadbalancer API calls to create and manage an Octavia loadbalancer. I'm specifically on a Focal-Ussuri cloud upgraded to the latest packages and 21.10 charms.

It appears that my openstack-integrator is in a blocked state due to a bug in the openstack octavia service: https://storyboard.openstack.org/#!/story/2009128

This results in the current relation data from the loadbalancer being the following:
juju show-unit kubernetes-master/0|grep -A10 endpoint:\ loadbalancer
  - endpoint: loadbalancer
    related-endpoint: loadbalancer
    application-data: {}
    related-units:
      openstack-integrator/0:
        in-scope: true
        data:
          egress-subnets: 192.168.200.203/32
          ingress-address: 192.168.200.203
          private-address: 192.168.200.203

So, most likely, openstack-integrator failing to get a valid response from OpenStack's Octavia/loadbalancer endpoint when trying to add an already-existing member to a pool is what keeps it from providing the loadbalanced API IP address in the relation data back to kubernetes-master.

Ultimately, there's a bug in OpenStack in that a 500 error is returned instead of a "pool member already exists" response, but it could be worked around by having openstack-integrator check pool members before blindly adding already-existing members to the pool.

Current members of loadbalancer:

openstack loadbalancer member list openstack-integrator-82ea31e06743-kubernetes-master
+--------------------------------------+-----------------+----------------------------------+---------------------+-----------------+---------------+------------------+--------+
| id | name | project_id | provisioning_status | address | protocol_port | operating_status | weight |
+--------------------------------------+-----------------+----------------------------------+---------------------+-----------------+---------------+------------------+--------+
| 97dce0a8-c09d-46b1-9a14-4a4b3689a02d | 192.168.200.139 | 1f11f0c9c0ac40b980313c184b4eb951 | ACTIVE | 192.168.200.139 | 6443 | ONLINE | 1 |
| 09f37976-0899-4f02-90b3-4f480106b06e | 192.168.200.123 | 1f11f0c9c0ac40b980313c184b4eb951 | ACTIVE | 192.168.200.123 | 6443 | ONLINE | 1 |
| ea1be436-a516-4366-b904-950ace366414 | 192.168.200.197 | 1f11f0c9c0ac40b980313c184b4eb951 | ACTIVE | 192.168.200.197 | 6443 | ONLINE | 1 |
+--------------------------------------+-----------------+----------------------------------+---------------------+-----------------+---------------...


Revision history for this message
Drew Freiberger (afreiberger) wrote :

Ultimately, there are two separate assumptions in openstack-integrator's update_members function that lead to this failure mode.

First:

After adding some sanity checking of the passed members argument vs self.members, it appears that self.members is a local cache of data and is blank. Perhaps, if self.members is empty when running an update_loadbalancer routine, we should query the LB provider (Octavia) for the current list of members before blindly creating members on the assumption that the LB has never been set up.

I added some debug log entries to the top of the function to print the passed "members" argument vs the "self.members" object attribute.

unit-openstack-integrator-0: 16:19:12 INFO unit.openstack-integrator/0.juju-log members: {('192.168.200.139', '6443'), ('192.168.200.197', '6443'), ('192.168.200.123', '6443')}
unit-openstack-integrator-0: 16:19:12 INFO unit.openstack-integrator/0.juju-log self.members: set()

In the create method, we check whether there are already members, even though we just created the LB (probably for idempotency and error-handling purposes):

https://github.com/juju-solutions/charm-openstack-integrator/blob/a0363d0d103764418e6cf93fbbdbaa0b2b02e55a/lib/charms/layer/openstack.py#L525-L527

We may wish to add this same list_members logic into the update_members function:

if not self.members:
    self.members = self._impl.list_members()

This would help prime the charm's cached information about the existing LB and avoid the Octavia 500 error bug when creating a duplicate member.

I think this may be specifically related to environments where openstack-integrator has been removed/added/migrated: it detects the existing loadbalancer and attempts to update it, but has no cached info about the LB's pools/members/etc. in the local kv store.

Second:

Upon trying this logic on my local failing unit, it runs into issues with Python 3 set logic not working as this code assumes. This is the result of the incoming tuples being (str, str) instead of the (str, int) tuples that self._impl.list_members() returns.

We need to do a stronger match than set logic, or groom the input to the update_loadbalancer function to match the Octavia implementation's data structure.

From this logic at the top of update_members:

        # prime the members cache before update of pre-existing LB lp#1959720
        log("members: {}", members)
        log("self.members: {}", self.members)
        if not self.members:
            self.members = self._impl.list_members()
            log("self.members updated: {}", self.members)

unit-openstack-integrator-0: 16:38:13 INFO unit.openstack-integrator/0.juju-log members: {('192.168.200.123', '6443'), ('192.168.200.139', '6443'), ('192.168.200.197', '6443')}
unit-openstack-integrator-0: 16:38:13 INFO unit.openstack-integrator/0.juju-log self.members: set()
unit-openstack-integrator-0: 16:38:16 INFO unit.openstack-integrator/0.juju-log self.members updated: {('192.168.200.197', 6443), ('192.168.200.123', 6443), ('192.168.200.139', 6443)}
unit-openstack-integrator-0: 16:38:18 INFO unit.openstack-integrator/0.juju-log Removed member: ('192.168.200.197', 6443)
unit-openstack-integrator-0: 16:38:24 INFO unit.openstack-integra...

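To make the type mismatch concrete, here is a minimal, self-contained sketch (not the charm's actual code; normalize_members is a hypothetical helper) showing how grooming the incoming tuples makes the set comparison behave, using the member values from the logs above:

def normalize_members(members):
    # Hypothetical helper: coerce ports to int so incoming (address, port)
    # pairs compare equal to the (str, int) tuples list_members() returns.
    return {(str(address), int(port)) for address, port in members}

# Values as they appear in the logs above:
incoming = {('192.168.200.123', '6443'),
            ('192.168.200.139', '6443'),
            ('192.168.200.197', '6443')}
cached = {('192.168.200.123', 6443),
          ('192.168.200.139', 6443),
          ('192.168.200.197', 6443)}

print(incoming == cached)                     # False: '6443' != 6443
print(normalize_members(incoming) == cached)  # True once the ports are ints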

Revision history for this message
Drew Freiberger (afreiberger) wrote :

I was able to work around the members issue with this patch:

https://github.com/juju-solutions/charm-openstack-integrator/pull/56

However, now I'm getting a new error around checking ports, with the charm seemingly reporting two port IDs at once:

unit-openstack-integrator-0: 17:07:19 INFO unit.openstack-integrator/0.juju-log status-set: maintenance: Managing load balancers
unit-openstack-integrator-0: 17:07:19 INFO unit.openstack-integrator/0.juju-log Managing load balancer for kubernetes-master
unit-openstack-integrator-0: 17:07:29 INFO unit.openstack-integrator/0.juju-log Found existing security group openstack-integrator-82ea31e06743-kubernetes-master-members (ec6f1673-638b-4da2-a849-4e61716ab5bd)
unit-openstack-integrator-0: 17:07:34 WARNING unit.openstack-integrator/0.update-status No Port found for 6c6267ab-4eb1-4fb3-b7dc-a1de33b00350 81e2a001-803d-45f8-a5c0-be43e652b436
unit-openstack-integrator-0: 17:07:34 ERROR unit.openstack-integrator/0.juju-log Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-openstack-integrator-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-openstack-integrator-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-openstack-integrator-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-openstack-integrator-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-openstack-integrator-0/charm/reactive/openstack.py", line 142, in create_or_update_loadbalancers
    lb = layer.openstack.manage_loadbalancer(request.application_name,
  File "/var/lib/juju/agents/unit-openstack-integrator-0/charm/lib/charms/layer/openstack.py", line 172, in manage_loadbalancer
    lb_manager = LoadBalancer.get_or_create(
  File "/var/lib/juju/agents/unit-openstack-integrator-0/charm/lib/charms/layer/openstack.py", line 369, in get_or_create
    lb = cls(app_name, port, subnet, algorithm, fip_net, manage_secgrps)
  File "/var/lib/juju/agents/unit-openstack-integrator-0/charm/lib/charms/layer/openstack.py", line 404, in __init__
    self._try_load_cached_info()
  File "/var/lib/juju/agents/unit-openstack-integrator-0/charm/lib/charms/layer/openstack.py", line 663, in _try_load_cached_info
    self._add_member_sg(member)
  File "/var/lib/juju/agents/unit-openstack-integrator-0/charm/lib/charms/layer/openstack.py", line 632, in _add_member_sg
    if self.member_sg_id not in _openstack(
  File "/var/lib/juju/agents/unit-openstack-integrator-0/charm/lib/charms/layer/openstack.py", line 307, in _openstack
    output = _run_with_creds('openstack', *args, '--format=yaml')
  File "/var/lib/juju/agents/unit-openstack-integrator-0/charm/lib/charms/layer/openstack.py", line 298, in _run_with_creds
    result = subprocess.run(args,
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subp...


George Kraft (cynerva)
Changed in charm-openstack-integrator:
importance: Undecided → High
no longer affects: charm-kubernetes-master
no longer affects: charm-kubernetes-worker
Changed in charm-openstack-integrator:
status: New → Triaged
George Kraft (cynerva)
tags: added: review-needed
Revision history for this message
George Kraft (cynerva) wrote :

Thanks for the detailed investigation and the patch.

Aye, that looks like find_port[1] is returning two IDs as a single string. Seems like that code assumes there will only be 1 port.

I'm not sure off the top of my head how those ports get created. I'm guessing they're created implicitly when the Loadbalancer is created. Any details you can provide about ports 6c6267ab-4eb1-4fb3-b7dc-a1de33b00350 and 81e2a001-803d-45f8-a5c0-be43e652b436 would be helpful. I'm trying to figure out if they're two legitimate ports that the charm should be operating on, or if one was created in error. If they are duplicates then perhaps you can delete one to work around the remaining issue.

[1]: https://github.com/juju-solutions/charm-openstack-integrator/blob/a0363d0d103764418e6cf93fbbdbaa0b2b02e55a/lib/charms/layer/openstack.py#L735-L738

Revision history for this message
Drew Freiberger (afreiberger) wrote :

There are two separate project/tenant networks sharing the same CIDR (but not routed/related to each other): two separate juju-controller-deployed models that just happened to allocate the same IP address.

This call needs to be limited to the subnet of the LB:

$ openstack port list --fixed-ip ip-address=192.168.200.139 -c ID -f value
6c6267ab-4eb1-4fb3-b7dc-a1de33b00350
81e2a001-803d-45f8-a5c0-be43e652b436

$ openstack port list --fixed-ip subnet=ad028ae1-e941-4730-bbc7-4b8be9526852,ip-address=192.168.200.139 -c ID -f value
81e2a001-803d-45f8-a5c0-be43e652b436

I'll open another bug and submit a PR.
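
A rough sketch of what a subnet-limited port lookup could look like, modelled on the CLI call above and on the charm's pattern of shelling out to the openstack client with --format=yaml; the function name and error handling here are illustrative stand-ins, not the charm's actual code (which also injects cloud credentials):

import subprocess
import yaml  # PyYAML, assumed available as in the charm's venv

def find_port_in_subnet(ip_address, subnet_id):
    """Return the single Neutron port ID matching ip_address within subnet_id.

    Limiting the --fixed-ip filter to the LB's subnet avoids matching an
    identical IP that happens to exist on another tenant network.
    """
    result = subprocess.run(
        ['openstack', 'port', 'list',
         '--fixed-ip', 'subnet={},ip-address={}'.format(subnet_id, ip_address),
         '--format=yaml'],
        check=True, capture_output=True, text=True)
    ports = yaml.safe_load(result.stdout) or []
    if len(ports) != 1:
        raise LookupError('expected exactly one port for {} in subnet {}, '
                          'found {}'.format(ip_address, subnet_id, len(ports)))
    return ports[0]['ID']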

Revision history for this message
Drew Freiberger (afreiberger) wrote :
George Kraft (cynerva)
Changed in charm-openstack-integrator:
status: Triaged → Fix Committed
milestone: none → 1.23+ck1
tags: added: backport-needed
removed: review-needed
Revision history for this message
Drew Freiberger (afreiberger) wrote :

Back to my original issue for opening the case.

It appears that once I've patched the two items that were causing hook failures for openstack-integrator, openstack-integrator now provides the Octavia loadbalancer's public-address in the loadbalancer relation to kubernetes-master. However, on cs:~containers/kubernetes-master-1034, I'm not seeing either the public-address or the private-address of the LB making its way through the relations to the kubernetes-worker units and their /root/.kube/config file.

I know Thiago was working on reproducing this and couldn't reproduce it with kubeapi-load-balancer as the LB provider for kubernetes-master, so I'm wondering if there's an issue with the way openstack-integrator provides this information such that it isn't being passed on as expected.

Expected result:

Either the public or the private IP address of the Octavia loadbalancer created by openstack-integrator for the kubernetes-master units should be configured as the upstream API address for kubernetes-worker processes such as kubelet and kubectl.

If this is solved in a later version of kubernetes-master and this is a k8s 1.19 charms + openstack-integrator:loadbalancer issue, please let me know, as we may have mixed new model patterns with an older version of charms to suit this particular environment's request for the 1.19 cluster.

Revision history for this message
George Kraft (cynerva) wrote :

Ah, okay. Re-adding kubernetes-master and kubernetes-worker as affected projects until we figure out where the remaining issue lies.

> If this is solved in a later version of kubernetes-master and this is a k8s 1.19 charms + openstack-integrator:loadbalancer issue, please let me know

It's possible. It looks like you're running Charmed Kubernetes 1.21+ck3, given the kubernetes-master and kubernetes-worker revs from the bug description. The apiserver address handling was refactored significantly in Charmed Kubernetes 1.22[1][2]. This issue may have been fixed as part of that work, but I'm unable to confirm at the moment.

I'll see if I can reproduce this in serverstack.

[1]: https://bugs.launchpad.net/charm-kubernetes-master/+bug/1921776
[2]: https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/pull/153

Changed in charm-kubernetes-master:
importance: Undecided → High
Changed in charm-kubernetes-worker:
importance: Undecided → High
Revision history for this message
George Kraft (cynerva) wrote :

If you want additional eyes on your cluster, I'm interested in seeing:

juju show-unit kubernetes-master/0
juju show-unit kubernetes-worker/0
juju debug-log --replay

Revision history for this message
George Kraft (cynerva) wrote :

My serverstack credentials don't have permissions to list/create loadbalancers. I'm looking into what it will take to get those permissions added, but until then, I'm blocked on reproducing this issue.

Revision history for this message
George Kraft (cynerva) wrote :

Ok, I got my credential permissions sorted and was able to reproduce the issue easily enough.

I can confirm that it is fixed in Charmed Kubernetes 1.22. After you upgrade to 1.22, make sure you remove the deprecated relation between kubernetes-worker:kube-api-endpoint and kubernetes-master:kube-api-endpoint. That will allow kubernetes-worker to get the API endpoint from the kube-control relation instead, which uses the loadbalanced API endpoint.
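
For example, assuming the application names from this deployment, that deprecated relation can be removed with something like:

juju remove-relation kubernetes-worker:kube-api-endpoint kubernetes-master:kube-api-endpoint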

Changed in charm-kubernetes-master:
status: New → Invalid
Changed in charm-kubernetes-worker:
status: New → Invalid
Revision history for this message
Drew Freiberger (afreiberger) wrote :

Removed field-high. Fixes are all merged upstream, and workarounds are deployed onsite.

Changed in charm-openstack-integrator:
milestone: 1.23+ck1 → 1.24
tags: removed: backport-needed
Changed in charm-openstack-integrator:
status: Fix Committed → Fix Released