NO_MONITOR for openstack-integrator provisioned Octavia LB in front of kubernetes-master

Bug #1937171 reported by Nobuto Murata
Affects: Openstack Integrator Charm
Status: Fix Released
Importance: High
Assigned to: Samuel Allan
Milestone: 1.24

Bug Description

With the following relation in place, openstack-integrator will provision an Octavia load balancer in front of kubernetes-master for user API traffic, including kubectl.

  - ['openstack-integrator:loadbalancer', 'kubernetes-master:loadbalancer']
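
(For reference, the same relation can also be added outside of a bundle with a command along these lines, using the application names above:)

$ juju relate openstack-integrator:loadbalancer kubernetes-master:loadbalancer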

However, the provisioned load balancer doesn't have any health monitor for the backend kubernetes-master units, so the member status is not managed properly and haproxy inside the amphora keeps retrying requests against all backends, including a failed one, all the time. There seems to be no obvious failure from an API user's point of view, though.

$ openstack loadbalancer list -c name -c provisioning_status -c operating_status -c provider
+-----------------------------------------------------+---------------------+------------------+----------+
| name                                                | provisioning_status | operating_status | provider |
+-----------------------------------------------------+---------------------+------------------+----------+
| openstack-integrator-7f03314f1796-kubernetes-master | ACTIVE              | ONLINE           | amphora  |
+-----------------------------------------------------+---------------------+------------------+----------+

$ openstack loadbalancer member list openstack-integrator-7f03314f1796-kubernetes-master
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| id                                   | name      | project_id                       | provisioning_status | address   | protocol_port | operating_status | weight |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| 741c3862-1e10-4ff3-a67b-e95b5335f754 | 10.5.5.16 | 97cec311887e415ca46736bb9e917376 | ACTIVE              | 10.5.5.16 | 6443          | NO_MONITOR       | 1      |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+

^^^ NO_MONITOR

$ openstack loadbalancer healthmonitor list
-> (empty)

Revision history for this message
Nobuto Murata (nobuto) wrote :

It looks like the health-check endpoints are not accessible to unauthenticated users by default, so we could use a port-based status check for the time being.

https://kubernetes.io/docs/reference/access-authn-authz/_print/#other-component-roles
> Allows read access to control-plane monitoring endpoints (i.e.
> kube-apiserver liveness and readiness endpoints (/healthz, /livez,
> /readyz), the individual health-check endpoints (/healthz/*, /livez/*,
> /readyz/*), and /metrics). Note that individual health check endpoints
> and the metric endpoint may expose sensitive information.

$ kubectl get --raw='/livez'
ok

$ kubectl get --raw='/livez?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
livez check passed

$ curl -ks https://192.168.151.76:6443/livez
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
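
(As a side note, one way to make /livez reachable without a token would be to bind the system:monitoring ClusterRole mentioned above to anonymous requests, roughly as sketched below; the binding name is just an example, and the port-based check discussed next avoids touching RBAC at all.)

$ kubectl create clusterrolebinding allow-anonymous-health-checks \
    --clusterrole=system:monitoring \
    --group=system:unauthenticated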

Revision history for this message
Nobuto Murata (nobuto) wrote :

A TLS-HELLO health monitor would do the trick to monitor port 6443 of kubernetes-master.

$ openstack loadbalancer healthmonitor create \
    --delay 5 --max-retries 4 --timeout 10 \
    --type TLS-HELLO \
    openstack-integrator-7f03314f1796-kubernetes-master
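
(If needed, the new monitor itself can be confirmed with the same listing that was empty before:)

$ openstack loadbalancer healthmonitor list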

$ openstack loadbalancer member list openstack-integrator-7f03314f1796-kubernetes-master
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| id                                   | name      | project_id                       | provisioning_status | address   | protocol_port | operating_status | weight |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| 741c3862-1e10-4ff3-a67b-e95b5335f754 | 10.5.5.16 | 97cec311887e415ca46736bb9e917376 | ACTIVE              | 10.5.5.16 | 6443          | ONLINE           | 1      |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+

^^^ ONLINE

After stopping snap.kube-apiserver.daemon.service for testing:
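
(For reference, this can be done with something like the following; the unit name here is just an example.)

$ juju ssh kubernetes-master/0 sudo systemctl stop snap.kube-apiserver.daemon.service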

$ openstack loadbalancer member list openstack-integrator-7f03314f1796-kubernetes-master
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| id                                   | name      | project_id                       | provisioning_status | address   | protocol_port | operating_status | weight |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| 741c3862-1e10-4ff3-a67b-e95b5335f754 | 10.5.5.16 | 97cec311887e415ca46736bb9e917376 | ACTIVE              | 10.5.5.16 | 6443          | ERROR            | 1      |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+

^^^ ERROR

Revision history for this message
Nobuto Murata (nobuto) wrote :

This may not be critical. As far as I can see, even without the health monitor, API requests are retried(?).

Anyway, by adding the health monitor, the haproxy config inside the amphora instance is updated as follows:

[without health monitor]

backend 7d5d4323-f382-4a85-a965-4fcedd8c77f2:199362ee-e305-4ebf-947a-d3c155341ac9
    mode tcp
    balance roundrobin
    fullconn 50000
    option allbackups
    timeout connect 5000
    timeout server 50000
    server 91d6a444-6986-49c7-b46a-a17838d7a638 10.5.5.23:6443 weight 1
    server 0e91354d-1f60-40d2-b748-d4e6be491ebe 10.5.5.66:6443 weight 1

[with TLS-HELLO health monitor]

backend 7d5d4323-f382-4a85-a965-4fcedd8c77f2:199362ee-e305-4ebf-947a-d3c155341ac9
    mode tcp
    balance roundrobin
    timeout check 10s
    option ssl-hello-chk
    fullconn 50000
    option allbackups
    timeout connect 5000
    timeout server 50000
    server 91d6a444-6986-49c7-b46a-a17838d7a638 10.5.5.23:6443 weight 1 check inter 5s fall 3 rise 4
    server 0e91354d-1f60-40d2-b748-d4e6be491ebe 10.5.5.66:6443 weight 1 check inter 5s fall 3 rise 4
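
(For reference, the health monitor parameters appear to map onto the haproxy check options roughly as shown below; the "fall 3" presumably comes from the monitor's max-retries-down value, which was left at its default here.)

    --delay 5       -> check inter 5s
    --timeout 10    -> timeout check 10s
    --max-retries 4 -> rise 4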

description: updated
Revision history for this message
Nobuto Murata (nobuto) wrote :

> As far as I can see, even without the health monitor, API requests are retried(?).

Now I know what's going on. Without the health monitor, haproxy is not aware of backend failures, but it keeps redispatching the connections (wredis: redispatches - warning) to the remaining backends all the time thanks to "option redispatch" in haproxy.cfg.

====
defaults
    log global
    retries 3
    option redispatch
    option splice-request
    option splice-response
    option http-keep-alive
====

So the backend failure is not obvious from an API user's point of view, which is good. Having the health monitor on top is therefore not a must-have, but it is still nice-to-have so that the backend member status is managed properly instead of always retrying against the failed backend as well.
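
(For anyone who wants to keep an eye on the member status while testing, something along these lines works with the load balancer from this deployment:)

$ watch -n 10 openstack loadbalancer member list \
    openstack-integrator-7f03314f1796-kubernetes-master \
    -c address -c operating_status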

description: updated
Changed in charm-openstack-integrator:
assignee: nobody → Samuel Walladge (swalladge)
status: New → In Progress
Revision history for this message
Samuel Allan (samuelallan) wrote :
Revision history for this message
Samuel Allan (samuelallan) wrote :
Revision history for this message
Nobuto Murata (nobuto) wrote :

Subscribing ~field-medium.

George Kraft (cynerva)
Changed in charm-openstack-integrator:
importance: Undecided → High
tags: added: review-needed
Changed in charm-openstack-integrator:
milestone: none → 1.24
Revision history for this message
George Kraft (cynerva) wrote :

Thanks for the detailed investigation and notes, Nobuto. Thanks for the PR, Samuel. This is merged and will go out with Charmed Kubernetes 1.24.

Changed in charm-openstack-integrator:
status: In Progress → Fix Committed
tags: removed: review-needed
Changed in charm-openstack-integrator:
status: Fix Committed → Fix Released