NO_MONITOR for openstack-integrator provisioned Octavia LB in front of kubernetes-master

Bug #1937171 reported by Nobuto Murata
Affects: Openstack Integrator Charm
Status: Fix Released
Importance: High
Assigned to: Samuel Allan
Milestone: 1.24

Bug Description

With the following relation in place, openstack-integrator will provision an Octavia load balancer in front of kubernetes-master for user API traffic, including kubectl.

  - ['openstack-integrator:loadbalancer', 'kubernetes-master:loadbalancer']
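
(For reference, the same relation can also be added outside of a bundle with a command along these lines, using the application names above:)

$ juju relate openstack-integrator:loadbalancer kubernetes-master:loadbalancer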

However, the provisioned load balancer doesn't have any health monitor for the backend kubernetes-master units, so the member status is not managed properly and haproxy inside the amphora keeps retrying requests against all backends, including a failed one, all the time. There seems to be no obvious failure from an API user's point of view, though.

$ openstack loadbalancer list -c name -c provisioning_status -c operating_status -c provider
+-----------------------------------------------------+---------------------+------------------+----------+
| name                                                | provisioning_status | operating_status | provider |
+-----------------------------------------------------+---------------------+------------------+----------+
| openstack-integrator-7f03314f1796-kubernetes-master | ACTIVE              | ONLINE           | amphora  |
+-----------------------------------------------------+---------------------+------------------+----------+

$ openstack loadbalancer member list openstack-integrator-7f03314f1796-kubernetes-master
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| id                                   | name      | project_id                       | provisioning_status | address   | protocol_port | operating_status | weight |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| 741c3862-1e10-4ff3-a67b-e95b5335f754 | 10.5.5.16 | 97cec311887e415ca46736bb9e917376 | ACTIVE              | 10.5.5.16 | 6443          | NO_MONITOR       | 1      |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+

^^^ NO_MONITOR

$ openstack loadbalancer healthmonitor list
-> (empty)

Revision history for this message
Nobuto Murata (nobuto) wrote :

It looks like the health-check endpoints are not accessible to unauthenticated users by default, so we could use a port-based status check for the time being.

https://kubernetes.io/docs/reference/access-authn-authz/_print/#other-component-roles
> Allows read access to control-plane monitoring endpoints (i.e.
> kube-apiserver liveness and readiness endpoints (/healthz, /livez,
> /readyz), the individual health-check endpoints (/healthz/*, /livez/*,
> /readyz/*), and /metrics). Note that individual health check endpoints
> and the metric endpoint may expose sensitive information.

$ kubectl get --raw='/livez'
ok

$ kubectl get --raw='/livez?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
livez check passed

$ curl -ks https://192.168.151.76:6443/livez
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
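
(As a side note, one way to make /livez reachable without a token would be to bind the system:monitoring ClusterRole mentioned above to anonymous requests, roughly as sketched below; the binding name is just an example, and the port-based check discussed next avoids touching RBAC at all.)

$ kubectl create clusterrolebinding allow-anonymous-health-checks \
    --clusterrole=system:monitoring \
    --group=system:unauthenticated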

Revision history for this message
Nobuto Murata (nobuto) wrote :

A TLS-HELLO health monitor would do the trick to monitor port 6443 of kubernetes-master.

$ openstack loadbalancer healthmonitor create \
    --delay 5 --max-retries 4 --timeout 10 \
    --type TLS-HELLO \
    openstack-integrator-7f03314f1796-kubernetes-master
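
(If needed, the new monitor itself can be confirmed with the same listing that was empty before:)

$ openstack loadbalancer healthmonitor list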

$ openstack loadbalancer member list openstack-integrator-7f03314f1796-kubernetes-master
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| id                                   | name      | project_id                       | provisioning_status | address   | protocol_port | operating_status | weight |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| 741c3862-1e10-4ff3-a67b-e95b5335f754 | 10.5.5.16 | 97cec311887e415ca46736bb9e917376 | ACTIVE              | 10.5.5.16 | 6443          | ONLINE           | 1      |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+

^^^ ONLINE

After stopping snap.kube-apiserver.daemon.service for testing:
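
(For reference, this can be done with something like the following; the unit name here is just an example.)

$ juju ssh kubernetes-master/0 sudo systemctl stop snap.kube-apiserver.daemon.service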

$ openstack loadbalancer member list openstack-integrator-7f03314f1796-kubernetes-master
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| id                                   | name      | project_id                       | provisioning_status | address   | protocol_port | operating_status | weight |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+
| 741c3862-1e10-4ff3-a67b-e95b5335f754 | 10.5.5.16 | 97cec311887e415ca46736bb9e917376 | ACTIVE              | 10.5.5.16 | 6443          | ERROR            | 1      |
+--------------------------------------+-----------+----------------------------------+---------------------+-----------+---------------+------------------+--------+

^^^ ERROR

Revision history for this message
Nobuto Murata (nobuto) wrote :

This may not be critical. As far as I can see, even without the health monitor, API requests are retried(?).

Anyway, by adding the health monitor, the haproxy config inside the amphora instance is updated as follows:

[without health monitor]

backend 7d5d4323-f382-4a85-a965-4fcedd8c77f2:199362ee-e305-4ebf-947a-d3c155341ac9
    mode tcp
    balance roundrobin
    fullconn 50000
    option allbackups
    timeout connect 5000
    timeout server 50000
    server 91d6a444-6986-49c7-b46a-a17838d7a638 10.5.5.23:6443 weight 1
    server 0e91354d-1f60-40d2-b748-d4e6be491ebe 10.5.5.66:6443 weight 1

[with TLS-HELLO health monitor]

backend 7d5d4323-f382-4a85-a965-4fcedd8c77f2:199362ee-e305-4ebf-947a-d3c155341ac9
    mode tcp
    balance roundrobin
    timeout check 10s
    option ssl-hello-chk
    fullconn 50000
    option allbackups
    timeout connect 5000
    timeout server 50000
    server 91d6a444-6986-49c7-b46a-a17838d7a638 10.5.5.23:6443 weight 1 check inter 5s fall 3 rise 4
    server 0e91354d-1f60-40d2-b748-d4e6be491ebe 10.5.5.66:6443 weight 1 check inter 5s fall 3 rise 4
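
(For reference, the health monitor parameters appear to map onto the haproxy check options roughly as shown below; the "fall 3" presumably comes from the monitor's max-retries-down value, which was left at its default here.)

    --delay 5       -> check inter 5s
    --timeout 10    -> timeout check 10s
    --max-retries 4 -> rise 4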

description: updated
Revision history for this message
Nobuto Murata (nobuto) wrote :

> As far as I can see, even without the health monitor, API requests are retried(?).

Now I know what's going on. Without the health monitor, haproxy is not aware of backend failures, but it keeps redispatching the connections (wredis: redispatches - warning) to the remaining backends all the time thanks to "option redispatch" in haproxy.cfg.

====
defaults
    log global
    retries 3
    option redispatch
    option splice-request
    option splice-response
    option http-keep-alive
====

So the backend failure is not obvious from an API user's point of view, which is good. Having the health monitor on top is therefore not a must-have, but it is still nice-to-have so that the backend member status is managed properly instead of always retrying against the failed backend as well.
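
(For anyone who wants to keep an eye on the member status while testing, something along these lines works with the load balancer from this deployment:)

$ watch -n 10 openstack loadbalancer member list \
    openstack-integrator-7f03314f1796-kubernetes-master \
    -c address -c operating_status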

description: updated
Changed in charm-openstack-integrator:
assignee: nobody → Samuel Walladge (swalladge)
status: New → In Progress
Revision history for this message
Samuel Allan (samuelallan) wrote :
Revision history for this message
Samuel Allan (samuelallan) wrote :
Revision history for this message
Nobuto Murata (nobuto) wrote :

Subscribing ~field-medium.

George Kraft (cynerva)
Changed in charm-openstack-integrator:
importance: Undecided → High
tags: added: review-needed
Changed in charm-openstack-integrator:
milestone: none → 1.24
Revision history for this message
George Kraft (cynerva) wrote :

Thanks for the detailed investigation and notes, Nobuto. Thanks for the PR, Samuel. This is merged and will go out with Charmed Kubernetes 1.24.

Changed in charm-openstack-integrator:
status: In Progress → Fix Committed
tags: removed: review-needed
Changed in charm-openstack-integrator:
status: Fix Committed → Fix Released