non cdk apps stuck at "waiting for /root/.kube/config" when filebeat in cdk env

Bug #1858668 reported by Jeff Hillman
This bug affects 3 people
Affects                      Status         Importance   Assigned to      Milestone
Charmed Kubernetes Bundles   Fix Released   Low          Kevin W Monroe
Filebeat Charm               Fix Released   Medium       Kevin W Monroe

Bug Description

In a Charmed Kubernetes environment, after adding the LMA stack (or vault), the filebeat units on non-CDK applications get stuck in maintenance with the message 'Waiting for: /root/.kube/config'.

Here's the juju status output. Notice that the filebeat units on all non-k8s applications show this message:

---

$ juju status filebeat
Model Controller Cloud/Region Version SLA Timestamp
kubernetes foundations-maas maas_cloud 2.6.10 unsupported 17:32:40Z

App Version Status Scale Charm Store Rev OS Notes
canal 0.10.0/3.10.1 active 0 canal jujucharms 688 ubuntu
ceph-osd 12.2.12 active 3 ceph-osd jujucharms 294 ubuntu
containerd active 0 containerd local 0 ubuntu
filebeat 6.8.6 maintenance 18 filebeat jujucharms 25 ubuntu
hacluster-kubernetes-master active 0 hacluster jujucharms 63 ubuntu
hacluster-vault active 0 hacluster jujucharms 63 ubuntu
kubernetes-master 1.17.0 active 3 kubernetes-master jujucharms 788 ubuntu
kubernetes-worker 1.17.0 active 3 kubernetes-worker jujucharms 623 ubuntu exposed
landscape-client maintenance 0 landscape-client jujucharms 32 ubuntu
landscape-haproxy active 1 haproxy jujucharms 55 ubuntu exposed
landscape-postgresql 10.10 active 2 postgresql jujucharms 199 ubuntu
landscape-rabbitmq-server 3.6.10 active 3 rabbitmq-server jujucharms 97 ubuntu
landscape-server active 3 landscape-server jujucharms 35 ubuntu
nrpe-container active 0 nrpe jujucharms 60 ubuntu
nrpe-host active 0 nrpe jujucharms 60 ubuntu
telegraf active 0 telegraf jujucharms 29 ubuntu
vault 1.1.1 active 3 vault jujucharms 32 ubuntu

Unit Workload Agent Machine Public address Ports Message
ceph-osd/0* active idle 22 10.109.12.5 Unit is ready (8 OSD)
ceph-osd/1 active idle 23 10.109.12.6 Unit is ready (8 OSD)
ceph-osd/2 active idle 24 10.109.12.7 Unit is ready (8 OSD)
kubernetes-master/0 active idle 19/lxd/2 10.109.15.112 6443/tcp Kubernetes master running.
  canal/5 active idle 10.109.15.112 Flannel subnet 10.1.9.1/24
  containerd/5 active idle 10.109.15.112 Container runtime available
  filebeat/13 active idle 10.109.15.112 Filebeat ready.
  hacluster-kubernetes-master/2 active idle 10.109.15.112 Unit is ready and clustered
  landscape-client/27 maintenance idle 10.109.15.112 Need computer-title and juju-info to proceed
  nrpe-container/12 active idle 10.109.15.112 icmp,5666/tcp ready
  telegraf/16 active idle 10.109.15.112 9103/tcp Monitoring kubernetes-master/0
kubernetes-master/1 active idle 20/lxd/2 10.109.15.116 6443/tcp Kubernetes master running.
  canal/4 active idle 10.109.15.116 Flannel subnet 10.1.80.1/24
  containerd/4 active idle 10.109.15.116 Container runtime available
  filebeat/12 active idle 10.109.15.116 Filebeat ready.
  hacluster-kubernetes-master/1* active idle 10.109.15.116 Unit is ready and clustered
  landscape-client/26 maintenance idle 10.109.15.116 Need computer-title and juju-info to proceed
  nrpe-container/11 active idle 10.109.15.116 icmp,5666/tcp ready
  telegraf/15 active idle 10.109.15.116 9103/tcp Monitoring kubernetes-master/1
kubernetes-master/2* active idle 21/lxd/2 10.109.15.110 6443/tcp Kubernetes master running.
  canal/3 active idle 10.109.15.110 Flannel subnet 10.1.16.1/24
  containerd/3 active idle 10.109.15.110 Container runtime available
  filebeat/11 active idle 10.109.15.110 Filebeat ready.
  hacluster-kubernetes-master/0 active idle 10.109.15.110 Unit is ready and clustered
  landscape-client/25 maintenance idle 10.109.15.110 Need computer-title and juju-info to proceed
  nrpe-container/10 active idle 10.109.15.110 icmp,5666/tcp ready
  telegraf/14 active idle 10.109.15.110 9103/tcp Monitoring kubernetes-master/2
kubernetes-worker/0* active idle 22 10.109.12.5 80/tcp,443/tcp Kubernetes worker running.
  canal/0* active idle 10.109.12.5 Flannel subnet 10.1.58.1/24
  containerd/0* active idle 10.109.12.5 Container runtime available
  filebeat/8 active idle 10.109.12.5 Filebeat ready.
  landscape-client/22 maintenance idle 10.109.12.5 Need computer-title and juju-info to proceed
  nrpe-host/12 active idle 10.109.12.5 icmp,5666/tcp ready
  telegraf/11 active idle 10.109.12.5 9103/tcp Monitoring kubernetes-worker/0
kubernetes-worker/1 active idle 23 10.109.12.6 80/tcp,443/tcp Kubernetes worker running.
  canal/1 active idle 10.109.12.6 Flannel subnet 10.1.19.1/24
  containerd/1 active idle 10.109.12.6 Container runtime available
  filebeat/9 active idle 10.109.12.6 Filebeat ready.
  landscape-client/23 maintenance idle 10.109.12.6 Need computer-title and juju-info to proceed
  nrpe-host/13 active idle 10.109.12.6 icmp,5666/tcp ready
  telegraf/12 active idle 10.109.12.6 9103/tcp Monitoring kubernetes-worker/1
kubernetes-worker/2 active idle 24 10.109.12.7 80/tcp,443/tcp Kubernetes worker running.
  canal/2 active idle 10.109.12.7 Flannel subnet 10.1.73.1/24
  containerd/2 active idle 10.109.12.7 Container runtime available
  filebeat/10 active idle 10.109.12.7 Filebeat ready.
  landscape-client/24 maintenance idle 10.109.12.7 Need computer-title and juju-info to proceed
  nrpe-host/14 active idle 10.109.12.7 icmp,5666/tcp ready
  telegraf/13 active idle 10.109.12.7 9103/tcp Monitoring kubernetes-worker/2
landscape-haproxy/0* active idle 9 10.109.15.86 80/tcp,443/tcp Unit is ready
  filebeat/0 maintenance idle 10.109.15.86 Waiting for: /root/.kube/config
  landscape-client/1 maintenance idle 10.109.15.86 Need computer-title and juju-info to proceed
  nrpe-host/1 active idle 10.109.15.86 icmp,5666/tcp ready
  telegraf/0* active idle 10.109.15.86 9103/tcp Monitoring landscape-haproxy/0
landscape-postgresql/0 active idle 13 10.109.15.90 5432/tcp Live secondary (10.10)
  filebeat/15 maintenance idle 10.109.15.90 Waiting for: /root/.kube/config
  landscape-client/31 maintenance idle 10.109.15.90 Need computer-title and juju-info to proceed
  nrpe-host/18 active idle 10.109.15.90 icmp,5666/tcp ready
  telegraf/18 active idle 10.109.15.90 9103/tcp Monitoring landscape-postgresql/0
landscape-postgresql/1* active idle 14 10.109.15.93 5432/tcp Live master (10.10)
  filebeat/6 maintenance idle 10.109.15.93 Waiting for: /root/.kube/config
  landscape-client/10 maintenance idle 10.109.15.93 Need computer-title and juju-info to proceed
  nrpe-host/10 active idle 10.109.15.93 icmp,5666/tcp ready
  telegraf/6 active idle 10.109.15.93 9103/tcp Monitoring landscape-postgresql/1
landscape-rabbitmq-server/0 active idle 10 10.109.15.97 5672/tcp Unit is ready and clustered
  filebeat/7* maintenance idle 10.109.15.97 Waiting for: /root/.kube/config
  landscape-client/11 maintenance idle 10.109.15.97 Need computer-title and juju-info to proceed
  nrpe-host/11 active idle 10.109.15.97 icmp,5666/tcp ready
  telegraf/7 active idle 10.109.15.97 9103/tcp Monitoring landscape-rabbitmq-server/0
landscape-rabbitmq-server/1 active idle 11 10.109.15.92 5672/tcp Unit is ready and clustered
  filebeat/17 maintenance idle 10.109.15.92 Waiting for: /root/.kube/config
  landscape-client/33 maintenance idle 10.109.15.92 Need computer-title and juju-info to proceed
  nrpe-host/20 active idle 10.109.15.92 icmp,5666/tcp ready
  telegraf/20 active idle 10.109.15.92 9103/tcp Monitoring landscape-rabbitmq-server/1
landscape-rabbitmq-server/2* active idle 12 10.109.15.83 5672/tcp Unit is ready and clustered
  filebeat/5 maintenance idle 10.109.15.83 Waiting for: /root/.kube/config
  landscape-client/8 maintenance idle 10.109.15.83 Need computer-title and juju-info to proceed
  nrpe-host/8 active idle 10.109.15.83 icmp,5666/tcp ready
  telegraf/5 active idle 10.109.15.83 9103/tcp Monitoring landscape-rabbitmq-server/2
landscape-server/0 active idle 15 10.109.15.88
  filebeat/1 maintenance idle 10.109.15.88 Waiting for: /root/.kube/config
  landscape-client/4 maintenance idle 10.109.15.88 Need computer-title and juju-info to proceed
  nrpe-host/4 active idle 10.109.15.88 icmp,5666/tcp ready
  telegraf/1 active idle 10.109.15.88 9103/tcp Monitoring landscape-server/0
landscape-server/1 active idle 16 10.109.15.95
  filebeat/16 maintenance idle 10.109.15.95 Waiting for: /root/.kube/config
  landscape-client/32 maintenance idle 10.109.15.95 Need computer-title and juju-info to proceed
  nrpe-host/19 active idle 10.109.15.95 icmp,5666/tcp ready
  telegraf/19 active idle 10.109.15.95 9103/tcp Monitoring landscape-server/1
landscape-server/2* active idle 17 10.109.15.87
  filebeat/2 maintenance idle 10.109.15.87 Waiting for: /root/.kube/config
  landscape-client/5 maintenance idle 10.109.15.87 Need computer-title and juju-info to proceed
  nrpe-host/5 active idle 10.109.15.87 icmp,5666/tcp ready
  telegraf/2 active idle 10.109.15.87 9103/tcp Monitoring landscape-server/2
vault/0 active idle 6 10.109.15.84 8200/tcp Unit is ready (active: true, mlock: enabled)
  filebeat/14 maintenance idle 10.109.15.84 Waiting for: /root/.kube/config
  hacluster-vault/2 active idle 10.109.15.84 Unit is ready and clustered
  landscape-client/30 maintenance idle 10.109.15.84 Need computer-title and juju-info to proceed
  nrpe-host/17 active idle 10.109.15.84 icmp,5666/tcp ready
  telegraf/17 active idle 10.109.15.84 9103/tcp Monitoring vault/0
vault/1 active idle 7 10.109.15.94 8200/tcp Unit is ready (active: false, mlock: enabled)
  filebeat/4 maintenance idle 10.109.15.94 Waiting for: /root/.kube/config
  hacluster-vault/1* active idle 10.109.15.94 Unit is ready and clustered
  landscape-client/7 maintenance idle 10.109.15.94 Need computer-title and juju-info to proceed
  nrpe-host/7 active idle 10.109.15.94 icmp,5666/tcp ready
  telegraf/4 active idle 10.109.15.94 9103/tcp Monitoring vault/1
vault/2* active idle 8 10.109.15.89 8200/tcp Unit is ready (active: false, mlock: enabled)
  filebeat/3 maintenance idle 10.109.15.89 Waiting for: /root/.kube/config
  hacluster-vault/0 active idle 10.109.15.89 Unit is ready and clustered
  landscape-client/6 maintenance idle 10.109.15.89 Need computer-title and juju-info to proceed
  nrpe-host/6 active idle 10.109.15.89 icmp,5666/tcp ready
  telegraf/3 active idle 10.109.15.89 9103/tcp Monitoring vault/2

---
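
A minimal sketch for picking the stuck units out of a status listing like the one above. It assumes the nested layout of `juju status --format=json` (applications, then units, then subordinates, then workload-status and message); the function name and the sample data are hypothetical, with the unit names and messages copied from the output above.

```python
import json

# Message text as reported in the status output of this bug.
NEEDLE = "Waiting for: /root/.kube/config"


def stuck_filebeat_units(status, needle=NEEDLE):
    """Return sorted (principal_unit, filebeat_unit) pairs showing the message.

    `status` is assumed to be the dict parsed from `juju status --format=json`.
    """
    stuck = []
    for app in status.get("applications", {}).values():
        # Subordinate applications carry no "units" key of their own.
        for unit_name, unit in (app.get("units") or {}).items():
            for sub_name, sub in (unit.get("subordinates") or {}).items():
                message = sub.get("workload-status", {}).get("message", "")
                if sub_name.startswith("filebeat/") and needle in message:
                    stuck.append((unit_name, sub_name))
    return sorted(stuck)


# Hypothetical sample, trimmed to two principal units from the listing above.
SAMPLE = json.loads("""
{"applications": {
  "kubernetes-worker": {"units": {"kubernetes-worker/0": {"subordinates": {
    "filebeat/8": {"workload-status": {"current": "active",
                                       "message": "Filebeat ready."}}}}}},
  "landscape-haproxy": {"units": {"landscape-haproxy/0": {"subordinates": {
    "filebeat/0": {"workload-status": {"current": "maintenance",
                                       "message": "Waiting for: /root/.kube/config"}}}}}},
  "filebeat": {"subordinate-to": ["kubernetes-worker", "landscape-haproxy"]}
}}
""")

if __name__ == "__main__":
    for principal, filebeat in stuck_filebeat_units(SAMPLE):
        print(f"{filebeat} on {principal} is stuck")
```

Against a live model, the same function could be fed the parsed output of `juju status --format=json` instead of the sample.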

Here are the relations from that same output:

---
Relation provider Requirer Interface Type Message
apache2:juju-info landscape-client:container juju-info subordinate
apache2:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
ceph-mon:admin kubernetes-master:ceph-storage ceph-admin regular
ceph-mon:client kubernetes-master:ceph-client ceph-client regular
ceph-mon:client landscape-client:ceph-client ceph-client subordinate
ceph-mon:mon ceph-mon:mon ceph peer
ceph-mon:nrpe-external-master nrpe-container:nrpe-external-master nrpe-external-master subordinate
ceph-mon:osd ceph-osd:mon ceph-osd regular
easyrsa:client etcd:certificates tls-certificates regular
easyrsa:client kubernetes-master:certificates tls-certificates regular
easyrsa:client kubernetes-worker:certificates tls-certificates regular
easyrsa:juju-info landscape-client:container juju-info subordinate
easyrsa:juju-info nrpe-host:general-info juju-info subordinate
elasticsearch:client graylog:elasticsearch elasticsearch regular
elasticsearch:juju-info landscape-client:container juju-info subordinate
elasticsearch:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
elasticsearch:peer elasticsearch:peer http peer
etcd:cluster etcd:cluster etcd peer
etcd:db canal:etcd etcd regular
etcd:db kubernetes-master:etcd etcd regular
etcd:db vault:etcd etcd regular
etcd:juju-info landscape-client:container juju-info subordinate
etcd:nrpe-external-master nrpe-container:nrpe-external-master nrpe-external-master subordinate
grafana:juju-info landscape-client:container juju-info subordinate
grafana:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
graylog:beats filebeat:logstash elastic-beats regular
graylog:juju-info landscape-client:container juju-info subordinate
graylog:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
graylog:website apache2:reverseproxy http regular
hacluster-kubernetes-master:ha kubernetes-master:ha hacluster subordinate
hacluster-kubernetes-master:hanode hacluster-kubernetes-master:hanode hacluster peer
hacluster-mysql:ha mysql:ha hacluster subordinate
hacluster-mysql:hanode hacluster-mysql:hanode hacluster peer
hacluster-vault:ha vault:ha hacluster subordinate
hacluster-vault:hanode hacluster-vault:hanode hacluster peer
kubernetes-master:cni canal:cni kubernetes-cni subordinate
kubernetes-master:container-runtime containerd:containerd container-runtime subordinate
kubernetes-master:coordinator kubernetes-master:coordinator coordinator peer
kubernetes-master:juju-info filebeat:beats-host juju-info subordinate
kubernetes-master:juju-info landscape-client:container juju-info subordinate
kubernetes-master:juju-info telegraf:juju-info juju-info subordinate
kubernetes-master:kube-api-endpoint kubernetes-worker:kube-api-endpoint http regular
kubernetes-master:kube-control kubernetes-worker:kube-control kube-control regular
kubernetes-master:kube-masters kubernetes-master:kube-masters kube-masters peer
kubernetes-master:nrpe-external-master nrpe-container:nrpe-external-master nrpe-external-master subordinate
kubernetes-worker:cni canal:cni kubernetes-cni subordinate
kubernetes-worker:container-runtime containerd:containerd container-runtime subordinate
kubernetes-worker:coordinator kubernetes-worker:coordinator coordinator peer
kubernetes-worker:juju-info filebeat:beats-host juju-info subordinate
kubernetes-worker:juju-info landscape-client:container juju-info subordinate
kubernetes-worker:juju-info telegraf:juju-info juju-info subordinate
kubernetes-worker:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
landscape-haproxy:juju-info filebeat:beats-host juju-info subordinate
landscape-haproxy:juju-info landscape-client:container juju-info subordinate
landscape-haproxy:juju-info telegraf:juju-info juju-info subordinate
landscape-haproxy:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
landscape-haproxy:peer landscape-haproxy:peer haproxy-peer peer
landscape-postgresql:coordinator landscape-postgresql:coordinator coordinator peer
landscape-postgresql:db-admin landscape-server:db pgsql regular
landscape-postgresql:juju-info filebeat:beats-host juju-info subordinate
landscape-postgresql:juju-info landscape-client:container juju-info subordinate
landscape-postgresql:juju-info telegraf:juju-info juju-info subordinate
landscape-postgresql:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
landscape-postgresql:replication landscape-postgresql:replication pgpeer peer
landscape-rabbitmq-server:amqp landscape-server:amqp rabbitmq regular
landscape-rabbitmq-server:cluster landscape-rabbitmq-server:cluster rabbitmq-ha peer
landscape-rabbitmq-server:juju-info filebeat:beats-host juju-info subordinate
landscape-rabbitmq-server:juju-info landscape-client:container juju-info subordinate
landscape-rabbitmq-server:juju-info telegraf:juju-info juju-info subordinate
landscape-rabbitmq-server:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
landscape-server:juju-info filebeat:beats-host juju-info subordinate
landscape-server:juju-info landscape-client:container juju-info subordinate
landscape-server:juju-info nrpe-host:general-info juju-info subordinate
landscape-server:juju-info telegraf:juju-info juju-info subordinate
landscape-server:website landscape-haproxy:reverseproxy http regular
mongodb:database graylog:mongodb mongodb regular
mongodb:juju-info landscape-client:container juju-info subordinate
mongodb:nrpe-external-master nrpe-container:nrpe-external-master nrpe-external-master subordinate
mongodb:replica-set mongodb:replica-set mongodb-replica-set peer
mysql:cluster mysql:cluster percona-cluster peer
mysql:juju-info landscape-client:container juju-info subordinate
mysql:juju-info telegraf:juju-info juju-info subordinate
mysql:nrpe-external-master nrpe-container:nrpe-external-master nrpe-external-master subordinate
mysql:shared-db vault:shared-db mysql-shared regular
nrpe-container:monitors nagios:monitors monitors regular
nrpe-host:monitors nagios:monitors monitors regular
prometheus:grafana-source grafana:grafana-source grafana-source regular
prometheus:juju-info landscape-client:container juju-info subordinate
prometheus:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
telegraf:prometheus-client prometheus:target http regular
vault:cluster vault:cluster vault-ha peer
vault:juju-info filebeat:beats-host juju-info subordinate
vault:juju-info landscape-client:container juju-info subordinate
vault:juju-info telegraf:juju-info juju-info subordinate
vault:nrpe-external-master nrpe-host:nrpe-external-master nrpe-external-master subordinate
vault:secrets ceph-osd:secrets-storage vault-kv regular

Tags: cpe-onsite
Jeff Hillman (jhillman)
summary: - non cdk apps stuck at "waiting for /root/.cdk/config" when filebeat in
+ non cdk apps stuck at "waiting for /root/.kube/config" when filebeat in
cdk env
description: updated
Jeff Hillman (jhillman)
description: updated
description: updated
Changed in filebeat-charm:
assignee: nobody → Kevin W Monroe (kwmonroe)
importance: Undecided → Medium
George Kraft (cynerva) wrote :

Duplicate issue with additional details: https://bugs.launchpad.net/filebeat-charm/+bug/1860591

Kevin W Monroe (kwmonroe) wrote :

This is happening because we assume filebeat with kube_logs=True will only be related to k8s applications:

https://github.com/juju-solutions/layer-filebeat/commit/3515ae3f283829e64363c09d3f701fd7f4e8b078#diff-12a676df1a18154f544f02c8d70e5519R67

That was not a good assumption. Immediate workaround is to deploy a separate filebeat charm with kube_logs=False (the default), and then relate that to the non-k8s apps:

-----
juju deploy cs:bionic/filebeat-29 filebeat-nok8s
juju relate filebeat-nok8s:beats-host landscape-$foo
juju relate filebeat-nok8s:logstash graylog
-----

I think the fix will either be (1) let filebeat proceed without a kubeconfig even if kube_logs=True, or (2) get rid of kube_logs altogether and just add k8s metadata if we detect a kubeconfig.

Problem with (1) is that today, filebeat will fail if kube metadata is rendered in the config without a kubeconfig present. Problem with (2) is that we may not re-render the template post-install, so if filebeat comes up before k8s charms have a valid kubeconfig, we'll miss the kube metadata even after that kubeconfig is present.

I'm working both angles now to see how we can best fix this.

Kevin W Monroe (kwmonroe) wrote :

Couple PRs for review. This is fixed with a new flag in beats-base that only gets set when kube_logs=True and /root/.kube/config exists:

https://github.com/juju-solutions/layer-beats-base/pull/36

That is consumed by filebeat; k8s-related config is rendered when set, and ignored otherwise:

https://github.com/juju-solutions/layer-filebeat/pull/78
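
The gating described in this comment can be sketched roughly as follows. This is not the code from the linked PRs; it is a standalone illustration, and the function names and the shape of the template context are hypothetical. Only the `kube_logs` option and the `/root/.kube/config` path come from this report.

```python
import os

# Path taken from the status message in this bug.
KUBECONFIG = "/root/.kube/config"


def kube_metadata_enabled(kube_logs, kubeconfig_path=KUBECONFIG):
    """True only when kube_logs is on AND the kubeconfig file exists."""
    return bool(kube_logs) and os.path.isfile(kubeconfig_path)


def build_context(kube_logs, kubeconfig_path=KUBECONFIG):
    """Build a (hypothetical) template context for the filebeat config.

    The k8s-related key is added only when the gate is open, so a unit on a
    non-k8s machine renders a plain config instead of waiting forever.
    """
    context = {"kube_logs": bool(kube_logs)}
    if kube_metadata_enabled(kube_logs, kubeconfig_path):
        context["kubeconfig"] = kubeconfig_path
    return context


if __name__ == "__main__":
    # On a machine without a kubeconfig this prints a context with no
    # "kubeconfig" key, even though kube_logs is True.
    print(build_context(True, "/nonexistent/kubeconfig"))
```

The point of combining both conditions in one check is that a filebeat application with kube_logs=True can then be related to k8s and non-k8s principals alike.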

Revisions 31 and above have this fix, which is currently available on edge:

https://jaas.ai/filebeat/31

Changed in filebeat-charm:
status: New → In Progress
tags: added: review-needed
George Kraft (cynerva)
Changed in filebeat-charm:
status: In Progress → Fix Committed
Kevin W Monroe (kwmonroe) wrote :

Pulled in a change based on review. New revision for test is 32, which is now pushed through candidate:

https://jaas.ai/filebeat/32

If no objections, I'll push this to stable next week.

tags: removed: review-needed
Changed in charmed-kubernetes-bundles:
assignee: nobody → Kevin W Monroe (kwmonroe)
milestone: none → 1.18
status: New → In Progress
importance: Undecided → Low
Kevin W Monroe (kwmonroe) wrote :

This has been working well for me over the last week. Rev 32 is now in stable.

Kevin W Monroe (kwmonroe) wrote :

CK bundle change is ready for review:

https://github.com/charmed-kubernetes/bundle/pull/763

Note, the change is simply to stop locking charms to a specific revision. This is in line with our other overlays, as we always want the latest stable charms.

Changed in charmed-kubernetes-bundles:
assignee: Kevin W Monroe (kwmonroe) → nobody
assignee: nobody → Kevin W Monroe (kwmonroe)
Camille Rodriguez (camille.rodriguez) wrote :

Thank you Kevin! We changed the filebeat version in our deployment to the most recent one you pushed out and it resolved the issue.

Kevin W Monroe (kwmonroe) wrote :

CK bundle overlay has been committed.

Changed in charmed-kubernetes-bundles:
status: In Progress → Fix Committed
Changed in charmed-kubernetes-bundles:
status: Fix Committed → Fix Released
Eric Chen (eric-chen)
Changed in filebeat-charm:
status: Fix Committed → Fix Released