worker nodes outside of openstack cannot join cluster when Octavia is used as LB for k8s-master

Bug #1878097 reported by Jeff Hillman
Affects: Openstack Integrator Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

Kubernetes 1.17.5
Openstack Train (Stein for Octavia)

When using this bundle to deploy Openstack with Octavia:

https://pastebin.ubuntu.com/p/6K6P3psBwS/

Bare-metal worker nodes never join the cluster permanently: they join for a moment, then disappear and never come back.

The LB IP given to the masters by Octavia is 172.16.7.191, and it is reachable from the bare-metal worker.
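
A quick reachability check from the bare-metal worker, for reference (a hedged sketch; the exact listener port depends on how the integrator configured the Octavia load balancer, typically 443 or 6443):

---

nc -vz 172.16.7.191 443
curl -k https://172.16.7.191:443/healthz   # should reach the apiserver behind the LB if anonymous /healthz access is allowed

---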

Openstack kubernetes bundle - https://pastebin.ubuntu.com/p/RD9nr9PFPV/

k8s-worker bare-metal bundle - https://pastebin.ubuntu.com/p/tTrm2987qd/
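
For context, the cross-model wiring the bare-metal bundle sets up is roughly the following (a hedged CLI-equivalent sketch; the offer URLs and relation endpoints are taken from the juju status output further down):

---

# run against the k8s-worker model on the MAAS controller
juju consume openstack-regionone:admin/kubernetes.easyrsa easyrsa
juju consume openstack-regionone:admin/kubernetes.etcd etcd
juju consume openstack-regionone:admin/kubernetes.kubernetes-master-api-endpoint kubernetes-master-api-endpoint
juju consume openstack-regionone:admin/kubernetes.kubernetes-master-control kubernetes-master-control

juju add-relation kubernetes-worker-bm:certificates easyrsa:client
juju add-relation flannel:etcd etcd:db
juju add-relation kubernetes-worker-bm:kube-api-endpoint kubernetes-master-api-endpoint:kube-api-endpoint
juju add-relation kubernetes-worker-bm:kube-control kubernetes-master-control:kube-control

---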

Juju status of the kubernetes model on the openstack controller:

---

Model Controller Cloud/Region Version SLA Timestamp
kubernetes openstack-regionone openstack/RegionOne 2.7.6 unsupported 16:39:35-04:00

App Version Status Scale Charm Store Rev OS Notes
ceph-proxy active 1 ceph-proxy jujucharms 29 ubuntu
containerd active 5 containerd jujucharms 61 ubuntu
easyrsa 3.0.1 active 1 easyrsa jujucharms 296 ubuntu
etcd 3.3.15 active 1 etcd jujucharms 496 ubuntu
flannel 0.11.0 active 5 flannel jujucharms 468 ubuntu
kubernetes-master 1.17.5 active 3 kubernetes-master jujucharms 808 ubuntu exposed
kubernetes-worker-os 1.17.5 active 2 kubernetes-worker jujucharms 634 ubuntu exposed
openstack-integrator train active 1 openstack-integrator jujucharms 59 ubuntu

Unit Workload Agent Machine Public address Ports Message
ceph-proxy/0* active idle 0 172.16.7.184 Ready to proxy settings
easyrsa/0* active idle 1 172.16.7.193 Certificate Authority connected.
etcd/0* active idle 2 172.16.7.179 2379/tcp Healthy with 1 known peer
kubernetes-master/0* active idle 3 172.16.7.203 6443/tcp Kubernetes master running.
  containerd/0* active idle 172.16.7.203 Container runtime available
  flannel/0* active idle 172.16.7.203 Flannel subnet 10.1.89.1/24
kubernetes-master/1 active idle 4 172.16.7.185 6443/tcp Kubernetes master running.
  containerd/4 active idle 172.16.7.185 Container runtime available
  flannel/4 active idle 172.16.7.185 Flannel subnet 10.1.67.1/24
kubernetes-master/2 active idle 5 172.16.7.187 6443/tcp Kubernetes master running.
  containerd/3 active idle 172.16.7.187 Container runtime available
  flannel/3 active idle 172.16.7.187 Flannel subnet 10.1.39.1/24
kubernetes-worker-os/0* active idle 6 172.16.7.192 80/tcp,443/tcp Kubernetes worker running.
  containerd/1 active idle 172.16.7.192 Container runtime available
  flannel/1 active idle 172.16.7.192 Flannel subnet 10.1.38.1/24
kubernetes-worker-os/1 active idle 7 172.16.7.201 80/tcp,443/tcp Kubernetes worker running.
  containerd/2 active idle 172.16.7.201 Container runtime available
  flannel/2 active idle 172.16.7.201 Flannel subnet 10.1.36.1/24
openstack-integrator/0* active idle 8 172.16.7.194 Ready

Machine State DNS Inst id Series AZ Message
0 started 172.16.7.184 00028a1f-2e1f-4dc0-8660-6a52fb494ab1 bionic nova ACTIVE
1 started 172.16.7.193 aaa825e1-cc01-465b-b782-df7439b930f3 bionic nova ACTIVE
2 started 172.16.7.179 4de47d67-fad8-4ed5-9cfb-45723645c80f bionic nova ACTIVE
3 started 172.16.7.203 20d36010-97af-4217-833b-10ba27d89b37 bionic nova ACTIVE
4 started 172.16.7.185 9bd706de-15a3-4eb9-a690-74d9ff07ce4c bionic nova ACTIVE
5 started 172.16.7.187 880cb07a-194b-430e-bc4e-02f29285efe8 bionic nova ACTIVE
6 started 172.16.7.192 f4bfbd36-71c0-4585-814a-3377ba520563 bionic nova ACTIVE
7 started 172.16.7.201 cb82027c-3033-4439-a05b-1f8edd453db6 bionic nova ACTIVE
8 started 172.16.7.194 507048d1-6378-4b0e-8307-e011d23a25d7 bionic nova ACTIVE

Offer Application Charm Rev Connected Endpoint Interface Role
easyrsa easyrsa easyrsa 296 1/1 client tls-certificates provider
etcd etcd etcd 496 1/1 db etcd provider
kubernetes-master-api-endpoint kubernetes-master kubernetes-master 808 1/1 kube-api-endpoint http provider
kubernetes-master-control kubernetes-master kubernetes-master 808 1/1 kube-control kube-control provider

Relation provider Requirer Interface Type Message
ceph-proxy:client kubernetes-master:ceph-client ceph-client regular
easyrsa:client etcd:certificates tls-certificates regular
easyrsa:client kubernetes-master:certificates tls-certificates regular
easyrsa:client kubernetes-worker-os:certificates tls-certificates regular
etcd:cluster etcd:cluster etcd peer
etcd:db flannel:etcd etcd regular
etcd:db kubernetes-master:etcd etcd regular
kubernetes-master:cni flannel:cni kubernetes-cni subordinate
kubernetes-master:container-runtime containerd:containerd container-runtime subordinate
kubernetes-master:coordinator kubernetes-master:coordinator coordinator peer
kubernetes-master:kube-api-endpoint kubernetes-worker-os:kube-api-endpoint http regular
kubernetes-master:kube-control kubernetes-worker-os:kube-control kube-control regular
kubernetes-master:kube-masters kubernetes-master:kube-masters kube-masters peer
kubernetes-worker-os:cni flannel:cni kubernetes-cni subordinate
kubernetes-worker-os:container-runtime containerd:containerd container-runtime subordinate
kubernetes-worker-os:coordinator kubernetes-worker-os:coordinator coordinator peer
openstack-integrator:clients kubernetes-master:openstack openstack-integration regular
openstack-integrator:clients kubernetes-worker-os:openstack openstack-integration regular
openstack-integrator:loadbalancer kubernetes-master:loadbalancer public-address regular

---

Juju status of the bare-metal worker model on the MAAS controller:

---

Model Controller Cloud/Region Version SLA Timestamp
k8s-worker jhillman-maas jhillman-maas 2.7.6 unsupported 16:41:01-04:00

SAAS Status Store URL
easyrsa active openstack-regionone admin/kubernetes.easyrsa
etcd active openstack-regionone admin/kubernetes.etcd
kubernetes-master-api-endpoint active openstack-regionone admin/kubernetes.kubernetes-master-api-endpoint
kubernetes-master-control active openstack-regionone admin/kubernetes.kubernetes-master-control

App Version Status Scale Charm Store Rev OS Notes
containerd active 1 containerd jujucharms 61 ubuntu
flannel 0.11.0 active 1 flannel jujucharms 468 ubuntu
kubernetes-worker-bm 1.17.5 active 1 kubernetes-worker jujucharms 634 ubuntu

Unit Workload Agent Machine Public address Ports Message
kubernetes-worker-bm/0* active idle 0 172.16.7.92 80/tcp,443/tcp Kubernetes worker running.
  containerd/0* active idle 172.16.7.92 Container runtime available
  flannel/0* active idle 172.16.7.92 Flannel subnet 10.1.92.1/24

Machine State DNS Inst id Series AZ Message
0 started 172.16.7.92 agrippa bionic default Deployed

Relation provider Requirer Interface Type Message
easyrsa:client kubernetes-worker-bm:certificates tls-certificates regular
etcd:db flannel:etcd etcd regular
kubernetes-master-api-endpoint:kube-api-endpoint kubernetes-worker-bm:kube-api-endpoint http regular
kubernetes-master-control:kube-control kubernetes-worker-bm:kube-control kube-control regular
kubernetes-worker-bm:cni flannel:cni kubernetes-cni subordinate
kubernetes-worker-bm:container-runtime containerd:containerd container-runtime subordinate
kubernetes-worker-bm:coordinator kubernetes-worker-bm:coordinator coordinator peer

---

As can be seen in the bare-metal model, the charm says that the worker is running, but 'kubectl get nodes' only shows the openstack nodes.

---

$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
juju-472727-kubernetes-6 Ready <none> 66m v1.17.5
juju-472727-kubernetes-7 Ready <none> 67m v1.17.5

---

The unit log and syslog from the bare-metal worker will be uploaded to the bug.

The syslog contains this repeated message:

---

May 11 20:42:36 agrippa kubelet.daemon[25990]: E0511 20:42:36.086470 25990 kubelet.go:2263] node "agrippa" not found
May 11 20:42:36 agrippa kubelet.daemon[25990]: E0511 20:42:36.140653 25990 kubelet_node_status.go:402] Error updating node status, will retry: error getting node "agrippa": nodes "agrippa" not found
May 11 20:42:36 agrippa kubelet.daemon[25990]: E0511 20:42:36.143578 25990 kubelet_node_status.go:402] Error updating node status, will retry: error getting node "agrippa": nodes "agrippa" not found
May 11 20:42:36 agrippa kubelet.daemon[25990]: E0511 20:42:36.167508 25990 kubelet_node_status.go:402] Error updating node status, will retry: error getting node "agrippa": nodes "agrippa" not found
May 11 20:42:36 agrippa kubelet.daemon[25990]: W0511 20:42:36.167566 25990 reflector.go:328] k8s.io/client-go/informers/factory.go:135: watch of *v1beta1.RuntimeClass ended with: very short watch: k8s.io/client-go/informers/factory.go:135: Unexpected watch close - watch lasted less than a second and no items received
May 11 20:42:36 agrippa kubelet.daemon[25990]: W0511 20:42:36.167586 25990 reflector.go:328] k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: watch of *v1.Service ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/kubelet.go:449: Unexpected watch close - watch lasted less than a second and no items received
May 11 20:42:36 agrippa kubelet.daemon[25990]: W0511 20:42:36.167601 25990 reflector.go:328] k8s.io/client-go/informers/factory.go:135: watch of *v1beta1.CSIDriver ended with: very short watch: k8s.io/client-go/informers/factory.go:135: Unexpected watch close - watch lasted less than a second and no items received
May 11 20:42:36 agrippa kubelet.daemon[25990]: W0511 20:42:36.167565 25990 reflector.go:328] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: watch of *v1.Pod ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:46: Unexpected watch close - watch lasted less than a second and no items received
May 11 20:42:36 agrippa kubelet.daemon[25990]: W0511 20:42:36.167621 25990 reflector.go:328] k8s.io/kubernetes/pkg/kubelet/kubeletconfig/controller.go:227: watch of *v1.Node ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/kubeletconfig/controller.go:227: Unexpected watch close - watch lasted less than a second and no items received
May 11 20:42:36 agrippa kubelet.daemon[25990]: W0511 20:42:36.167647 25990 reflector.go:328] k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: watch of *v1.Node ended with: very short watch: k8s.io/kubernetes/pkg/kubelet/kubelet.go:458: Unexpected watch close - watch lasted less than a second and no items received
May 11 20:42:36 agrippa kubelet.daemon[25990]: E0511 20:42:36.175638 25990 kubelet_node_status.go:402] Error updating node status, will retry: error getting node "agrippa": nodes "agrippa" not found
May 11 20:42:36 agrippa kubelet.daemon[25990]: E0511 20:42:36.183583 25990 kubelet_node_status.go:402] Error updating node status, will retry: error getting node "agrippa": nodes "agrippa" not found
May 11 20:42:36 agrippa kubelet.daemon[25990]: E0511 20:42:36.183612 25990 kubelet_node_status.go:389] Unable to update node status: update node status exceeds retry count
May 11 20:42:36 agrippa kubelet.daemon[25990]: E0511 20:42:36.186637 25990 kubelet.go:2263] node "agrippa" not found

---

The kubernetes master can resolve this node name (agrippa); in fact, every host in this environment can resolve both the short name and the FQDN agrippa.maas.

A grep for 'agrippa' in the syslog of a kubernetes master in the kubernetes model on the openstack controller:

---

May 11 20:18:56 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: W0511 20:18:56.729915 2019 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="agrippa" does not exist
May 11 20:18:56 juju-472727-kubernetes-3 kube-scheduler.daemon[2654]: I0511 20:18:56.731467 2654 node_tree.go:86] Added node "agrippa" in group "" to NodeTree
May 11 20:18:56 juju-472727-kubernetes-3 kube-apiserver.daemon[5587]: I0511 20:18:56.754398 5587 httplog.go:90] PATCH /api/v1/nodes/agrippa: (20.956037ms) 200 [kube-controller-manager/v1.17.5 (linux/amd64) kubernetes/e0fccaf/system:serviceaccount:kube-system:ttl-controller 172.16.7.203:36290]
May 11 20:18:56 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:18:56.755454 2019 ttl_controller.go:271] Changed ttl annotation for node agrippa to 0 seconds
May 11 20:19:00 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:00.375328 2019 node_lifecycle_controller.go:787] Controller observed a new Node: "agrippa"
May 11 20:19:00 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:00.375413 2019 controller_utils.go:167] Recording Registered Node agrippa in Controller event message for node agrippa
May 11 20:19:00 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: W0511 20:19:00.375668 2019 node_lifecycle_controller.go:1058] Missing timestamp for Node agrippa. Assuming now as a timestamp.
May 11 20:19:00 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:00.375722 2019 node_lifecycle_controller.go:886] Node agrippa is NotReady as of 2020-05-11 20:19:00.375686429 +0000 UTC m=+193.241321992. Adding it to the Taint queue.
May 11 20:19:00 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:00.376226 2019 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"agrippa", UID:"a08bd7e1-7538-42dd-ab67-8803417c5a08", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RegisteredNode' Node agrippa event: Registered Node agrippa in Controller
May 11 20:19:00 juju-472727-kubernetes-3 kube-apiserver.daemon[5587]: I0511 20:19:00.429348 5587 httplog.go:90] GET /api/v1/nodes/agrippa?resourceVersion=0: (493.892µs) 200 [kube-controller-manager/v1.17.5 (linux/amd64) kubernetes/e0fccaf/system:serviceaccount:kube-system:node-controller 172.16.7.203:36290]
May 11 20:19:00 juju-472727-kubernetes-3 kube-apiserver.daemon[5587]: I0511 20:19:00.459648 5587 httplog.go:90] PATCH /api/v1/nodes/agrippa: (28.677137ms) 200 [kube-controller-manager/v1.17.5 (linux/amd64) kubernetes/e0fccaf/system:serviceaccount:kube-system:node-controller 172.16.7.203:36290]
May 11 20:19:00 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:00.865938 2019 garbagecollector.go:404] processing item [storage.k8s.io/v1/CSINode, namespace: , name: agrippa, uid: 5e9fa110-bd2a-45d6-864f-f55fa44c3411]
May 11 20:19:00 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:00.866416 2019 garbagecollector.go:404] processing item [coordination.k8s.io/v1/Lease, namespace: kube-node-lease, name: agrippa, uid: ddf1cd1b-5d73-4217-9a96-b931617218ef]
May 11 20:19:00 juju-472727-kubernetes-3 kube-scheduler.daemon[2654]: I0511 20:19:00.867303 2654 node_tree.go:100] Removed node "agrippa" in group "" from NodeTree
May 11 20:19:01 juju-472727-kubernetes-3 kube-apiserver.daemon[5587]: I0511 20:19:01.277224 5587 httplog.go:90] GET /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/agrippa: (50.177293ms) 200 [kube-controller-manager/v1.17.5 (linux/amd64) kubernetes/e0fccaf/system:serviceaccount:kube-system:generic-garbage-collector 172.16.7.203:36290]
May 11 20:19:01 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:01.277839 2019 garbagecollector.go:517] delete object [coordination.k8s.io/v1/Lease, namespace: kube-node-lease, name: agrippa, uid: ddf1cd1b-5d73-4217-9a96-b931617218ef] with propagation policy Background
May 11 20:19:01 juju-472727-kubernetes-3 kube-apiserver.daemon[5587]: I0511 20:19:01.279723 5587 httplog.go:90] GET /apis/storage.k8s.io/v1/csinodes/agrippa: (54.192897ms) 200 [kube-controller-manager/v1.17.5 (linux/amd64) kubernetes/e0fccaf/system:serviceaccount:kube-system:generic-garbage-collector 172.16.7.203:36290]
May 11 20:19:01 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:01.280727 2019 garbagecollector.go:517] delete object [storage.k8s.io/v1/CSINode, namespace: , name: agrippa, uid: 5e9fa110-bd2a-45d6-864f-f55fa44c3411] with propagation policy Background
May 11 20:19:01 juju-472727-kubernetes-3 kube-apiserver.daemon[5587]: I0511 20:19:01.295025 5587 httplog.go:90] DELETE /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/agrippa: (16.394536ms) 200 [kube-controller-manager/v1.17.5 (linux/amd64) kubernetes/e0fccaf/system:serviceaccount:kube-system:generic-garbage-collector 172.16.7.203:36290]
May 11 20:19:01 juju-472727-kubernetes-3 kube-apiserver.daemon[5587]: I0511 20:19:01.299567 5587 httplog.go:90] DELETE /apis/storage.k8s.io/v1/csinodes/agrippa: (18.133653ms) 200 [kube-controller-manager/v1.17.5 (linux/amd64) kubernetes/e0fccaf/system:serviceaccount:kube-system:generic-garbage-collector 172.16.7.203:36290]
May 11 20:19:05 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:05.376133 2019 node_lifecycle_controller.go:799] Controller observed a Node deletion: agrippa
May 11 20:19:05 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:05.376170 2019 controller_utils.go:167] Recording Removing Node agrippa from Controller event message for node agrippa
May 11 20:19:05 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: I0511 20:19:05.376762 2019 event.go:281] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"agrippa", UID:"a08bd7e1-7538-42dd-ab67-8803417c5a08", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'RemovingNode' Node agrippa event: Removing Node agrippa from Controller
May 11 20:19:22 juju-472727-kubernetes-3 kube-controller-manager.daemon[2019]: W0511 20:19:22.752776 2019 actual_state_of_world.go:506] Failed to update statusUpdateNeeded field in actual state of world: Failed to set statusUpdateNeeded to needed true, because nodeName="agrippa" does not exist

---

The syslog of that kubernetes-master node will be uploaded to the bug.

Using the same openstack deployment, but creating the kubernetes model from this bundle: https://pastebin.ubuntu.com/p/Np9Fbqxs9q/ and using the same bare-metal bundle (except, of course, for changing the consumed offers, since there is no kube-api-lb in the octavia bundle), the bare-metal worker is able to join the cluster, and workloads can be passed freely from openstack to bare-metal.

From a lot of troubleshooting, the only key differences are:

- octavia instead of kube-api-lb (used with address pairs in openstack to allow a floating VIP)
- the relations to openstack-integrator for k8s-worker/master
- the CMR going to k8s-master for the api-endpoint as opposed to kube-api-lb

I believe the issue lies somewhere in how octavia is being configured, or possibly in how the masters are listening. The /root/.kube/config file is identical on the bare-metal workers and the openstack workers, and the bare-metal worker can run kubectl commands successfully with this config.

Revision history for this message
Jeff Hillman (jhillman) wrote :

tarball containing the following:

kubernetes-master-os-syslog - /var/log/syslog from k8s-master in kubernetes model in openstack controller

unit-kubernetes-master-0.log - /var/log/juju unit file from k8s-master in kubernetes model in openstack controller

kubernetes-worker-bm-syslog - /var/log/syslog from k8s-worker in bare-metal model in MAAS

unit-kubernetes-worker-bm.log - /var/log/juju unit file from k8s-worker in bare-metal model in MAAS

Revision history for this message
George Kraft (cynerva) wrote :

The nodes are getting created, and then subsequently deleted. I strongly suspect that openstack-cloud-controller-manager is deleting nodes that it doesn't recognize. I've seen similar behavior when using other cloud providers.

Can you share log output from the openstack-cloud-controller-manager pods in the kube-system namespace?
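
(For reference, one way to pull those logs, assuming normal kubectl access to the cluster; the pod name below is a placeholder, and the daemonset/pod names match the kubectl output later in this bug:)

---

kubectl -n kube-system get pods | grep openstack-cloud-controller-manager
kubectl -n kube-system logs <ccm-pod-name> --tail=200   # repeat for each pod

---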

Changed in charm-openstack-integrator:
status: New → Incomplete
Revision history for this message
Jeff Hillman (jhillman) wrote :

Good call. Here's the end of the logs doing what you said:

---

E0511 20:19:00.708403 1 node_lifecycle_controller.go:154] error checking if node agrippa is shutdown: ProviderID "" didn't match expected format "openstack:///InstanceID"
I0511 20:19:00.836908 1 event.go:258] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"agrippa", UID:"a08bd7e1-7538-42dd-ab67-8803417c5a08", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'Deleting node agrippa because it does not exist in the cloud provider' Node agrippa event: DeletingNode
E0511 20:19:25.859834 1 node_lifecycle_controller.go:154] error checking if node agrippa is shutdown: ProviderID "" didn't match expected format "openstack:///InstanceID"
I0511 20:19:25.983624 1 event.go:258] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"agrippa", UID:"c19e2bb5-e45f-491c-8d11-67b5537b875a", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'Deleting node agrippa because it does not exist in the cloud provider' Node agrippa event: DeletingNode
E0511 20:19:46.030642 1 node_lifecycle_controller.go:154] error checking if node agrippa is shutdown: ProviderID "" didn't match expected format "openstack:///InstanceID"
I0511 20:19:46.170501 1 event.go:258] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"agrippa", UID:"e90d4b0a-f0ee-4e27-8d66-0da6a8bd8549", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'Deleting node agrippa because it does not exist in the cloud provider' Node agrippa event: DeletingNode
E0511 20:20:11.193591 1 node_lifecycle_controller.go:154] error checking if node agrippa is shutdown: ProviderID "" didn't match expected format "openstack:///InstanceID"
I0511 20:20:11.488352 1 event.go:258] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"agrippa", UID:"46fa58d7-ba56-4e05-8690-6e068efcbe3b", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'Deleting node agrippa because it does not exist in the cloud provider' Node agrippa event: DeletingNode
E0511 20:21:46.530713 1 node_lifecycle_controller.go:154] error checking if node agrippa is shutdown: ProviderID "" didn't match expected format "openstack:///InstanceID"
I0511 20:21:46.662844 1 event.go:258] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"agrippa", UID:"0241a0e0-9158-4a9c-9581-0de920f1aae1", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'Deleting node agrippa because it does not exist in the cloud provider' Node agrippa event: DeletingNode
E0511 20:26:01.752260 1 node_lifecycle_controller.go:154] error checking if node agrippa is shutdown: ProviderID "" didn't match expected format "openstack:///InstanceID"
I0511 20:26:01.885283 1 event.go:258] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"agrippa", UID:"c51d161c-df6b-4813-beaf-0b8d4823b952", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'Deleting node agrippa because it does not exist in the cloud provider' Node agrippa event: DeletingNode

---

I suspected something like this for that and Cinder. I ...
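
Those ProviderID errors point at the root cause: the bare-metal node registers with an empty spec.providerID, so the openstack cloud controller manager cannot match it to a Nova instance and deletes it. A hedged way to confirm this from any host with the kubeconfig:

---

kubectl get node agrippa -o jsonpath='{.spec.providerID}{"\n"}'
# openstack-hosted workers should show something like openstack:///<nova-instance-uuid>;
# on the bare-metal node this is expected to come back empty

---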

Revision history for this message
George Kraft (cynerva) wrote :

> I can try modifying the daemonset to only use the node labels that match openstack nodes if you think it will help.

I suspect the node will get deleted anyway, but I'm not sure. If you have time to spare and can try it, that would be a great help to us.

If it doesn't work, then I'm not sure we'll be able to do much about it in the charms. We can at least open an issue against openstack-cloud-controller-manager to see if there's anything that can be done there.
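
(A minimal sketch of the daemonset change being discussed, assuming the daemonset name and the RegionOne label that appear in the kubectl output in the next comment:)

---

kubectl -n kube-system patch daemonset openstack-cloud-controller-manager --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"failure-domain.beta.kubernetes.io/region":"RegionOne"}}}}}'

---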

Changed in charm-openstack-integrator:
importance: Undecided → Critical
importance: Critical → High
status: Incomplete → Triaged
Revision history for this message
Jeff Hillman (jhillman) wrote :

That has the same issue. I used the failure-domain label for the daemonset:

---

$ kubectl get all -n kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-6bf76f8dc5-z22gc 1/1 Running 0 21h
pod/csi-cinder-controllerplugin-0 4/4 Running 1 21h
pod/csi-cinder-nodeplugin-2nd24 2/2 Running 0 23m
pod/csi-cinder-nodeplugin-kkv6d 2/2 Running 0 23m
pod/heapster-v1.6.0-beta.1-6747db6947-mq8v8 4/4 Running 0 20h
pod/kube-state-metrics-7c765f4c5c-tvnhs 1/1 Running 0 21h
pod/metrics-server-v0.3.6-75cd4549f8-84tbv 2/2 Running 0 21h
pod/monitoring-influxdb-grafana-v4-7f879555b-qw4fb 2/2 Running 0 21h
pod/openstack-cloud-controller-manager-q5hcn 1/1 Running 0 23m
pod/openstack-cloud-controller-manager-rfwzm 1/1 Running 0 23m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/csi-cinder-controller-service ClusterIP 10.152.183.93 <none> 12345/TCP 21h
service/heapster ClusterIP 10.152.183.40 <none> 80/TCP 21h
service/kube-dns ClusterIP 10.152.183.227 <none> 53/UDP,53/TCP,9153/TCP 21h
service/kube-state-metrics ClusterIP 10.152.183.52 <none> 8080/TCP,8081/TCP 21h
service/metrics-server ClusterIP 10.152.183.129 <none> 443/TCP 21h
service/monitoring-grafana ClusterIP 10.152.183.44 <none> 80/TCP 21h
service/monitoring-influxdb ClusterIP 10.152.183.231 <none> 8083/TCP,8086/TCP 21h

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/csi-cinder-nodeplugin 2 2 2 2 2 failure-domain.beta.kubernetes.io/region=RegionOne 21h
daemonset.apps/openstack-cloud-controller-manager 2 2 2 2 2 failure-domain.beta.kubernetes.io/region=RegionOne 21h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 1/1 1 1 21h
deployment.apps/heapster-v1.6.0-beta.1 1/1 1 1 21h
deployment.apps/kube-state-metrics 1/1 1 1 21h
deployment.apps/metrics-server-v0.3.6 1/1 1 1 21h
deployment.apps/monitoring-influxdb-grafana-v4 1/1 1 1 21h

NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-6bf76f8dc5 1 1 1 21h
replicaset.apps/heapster-v1.6.0-beta.1-5cff8964b7 ...

George Kraft (cynerva)
Changed in charm-openstack-integrator:
importance: High → Medium