Pod stuck in ContainerCreating for a long time after worker host is locked

Bug #1882814 reported by Nimalini Rasa
This bug affects 1 person
Affects: StarlingX
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (not set)

Bug Description

Brief Description
-----------------
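After a worker host was locked, some pods were stuck in "ContainerCreating" for a long time on the host receiving the rescheduled pods (compute-6). The cause was TLS handshake timeouts when the Calico CNI plugin queried the Kubernetes API at https://[10.96.0.1]:443. The pods eventually recovered (after about 17 min).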

Severity
--------
Minor

Steps to Reproduce
------------------
- Launch 30 test pods per worker host.
- Lock a worker node that is running pods.

Expected Behavior
------------------
Pods recover within a reasonable time.

Actual Behavior
----------------
Pod recovery took longer than 17 min.

Reproducibility
---------------
seen once

System Configuration
--------------------
AIO-DX+ with 20 worker nodes

Branch/Pull Time/Commit
-----------------------
Build:2020-05-31_20-00-00

Last Pass
---------
N/A

Timestamp/Logs
--------------
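Worker node locked at: 2020-05-31_20-00-00 (compute-4)
Number of pods on compute-4: 37
Excerpt of kubectl describe pod output for an affected pod (small-benchmark/small-7497946d9-2lksb):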

QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 30s
                 node.kubernetes.io/unreachable:NoExecute for 30s
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Normal Scheduled 23m default-scheduler Successfully assigned small-benchmark/small-7497946d9-2lksb to compute-6
  Warning FailedCreatePodSandBox 23m kubelet, compute-6 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "696c24f03791402eb47561d6bfad14f6d08e1868b29ec8d4a6bfeb840e3abb95": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/ippools: net/http: TLS handshake timeout
  Warning FailedCreatePodSandBox 22m kubelet, compute-6 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3097ebad32acb76c253e041cebff90648297ec4a26c20d0383a69fb3b0473b93": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/ippools: net/http: TLS handshake timeout
  Warning FailedCreatePodSandBox 21m kubelet, compute-6 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f5337096de27c3f8449ee5ee11cf6dc71d4eef4b19be553e059853f5c1d3dc80": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: net/http: TLS handshake timeout
  Warning FailedCreatePodSandBox 21m kubelet, compute-6 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "885fc1f294ef8e83c79f21919eeadf833f00923e26ae28f01d5f5437c1dbfb71": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: net/http: TLS handshake timeout
  Warning FailedCreatePodSandBox 20m kubelet, compute-6 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e2d4c794e17829d28e4d45650f5069a598fb24b45d7c36c34c72ecbbe0f4e9fc": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/ippools: net/http: TLS handshake timeout
  Warning FailedCreatePodSandBox 19m kubelet, compute-6 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3d154b12b0eb7e65d37b6043213489c39187d0f6a84e9b966c0cc4644e4f8c9e": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: net/http: TLS handshake timeout
  Warning FailedCreatePodSandBox 19m kubelet, compute-6 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b4d4f02cdadb7e3be3adcbaf8eaa0f8c6a518cf071452b1483407b11eef1b448": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/ippools: net/http: TLS handshake timeout
  Warning FailedCreatePodSandBox 18m kubelet, compute-6 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "db5763842af95374df3e52592a7e6e5dc1f7176e244586b1de198a8d32e501e5": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: net/http: TLS handshake timeout
  Warning FailedCreatePodSandBox 17m kubelet, compute-6 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f981810a951aff1969b4720753add22eda9544cd656b442dea48f66eebff55ba": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/ippools: net/http: TLS handshake timeout
  Warning FailedCreatePodSandBox 13m (x6 over 17m) kubelet, compute-6 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1a679cf17b18b6d0ef3005c43f86f18aba4087895b7ecdfdc40b9521e4c5f96e": Multus: [small-benchmark/small-7497946d9-2lksb]: error adding container to network "chain": delegateAdd: error invoking conflistAdd - "chain": conflistAdd: error in getting result from AddNetworkList: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: net/http: TLS handshake timeout
  Normal Pulled 13m kubelet, compute-6 Container image "gcr.io/kubernetes-e2e-test-images/resource-consumer:1.4" already present on machine
  Normal Created 13m kubelet, compute-6 Created container small
  Normal Started 13m kubelet, compute-6 Started container small
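
Every FailedCreatePodSandBox event in the log fails with the same error: a TLS handshake timeout against the Kubernetes API service. The Python sketch below assumes the event lines have been copied as plain text. It tallies those events by the Calico API URL whose handshake timed out. The helper name tally_timeouts and the shortened sample lines are illustrative only and are not part of any StarlingX or Kubernetes tooling.

```python
import re
from collections import Counter

# Hypothetical helper (not from StarlingX/Kubernetes): pulls the URL out of
# "... Get <url>: net/http: TLS handshake timeout" in kubelet event text.
TIMEOUT_RE = re.compile(r"Get (https://\S+): net/http: TLS handshake timeout")


def tally_timeouts(event_lines):
    """Count FailedCreatePodSandBox event lines per timed-out URL.

    Each event line counts once. Lines that do not match the timeout
    pattern are counted under "other".
    """
    counts = Counter()
    for line in event_lines:
        if "FailedCreatePodSandBox" not in line:
            continue  # skip Scheduled/Pulled/Created/Started events
        match = TIMEOUT_RE.search(line)
        counts[match.group(1) if match else "other"] += 1
    return counts


# Shortened sample lines modelled on the events in this report.
sample = [
    "Warning FailedCreatePodSandBox 23m kubelet, compute-6 ... "
    "Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/ippools: "
    "net/http: TLS handshake timeout",
    "Warning FailedCreatePodSandBox 21m kubelet, compute-6 ... "
    "error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/"
    "crd.projectcalico.org/v1/clusterinformations/default: "
    "net/http: TLS handshake timeout",
    "Normal Started 13m kubelet, compute-6 Started container small",
]

for url, n in tally_timeouts(sample).items():
    print(n, url)
```

The "(x6 over 17m)" combined entry shows only the latest message, so the sketch counts it once. Applied to the ten FailedCreatePodSandBox lines in this report, it reports five timeouts on .../v1/ippools and five on .../v1/clusterinformations/default.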

Test Activity
-------------
System Test

Nimalini Rasa (nrasa) wrote:

Ghada Khalil (gkhalil) wrote:

Marking as low priority for now given the pod recovered and the issue was seen once.
If this becomes more frequent, please let Yang Liu know so that we can raise the priority.

Changed in starlingx:
importance: Undecided → Low
status: New → Triaged
tags: added: stx.containers stx.networking