Brief Description
-----------------
A large number of orphaned pod volumes accumulates on some worker nodes (Worker-11 and Worker-13).
This possibly causes an unexpected "NodeNotReady" state on those worker nodes.
After the node enters "NodeNotReady", the system does not recover automatically.
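For context, below is a minimal diagnostic sketch in Go (hypothetical, not part of the original report) that lists pod directories still holding volume-subpaths entries; it assumes the default kubelet root /var/lib/kubelet that appears in the logs below.

// orphans.go: list pod directories that still contain volume-subpaths
// entries under the kubelet pods directory. Hypothetical diagnostic
// sketch; assumes the default kubelet root /var/lib/kubelet from the logs.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	podsDir := "/var/lib/kubelet/pods"
	entries, err := os.ReadDir(podsDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read pods dir:", err)
		os.Exit(1)
	}
	for _, e := range entries {
		// Each directory name under the pods dir is a pod UID.
		if !e.IsDir() {
			continue
		}
		subpaths := filepath.Join(podsDir, e.Name(), "volume-subpaths")
		if _, err := os.Stat(subpaths); err == nil {
			fmt.Println("pod dir with leftover volume-subpaths:", e.Name())
		}
	}
}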
Severity
--------
Critical
Steps to Reproduce
------------------
1. Worker-13 rebooted.
<Worker-13 kern.log>
2023-05-30T02:31:41.753 worker-13 kernel: notice [ 0.000000] Linux version 5.10.112-200.27.tis.el7.x86_64 (<email address hidden>) (gcc (GCC) 8.3.1 20190311 (Red Hat 8.3.1-3), GNU ld version 2.30-55.el7.2) #1 SMP PREEMPT Thu Feb 16 14:05:56 EST 2023
2023-05-30T02:31:41.753 worker-13 kernel: info [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.10.112-200.27.tis.el7.x86_64 root=UUID=4df4067a-05a7-44a7-84e7-64c50a48cf84 ro security_profile=standard module_blacklist=integrity,ima tboot=false crashkernel=512M biosdevname=0 console=tty0 iommu=pt usbcore.autosuspend=-1 selinux=0 enforcing=0 nmi_watchdog=panic,1 softlockup_panic=1 softdog.soft_panic=1 intel_iommu=on user_namespace.enable=1 intel_idle.max_cstate=0 LANG=en_US.UTF-8 hugepagesz=1G hugepages=60 hugepagesz=2M hugepages=0 default_hugepagesz=1G irqaffinity=1-6,13,16-18,25,28-55,57-62,69,72-74,81,84-111 isolcpus=7-12,14-15,19-24,26-27,63-68,70-71,75-80,82-83 rcu_nocbs=1-55,57-111 nohz_full=7-12,14-15,19-24,26-27,63-68,70-71,75-80,82-83 kthread_cpus=0,56 audit=0 audit_backlog_limit=8192 nopti nospectre_v2 nospectre_v1
2. The Kubernetes Kubelet Server started.
<Worker-13 daemon.log.1>
2023-05-30T02:33:08.120 worker-13 systemd[1]: info Starting Kubernetes Kubelet Server...
3. The kubelet process started.
<Worker-13 daemon.log.1>
2023-05-30T02:33:15.221 worker-13 kubelet[43341]: info I0530 02:33:15.221448 43341 server.go:1190] "Started kubelet"
4. Orphaned pod volume cleanup messages started appearing.
<Worker-13 daemon.log.1>
2023-05-30T02:33:17.431 worker-13 kubelet[43341]: info I0530 02:33:17.430994 43341 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/0016f912-d586-4291-9040-8e407313645f/volumes"
2023-05-30T02:33:17.431 worker-13 kubelet[43341]: info I0530 02:33:17.431011 43341 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/0016f912-d586-4291-9040-8e407313645f/volumes"
2023-05-30T02:33:17.431 worker-13 kubelet[43341]: info I0530 02:33:17.431452 43341 kubelet_volumes.go:140] "Cleaned up orphaned pod volumes dir" podUID=0016f912-d586-4291-9040-8e407313645f path="/var/lib/kubelet/pods/0016f912-d586-4291-9040-8e407313645f/volumes"
5. "There were many similar errors. Turn up verbosity to see them." message issued.
<Worker-13 daemon.log.1>
2023-05-30T02:33:17.482 worker-13 kubelet[43341]: info E0530 02:33:17.482942 43341 kubelet_volumes.go:225] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"0016f912-d586-4291-9040-8e407313645f\" found, but failed to rmdir() subpath at path /var/lib/kubelet/pods/0016f912-d586-4291-9040-8e407313645f/volume-subpaths/localtime-cuupoam/oamcuup/0: not a directory" numErrs=118
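The "not a directory" error indicates that rmdir() is being attempted on a regular file left under volume-subpaths (here the target of a localtime file bind mount). A minimal sketch, assuming only that the leftover subpath entry is a plain file, reproducing the errno seen in the log:

// rmdir_file.go: show that rmdir() on a regular file fails with
// ENOTDIR ("not a directory"), the error reported by the kubelet.
// Hypothetical illustration; the temp file is a stand-in for the
// leftover volume-subpaths entry, not the real path from the report.
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.CreateTemp("", "subpath-*")
	if err != nil {
		panic(err)
	}
	f.Close()
	defer os.Remove(f.Name())

	// The kubelet log shows orphan cleanup calling rmdir() on the
	// subpath; on a file this returns ENOTDIR, so cleanup never finishes.
	if err := syscall.Rmdir(f.Name()); err != nil {
		fmt.Printf("rmdir(%s): %v\n", f.Name(), err) // prints "not a directory"
	}
}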
Expected Behavior
------------------
Orphaned pod volumes are cleaned up.
Actual Behavior
----------------
Orphaned files are not cleaned up.
Reproducibility
-----------------
100%
System Configuration
--------------------
Multi-node system (multiple worker nodes, including Worker-11 and Worker-13); other configuration details not provided.
Branch/Pull Time/Commit
-----------------------
Branch and the time when code was pulled or git commit or cengn load info
Last Pass
---------
Did this test scenario pass previously? If so, please indicate the load/pull time info of the last pass.
Use this section to also indicate if this is a new test scenario.
Timestamp/Logs
--------------
<Worker-13 daemon.log.1>
2023-05-30T02:33:17.480 worker-13 kubelet[43341]: info I0530 02:33:17.480675 43341 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/ef8363d3-0368-4c0d-8bf5-26ddd44e5f64/volumes"
2023-05-30T02:33:17.480 worker-13 kubelet[43341]: info I0530 02:33:17.480690 43341 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/ef8363d3-0368-4c0d-8bf5-26ddd44e5f64/volumes"
2023-05-30T02:33:17.481 worker-13 kubelet[43341]: info I0530 02:33:17.481132 43341 kubelet_volumes.go:140] "Cleaned up orphaned pod volumes dir" podUID=ef8363d3-0368-4c0d-8bf5-26ddd44e5f64 path="/var/lib/kubelet/pods/ef8363d3-0368-4c0d-8bf5-26ddd44e5f64/volumes"
2023-05-30T02:33:17.481 worker-13 kubelet[43341]: info I0530 02:33:17.481309 43341 kubelet_volumes.go:113] "Cleaned up orphaned volume from pod" podUID=f262085c-4a97-4e84-9bef-d407048338bf path="/var/lib/kubelet/pods/f262085c-4a97-4e84-9bef-d407048338bf/volumes/kubernetes.io~projected/kube-api-access-wzpgx"
2023-05-30T02:33:17.481 worker-13 kubelet[43341]: info I0530 02:33:17.481585 43341 kubelet_volumes.go:113] "Cleaned up orphaned volume from pod" podUID=f434ac8c-02dc-4c49-a49e-a7e08b82ed54 path="/var/lib/kubelet/pods/f434ac8c-02dc-4c49-a49e-a7e08b82ed54/volumes/kubernetes.io~projected/kube-api-access-g66q9"
2023-05-30T02:33:17.481 worker-13 kubelet[43341]: info I0530 02:33:17.481651 43341 kubelet_volumes.go:140] "Cleaned up orphaned pod volumes dir" podUID=f434ac8c-02dc-4c49-a49e-a7e08b82ed54 path="/var/lib/kubelet/pods/f434ac8c-02dc-4c49-a49e-a7e08b82ed54/volumes"
2023-05-30T02:33:17.482 worker-13 kubelet[43341]: info I0530 02:33:17.482197 43341 kubelet_volumes.go:113] "Cleaned up orphaned volume from pod" podUID=f8eb7eef-7395-4bde-b069-686a3ecc5b7d path="/var/lib/kubelet/pods/f8eb7eef-7395-4bde-b069-686a3ecc5b7d/volumes/kubernetes.io~projected/kube-api-access-pcf6x"
2023-05-30T02:33:17.482 worker-13 kubelet[43341]: info E0530 02:33:17.482942 43341 kubelet_volumes.go:225] "There were many similar errors. Turn up verbosity to see them." err="orphaned pod \"0016f912-d586-4291-904
Test Activity
-------------
Workaround
----------
https://review.opendev.org/c/starlingx/stx-puppet/+/889116
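The review above carries the actual workaround; its content is not reproduced here. For illustration only, a hedged sketch of the manual cleanup idea: delete the leftover file-type subpath entries for a confirmed-orphaned pod UID so that the rmdir()-based cleanup can complete. Paths and behavior are assumptions, not taken from the review.

// cleanup_subpaths.go: hypothetical sketch of clearing leftover
// volume-subpaths entries for one orphaned pod UID. NOT the content
// of the stx-puppet review above; assumes any bind mounts under the
// directory have already been torn down, and should only be run
// against pod directories confirmed to be orphaned.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: cleanup_subpaths <pod-uid>")
		os.Exit(1)
	}
	dir := filepath.Join("/var/lib/kubelet/pods", os.Args[1], "volume-subpaths")
	// os.RemoveAll deletes files as well as directories, unlike the
	// rmdir()-only cleanup that fails in the logs above.
	if err := os.RemoveAll(dir); err != nil {
		fmt.Fprintln(os.Stderr, "cleanup failed:", err)
		os.Exit(1)
	}
	fmt.Println("removed", dir)
}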