containerized services logs need improvement and systemd order
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | New | Low | Jim Gauld |
Bug Description
Brief Description
-----------------
There is no service dependency relationship (i.e., Before= / After= in the systemd service definitions) between containerd.service and kubelet.service. This means the start order and shutdown order can race. There should also be an explicit dependency introduced for the logger, so that the last logs can still be seen as the system is going down.
containerd.service should start before kubelet.service (i.e., when the host comes up).
containerd.service should stop after kubelet.service (i.e., when the host shuts down or reboots).
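These two requirements are really one systemd ordering dependency viewed from both ends, since systemd reverses After=/Before= ordering at shutdown. A minimal drop-in sketch (the path and file name here are illustrative, not the actual StarlingX override file):

```ini
# /etc/systemd/system/kubelet.service.d/ordering.conf  (illustrative path/name)
[Unit]
# Start kubelet after containerd; systemd reverses ordering dependencies on
# shutdown, so this same line also stops kubelet before containerd.
After=containerd.service
```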
Also noticed during host reboot: during the shutdown of services, the "logger" will disappear before containerd is shut down. This can lead to missing logs. I have manually tested that the service ran to completion and the filesystem was up, but the logger was no longer available.
This was noticed when investigating unmanaged coredumps caused by containers not receiving SIGTERM due to the unpredictable shutdown procedure.
Severity
--------
Minor: System is less predictable on shutdown.
Steps to Reproduce
------------------
Reboot a given host and watch the order in which services start/stop, e.g.:
sudo systemctl reboot
sudo reboot
Look at the specific service start/stop logs, e.g.:
grep -e k8s-container-
Expected Behavior
------------------
On shutdown, containerd.service should stop after kubelet.service.
Actual Behavior
----------------
Depending on the lab and how many containers are provisioned, on shutdown containerd.service may stop either before or after kubelet.service.
Reproducibility
---------------
100%, but timing depends on lab and what is configured.
System Configuration
--------------------
All configs.
Branch/Pull Time/Commit
-----------------------
Apr 12, 2022.
Last Pass:
---------
Day one k8s containers issue.
Timestamp/Logs
--------------
Attach the logs for debugging (use attachments in Launchpad or for large collect files use: https:/
Provide a snippet of logs here and the timestamp when issue was seen.
Please indicate the unique identifier in the logs to highlight the problem
Test Activity
-------------
[Sanity, Feature Testing, Regression Testing, Developer Testing, Evaluation, Other - Please specify]
Workaround
----------
None
tags: added: stx.containers
Changed in starlingx:
importance: Undecided → Low
summary: containerd.service and kubelet.service should have dependency order → containerized services logs need improvement and systemd order
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
This issue has popped up a few times. Some observations:
- Many systemd services are shutting down in parallel, including syslog.service.
- After syslog.service (alias: syslog-ng.service) shuts down, we lose any further shutdown logs.
- Behavior differs in general between controllers and worker nodes, since the distribution of services and pods differs; syslog.service reaches the stopped state sooner on workers than on controllers.
- There is a missing service dependency on 'syslog.service' for services that we need to be able to support/debug.
- To guarantee getting containerization cleanup logs during shutdown, we need to enforce the systemd dependency "After=syslog.service" for both kubelet.service and containerd.service. This will make syslog.service shut down after the container services.
- The shutdown order for containerized services is not ideal. Kubelet requires containerd and etcd to function properly, and kubelet can flood the logs with errors if etcd is not providing service. To improve the gracefulness of containerization on shutdown, we should add the following dependency for kubelet.service: "After=containerd.service etcd.service" (or specify the equivalent Before=kubelet.service in containerd.service and etcd.service).
- The k8s-container-cleanup script can take too long to run, so we don't always see the final line, "k8s-container-cleanup(127949): info : Stopping all container completed."
- Debugging containers is very difficult since we often only have a container ID and nothing else identifying it (e.g., pod, container name, namespace). Log scraping can be challenging without easier cross-referencing.
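The long cleanup time could be reduced by stopping containers concurrently instead of one at a time. A minimal Python model of the idea (the real cleanup script is a shell script, and stop_container() here is a stand-in for invoking 'crictl stop'):

```python
# Sketch: stop all containers in parallel and wait for every stop to finish
# before logging completion. stop_container() is a stand-in for a real
# "crictl stop <id>" invocation so the logic can be exercised anywhere.
import time
from concurrent.futures import ThreadPoolExecutor

def stop_container(cid: str) -> str:
    # Stand-in for: subprocess.run(["crictl", "stop", cid], check=True)
    time.sleep(0.1)  # simulate per-container stop latency
    return cid

ids = ["aaa", "bbb", "ccc"]  # stand-in for: crictl ps -q
with ThreadPoolExecutor(max_workers=len(ids)) as ex:
    # All stops run concurrently; total wall time ~= the slowest single stop.
    stopped = list(ex.map(stop_container, ids))

print("Stopping all container completed.")
```

With serial stops the worst case is the sum of the per-container timeouts; in parallel it collapses to roughly the single slowest container.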
Recommendations:
* Improve K8S systemd service order and logging; append to the following 3 files:
./stx/config-files/containerd-config/*/containerd-stx-override.conf
(i.e., /etc/systemd/system/containerd.service.d/containerd-stx-override.conf)
After=syslog.service
./stx-puppet/puppet-manifests/src/modules/platform/templates/kube-stx-override.conf.erb
(i.e., /etc/systemd/system/kubelet.service.d/kube-stx-override.conf)
After=containerd.service etcd.service
After=syslog.service
docker-stx-override.conf:
After=syslog.service
Alternatively, make dependencies equivalent to those in kube-stx-override.conf that do not require puppet-manifest generation:
containerd-stx-override.conf:
After=syslog.service
Before=kubelet.service
etcd-override.conf:
After=syslog.service
Before=kubelet.service
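Spelled out as a complete file, the containerd-side alternative above might look like the following sketch (a drop-in needs its own [Unit] header; because systemd reverses ordering at shutdown, Before=kubelet.service also means containerd stops after kubelet):

```ini
# /etc/systemd/system/containerd.service.d/containerd-stx-override.conf (sketch)
[Unit]
# Keep syslog up until containerd has stopped, so cleanup logs are captured.
After=syslog.service
# Start before kubelet; equivalently, stop only after kubelet has stopped.
Before=kubelet.service
```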
* Change the k8s-container-cleanup script to shut down containers in parallel.
* Improve the support/debugging of Kubernetes: add one-liner log details during the k8s-container-cleanup script, including more specific identifying info per container during shutdown.
Here is a suggested prototype (this info is provided by 'crictl inspect'):
2023-05-08T16:31:26.000 compute-0 k8s-container-cleanup(63687): info : pid: 27858 cgroupsPath: /k8s-infra/kubepods/besteffort/podf52c2b5d-8856-4948-b4fa-773aa3b2e568/4501d53cc43629d6a61b24bc9431e0e49565989ac69f9adee021a4aa1bfb31a8 id: 4501d53cc43629d6a61b24bc9431e0e49565989ac69f9adee021a4aa1bfb31a8 container.name: kube-proxy pod.name: kube-proxy-w7w8s pod.namespace: kube-system pod.uid: f52c2b5d-8856-4948-b4fa-773aa3b2e568 logPath: /var/log/pods/kube-system_k...