Nodes crash while powering off during graceful shutdown
Bug #2043069 reported by Saba Touheed Mujawar
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | In Progress | Undecided | Jim Gauld | | |
Bug Description
Brief Description
-----------------
While trying to power off the nodes using 'sudo systemctl poweroff', the nodes crash and reboot instead of shutting down.
Severity
--------
Critical
Steps to Reproduce
------------------
Execute 'sudo systemctl poweroff' on the node.
Expected Behavior
------------------
Proper node shutdown
Actual Behavior
----------------
Nodes crash and reboot instead of shutting down
System Configuration
-------
Distributed Cloud (Subcloud)
Changed in starlingx: | |
assignee: | nobody → Jim Gauld (jgauld) |
status: | New → Confirmed |
After some instrumentation, I discovered the following:
containerd does not terminate all of its child tasks during shut-down.
We see logs where systemd needs to kill containerd-shim tasks, and there are still "pause" tasks that remain to be killed.
* This can be resolved by making containerd's stop procedure actually stop all pods as well as containers (via script: /usr/local/sbin/k8s-container-cleanup.sh).
On locked hosts, DRBD volume(s) are not brought down sufficiently early during reboot/shutdown, which results in slight delays during reboot/shutdown. This requires a fix-up systemd unit like "drbd-shutdown.service", which tears down the DRBD volumes during shutdown after the following units are stopped: pmond.service, sm.service, containerd.service and docker.service, and before network.target is stopped. On an unlocked node, this should not have any side effects, based on tests with an unlocked controller of an All-in-Duplex set-up.
* This can be resolved by adding a new drbd-shutdown.service, or updating the existing drbd.service provided by drbd-utils. Updated drbd.service dependencies should contain:
After=network.target sshd.service
Before=pmond.service sm.service containerd.service docker.service
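The ordering described for a separate drbd-shutdown.service could be sketched as the unit file below. This is only an illustration, not the change that was actually made for this bug. The file path, the Description text, the oneshot/RemainAfterExit pattern, the drbdadm path and the "down all" invocation are all assumptions. It relies on systemd stopping units in the reverse of their start order.

```ini
# /etc/systemd/system/drbd-shutdown.service
# Hypothetical path and contents, for illustration only.

[Unit]
Description=Tear down DRBD volumes during shutdown
# Started after network.target, so at shutdown this unit is stopped
# before network.target is stopped.
After=network.target
# Started before these units, so at shutdown this unit is stopped only
# after they have been stopped.
Before=pmond.service sm.service containerd.service docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Nothing to do at boot; the unit only has to be active so that
# ExecStop runs during shutdown.
ExecStart=/bin/true
# Assumption: drbdadm lives at this path and taking all resources down
# is the desired teardown.
ExecStop=/usr/sbin/drbdadm down all

[Install]
WantedBy=multi-user.target
```

With a unit like this, the teardown command runs only when the unit is stopped, which is why the ordering is written in terms of start-up (After=/Before=) even though the behavior of interest is at shutdown.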