StarlingX

Nodes crash while powering off during graceful shutdown

Bug #2043069 reported by Saba Touheed Mujawar on 2023-11-09

This bug affects 1 person

Affects     Status        Importance   Assigned to   Milestone
StarlingX   In Progress   Undecided    Jim Gauld

Bug Description

Brief Description
-----------------
Powering off a node with 'sudo systemctl poweroff' causes the node to crash and reboot instead of shutting down.

Severity
--------
Critical

Steps to Reproduce
------------------
Execute 'sudo systemctl poweroff' on the node.

Expected Behavior
------------------
Proper node shutdown

Actual Behavior
----------------
Nodes crash and reboot instead of shutting down

System Configuration
--------------------
Distributed Cloud (Subcloud)

Jim Gauld (jgauld) on 2023-11-09

Changed in starlingx:
assignee:	nobody → Jim Gauld (jgauld)
status:	New → Confirmed

Jim Gauld (jgauld) wrote on 2023-11-09:

After some instrumentation, I discovered the following:

containerd does not terminate all of its child tasks during shut-down.
We see logs where systemd needs to kill containerd-shim tasks, and there are still "pause" tasks that remain to be killed.

* This can be resolved by making containerd's stop procedure actually stop all pods as well as containers (via script: /usr/local/sbin/k8s-container-cleanup.sh).

On locked hosts, DRBD volumes are not brought down early enough during reboot/shutdown, which results in slight delays. This requires a fix-up systemd unit such as "drbd-shutdown.service" that tears down the DRBD volumes during shutdown, after the following units are stopped: pmond.service, sm.service, containerd.service and docker.service, and before network.target is stopped. On an unlocked node this should not have any side effects, based on tests with an unlocked controller of an All-in-Duplex setup.

* This can be resolved by adding a new drbd-shutdown.service, or by updating the existing drbd.service provided by drbd-utils. The updated drbd.service dependencies should contain:
After=network.target sshd.service
Before=pmond.service sm.service containerd.service docker.service
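
A minimal sketch of how those two ordering lines could be carried in a drop-in for the existing drbd.service, assuming the standard systemd drop-in directory. The function name render_drbd_ordering_dropin and the file name ordering.conf are hypothetical and not taken from the proposed fix. Because systemd stops units in the reverse of their start order, starting drbd Before= the listed services means it is stopped after them, and starting it After=network.target means it is stopped before the network goes down.

```shell
#!/bin/sh
# Print a drop-in fragment holding the ordering lines quoted above.
# Nothing is written to the system; the caller decides where it goes.
render_drbd_ordering_dropin() {
    cat <<'EOF'
[Unit]
After=network.target sshd.service
Before=pmond.service sm.service containerd.service docker.service
EOF
}

# Hypothetical installation steps (not executed by this sketch):
#   sudo mkdir -p /etc/systemd/system/drbd.service.d
#   render_drbd_ordering_dropin | sudo tee /etc/systemd/system/drbd.service.d/ordering.conf
#   sudo systemctl daemon-reload
#   systemctl show -p After -p Before drbd.service
```

The last commented command prints the effective After= and Before= lists, which is one way to confirm that the drop-in was picked up.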

OpenStack Infra (hudson-openstack) wrote on 2023-11-09: Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/900507

Changed in starlingx:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2023-11-09: Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/900514

OpenStack Infra (hudson-openstack) wrote on 2023-11-09: Fix proposed to config-files (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config-files/+/900515

OpenStack Infra (hudson-openstack) wrote on 2023-11-09: Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/900514
Committed: https://opendev.org/starlingx/stx-puppet/commit/528019a81a82befc0f42283752785942d08102d1
Submitter: "Zuul (22348)"
Branch: master

commit 528019a81a82befc0f42283752785942d08102d1
Author: Jim Gauld <email address hidden>
Date: Thu Nov 9 05:38:23 2023 -0500

Add systemd service dependency order for kubelet

This updates kubelet.service service override file
kube-stx-override.conf with systemd dependencies:

After=containerd.service etcd.service
After=syslog.service

    This improves the startup and shutdown of kubernetes.
    syslog.service is added here because we had cases of
    missing logs during shutdown.

    Test plan:
    - PASS - build-image, install and boot up on AIO-SX
    - PASS - verify service order dependencies via
             'sudo systemd-analyze dump'
    - PASS - perform reboot and verify in /var/log/daemon.log
             that etcd and containerd start before kubelet
             and stop after kubelet.

Partial-Bug: 2043069

Change-Id: I47117d56a9380b754bf5437885dc1614e7dd7ab3
Signed-off-by: Jim Gauld <email address hidden>
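
A possible spot check of the resulting ordering on a running node, as a sketch only. It reads the After= property with 'systemctl show' rather than the full dump used in the test plan; the helper name unit_is_ordered_after and the SYSTEMCTL override variable are hypothetical.

```shell
#!/bin/sh
# SYSTEMCTL can be overridden for dry runs; it defaults to the real tool.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Succeeds if unit $1 lists unit $2 in its After= property.
unit_is_ordered_after() {
    "$SYSTEMCTL" show -p After --value "$1" | tr ' ' '\n' | grep -qx -- "$2"
}

# Example (run on the node):
#   for dep in containerd.service etcd.service syslog.service; do
#       unit_is_ordered_after kubelet.service "$dep" || echo "missing: $dep"
#   done
```

Note that After= as reported by systemd also includes implicit and default dependencies, so the list is usually longer than what the override file adds.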

OpenStack Infra (hudson-openstack) wrote on 2023-11-09: Fix merged to config-files (master)

Reviewed: https://review.opendev.org/c/starlingx/config-files/+/900515
Committed: https://opendev.org/starlingx/config-files/commit/b6c7d16f1c10e833118e30e6860cbd8071757eb0
Submitter: "Zuul (22348)"
Branch: master

commit b6c7d16f1c10e833118e30e6860cbd8071757eb0
Author: Jim Gauld <email address hidden>
Date: Thu Nov 9 05:59:36 2023 -0500

Add syslog.service dependency order to containerd

This updates containerd.service service override file
containerd-stx-override.conf with systemd dependencies:

After=syslog.service

This addresses cases of missing logs during shutdown.

Partial-Bug: 2043069

Change-Id: I136c584c3832e17cdf35d7ba87387bd3ce3f4a2d
Signed-off-by: Jim Gauld <email address hidden>

OpenStack Infra (hudson-openstack) wrote on 2023-11-09: Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/900507
Committed: https://opendev.org/starlingx/integ/commit/05bbc77057b84975cdae095c0173edb223363c0e
Submitter: "Zuul (22348)"
Branch: master

commit 05bbc77057b84975cdae095c0173edb223363c0e
Author: Jim Gauld <email address hidden>
Date: Thu Nov 9 04:17:58 2023 -0500

Improve shutdown of containerd

    This update is to prevent nodes from crashing while powering
    off during graceful shutdown (or reboot). This improves timing
    and shutdown of containerd.service.

    The containerd shutdown script stops all containers via
    'crictl stop' with a 5-second timeout, then stops all
    pods via 'crictl stopp'. This cleans up lingering /pause
    sandbox containers.

    This modifies the arguments to xargs and crictl to let xargs
    deal with parallelism instead of batching to crictl.
    crictl appears to do the stop operations serially.

    The number of stops run in parallel is engineered to 10.

    Engineering the number of parallel stops in relation to
    shutdown timings under stress load will be addressed in a
    subsequent update. The engineering TC should align with
    customer requirements.

    When testing containerd shutdown under the stress of multiple
    pods writing to a shared PersistentVolume, even the new parallel
    shutdown code is not sufficient to complete the shutdown within
    the default 90-second timeout. Additional changes will be needed
    to enable clean shutdown under those circumstances.

Partial-Bug: 2043069

    Test plan:
    - PASS - build-image, install and boot up on AIO-SX
    - PASS - perform reboot and verify /var/log/daemon.log
             has new k8s-container-cleanup.sh logs
             for 'Stopping all pods' and 'Stopping all containers',
             and that drbd stops after containerd.
    - FAIL - verify containerd shutdown works under stress with
             the new parallel stop pods parameter NPAR=10.
             The stress load uses ReadWriteMany PVC, and multiple
             pods, each writing to the shared PVC.

Change-Id: Ibfc0a474a40344a629b3f0780449906a9c6b03ba
Signed-off-by: Jim Gauld <email address hidden>
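
The stop sequence described in this commit can be sketched as follows. This is not the actual k8s-container-cleanup.sh, whose contents are not shown in this report; the function names and the CRICTL, NPAR and STOP_TIMEOUT variables are assumptions for illustration (the value 10 and the 5 second timeout come from the commit message). Each ID is handed to its own crictl process, and xargs -P runs up to NPAR of them at once.

```shell
#!/bin/sh
# Hypothetical knobs; defaults follow the values named in the commit message.
CRICTL="${CRICTL:-crictl}"          # override to dry-run against a stub
NPAR="${NPAR:-10}"                  # number of stops run in parallel
STOP_TIMEOUT="${STOP_TIMEOUT:-5}"   # seconds passed to 'crictl stop'

# Stop every running container, one ID per crictl invocation.
stop_all_containers() {
    "$CRICTL" ps -q |
        xargs -r -n 1 -P "$NPAR" "$CRICTL" stop --timeout "$STOP_TIMEOUT"
}

# Stop every pod sandbox, which also cleans up the /pause containers.
stop_all_pods() {
    "$CRICTL" pods -q |
        xargs -r -n 1 -P "$NPAR" "$CRICTL" stopp
}

# Containers first, then pods, matching the order in the commit message.
cleanup_k8s_containers() {
    echo "Stopping all containers"
    stop_all_containers
    echo "Stopping all pods"
    stop_all_pods
}

# Example (run as root on the node):
#   cleanup_k8s_containers
```

Using -n 1 leaves the parallelism to xargs instead of passing a batch of IDs to a single crictl call, which the commit message notes appears to process them serially.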

