commit e25d16065a0983b8b9f1a35465c24b167892864a
Author: Andy Ning <email address hidden>
Date: Thu Dec 9 13:11:23 2021 -0500
Don't fail if pods restart timeout during root CA update
During a k8s root CA update, deployments, daemonsets and statefulsets
are rollout restarted so that the pods they deploy pick up the new
root CA certificate. Some statefulsets have been observed to take
longer than the time limit (10 minutes) to complete the rollout
restart, causing the puppet manifest apply to time out and fail the
update.
This change updates the puppet restart code so that it checks the
rollout restart status periodically for a limited time (8 mins) and
generates an "ATTENTION" log in puppet.log for any deployments,
daemonsets or statefulsets that don't complete the restart within
that limit. After the time limit, the puppet apply returns
successfully so that the root CA update continues.
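The polling behaviour described above can be sketched roughly as
follows. This is an illustration only, not the code from this change
(the actual logic lives in the puppet restart code). The function
name wait_for_rollouts, the is_complete callback and the 10 second
polling interval are assumptions made for the example; only the
8 minute budget and the "ATTENTION" log come from the description.

```python
import time
from typing import Callable, Iterable, List


def wait_for_rollouts(
    resources: Iterable[str],
    is_complete: Callable[[str], bool],  # hypothetical status check, supplied by caller
    timeout_s: float = 8 * 60,           # 8 minute budget from the description
    interval_s: float = 10.0,            # assumed polling interval
    log: Callable[[str], None] = print,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll until every resource has restarted or the time budget runs out.

    Returns the resources that are still pending. A timeout is logged,
    not raised, so the caller can carry on with the update.
    """
    pending = list(resources)
    deadline = clock() + timeout_s
    while True:
        # Keep only the resources whose rollout restart is not done yet.
        pending = [r for r in pending if not is_complete(r)]
        if not pending or clock() >= deadline:
            break
        sleep(interval_s)
    for r in pending:
        log(f"ATTENTION: {r} did not complete rollout restart "
            f"within {timeout_s:.0f}s")
    return pending
```

The caller treats a non-empty return value as a warning rather than
a failure, which matches the intent of letting the root CA update
continue.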
This solution is a balance between "let the root CA update continue
and finish" and "minimize service impact by restarting applications".
Test Plan:
PASS: Successful root CA update with all sets completing restart
within the allocated timeout.
PASS: Successful root CA update with some sets not completing restart
within the allocated timeout. Logs generated in puppet.log.
Closes-Bug: 1954303
Signed-off-by: Andy Ning <email address hidden>
Change-Id: Ie2589701a9ba234928e06d659e58db5412486303
Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/821274
Committed: https://opendev.org/starlingx/stx-puppet/commit/e25d16065a0983b8b9f1a35465c24b167892864a
Submitter: "Zuul (22348)"
Branch: master