Vault agent injector on AIO-SX should not have anti-affinity rule
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Low | Tae Park |
Bug Description
Brief Description
-----------------
On AIO-SX, when an application update to vault includes a change to the vault injector Deployment resource, the new injector pod cannot be scheduled: the old injector pod is still running, there is only one node, and the anti-affinity rule does not allow two injector pods on the same node.
Severity
--------
Minor: workaround exists
Steps to Reproduce
------------------
# Where the current vault injector is using the latest image tag 1.2.1,
# tell the application to use 1.2.0 in order to prompt the injector
# pod to 'update'.
# This workflow is loosely based on the following commit, which
# is identified as an example change that causes the condition:
# commit 198f4e51 "set images to pull from configured registries"
# This sample yaml causes the vault injector to be updated
$ cat <<EOF > vault-injector.yaml
injector:
  image:
    tag: 1.2.0
EOF
# show and update helm overrides
$ system helm-override-list vault
$ system helm-override-show vault vault vault
$ system helm-override-
--values=
$ system helm-override-show vault vault vault
# apply the new helm overrides
$ system application-apply vault
$ system application-list
# observe pod that is not being scheduled:
$ kubectl get pods -n vault
# examine the pod events to see the bug's symptom
$ unschedulablePo
$ kubectl describe pods -n vault $unschedulablePod \
| grep anti-affinity
Warning FailedScheduling 4m28s default-scheduler 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
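The pod name used in the describe step has to be picked out of the `kubectl get pods` listing. A minimal sketch of one way to do that, assuming the default column layout (NAME, READY, STATUS, ...) and that the unschedulable pod reports STATUS Pending. `pending_pods` is a hypothetical helper name and is not part of the original report:

```shell
# Hypothetical helper: print the names of pods whose STATUS column is Pending.
# Reads the default `kubectl get pods` table on stdin and skips the header row.
pending_pods() {
    awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Example usage on a live system (commented out; needs cluster access):
# kubectl get pods -n vault | pending_pods
```

If filtering the table is not desirable, `kubectl get pods -n vault --field-selector=status.phase=Pending -o name` may serve the same purpose, with names printed in `pod/<name>` form.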
Expected Behavior
------------------
When the vault application is updated on AIO-SX, all pods whose resources are updated should be able to restart.
Actual Behavior
----------------
The new vault agent injector pod cannot be scheduled.
Reproducibility
---------------
100% reproducible if an app update contains a change to the vault injector Deployment resource.
System Configuration
-------
AIO-SX only
Branch/Pull Time/Commit
-------
Starlingx master, 20230508T060000Z
Last Pass
---------
N/A; it has never passed. The defect has been present since day one. (I have not looked at older vault versions.)
Timestamp/Logs
--------------
N/A, per steps to reproduce
Test Activity
-------------
Feature test of vault for another defect, including test of auto-update functionality.
Workaround
----------
1. Before performing application-update:
###
# First, update the current deployment to disable anti-affinity
# and permit the number of running pods to be zero during update
# maxUnavailable is required to work-around current replicaset's
# configuration
cat <<EOF >injector_
injector:
  strategy:
    rollingUpdate:
      maxUnavai
  affinity: {}
EOF
system helm-override-
system helm-override-show vault vault vault
system application-apply vault
# wait for replicaset and pod to restart
###
# Second, use the helm overrides we actually want to use by default for AIO-SX
cat <<EOF >injector_
injector:
  affinity: {}
EOF
system helm-override-
system helm-override-show vault vault vault
system application-apply vault
# wait for replicaset and pod to restart
###
# Finally, perform the application update that was intended
# etc.
# this assumes the new application also omits anti-affinity
2. If the application update was already run and the bug's symptom is observed:
# Within 30 minutes of the application-update beginning
Delete the replicaset of the running (old) injector pod so that the new pod can be scheduled.
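A minimal sketch of that second workaround, assuming the running injector pod's first owner reference is its ReplicaSet. `delete_owner_replicaset` is a hypothetical helper name, and the pod name passed to it must be looked up on the live system:

```shell
# Hypothetical helper: delete the ReplicaSet that owns a given pod.
# Assumes the pod's first ownerReference is its ReplicaSet.
delete_owner_replicaset() {
    dor_ns="$1"    # namespace, e.g. vault
    dor_pod="$2"   # name of the old pod that is still running
    dor_rs=$(kubectl get pod -n "$dor_ns" "$dor_pod" \
        -o jsonpath='{.metadata.ownerReferences[0].name}')
    # Stop if no owner was found rather than deleting the wrong thing.
    [ -n "$dor_rs" ] || return 1
    kubectl delete replicaset -n "$dor_ns" "$dor_rs"
}

# Example usage (commented out; needs cluster access and a real pod name):
# delete_owner_replicaset vault <running-injector-pod>
```

Checking the owner with `kubectl get replicaset -n vault` before deleting is a reasonable precaution, since the sketch does not verify the owner's kind.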
Changed in starlingx: | |
assignee: | nobody → Tae Park (tparkwr) |
tags: | added: stx.9.0 stx.apps stx.security |
Changed in starlingx: | |
status: | New → In Progress |
Changed in starlingx: | |
importance: | Undecided → Low |
Reviewed: https://review.opendev.org/c/starlingx/vault-armada-app/+/891091
Committed: https://opendev.org/starlingx/vault-armada-app/commit/f7a37e6ad91b7a0efa79c9cb9783af343344ad33
Submitter: "Zuul (22348)"
Branch: master
commit f7a37e6ad91b7a0efa79c9cb9783af343344ad33
Author: Tae Park <email address hidden>
Date: Thu Aug 10 14:16:40 2023 -0400
Removing default injector anti-affinity rules
Adding a null override over the default anti-affinity rules for vault injectors. The default rule only allows one vault injector pod at a time. This is a problem because helm-override and application apply will try to schedule a new pod before the old pod is completely removed.
This change lets a new vault agent injector pod be scheduled without issue.
TEST PLAN:
- Test for AIO-SX
- Update helm-override so that vault-injector has a different image tag than default
- apply the new helm-override
- There should be no FailedScheduling error in the vault pods
- Sanity test for both AIO-SX and AIO-DX + 1 worker
Closes-bug: 2030901
Change-Id: I9814f502558ab1cbecad48cf37341639c964258f
Signed-off-by: Tae Park <email address hidden>