vault-manager remains inactive when the cluster host it runs on is locked
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Tae Park
Bug Description
Brief Description
-----------------
When vault is in HA configuration (3 vault servers), with only three available cluster nodes: if the cluster node upon which vault-manager is running is locked then vault-manager will reschedule but not run because it is waiting for all three vault server pods to run.
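The stuck state can be pictured with a minimal sketch (assumed and simplified, not the actual rendered init.sh): vault-manager counts Running sva-vault server pods and loops until the count reaches the configured replica count, even when the raft cluster is already initialized. The `count_running_pods` helper and `RUNNING_PODS` variable are stand-ins for a kubectl query.

```shell
count_running_pods() {
    # Stand-in for a kubectl query along the lines of:
    #   kubectl get pods -n vault -o name --field-selector=status.phase=Running
    echo "${RUNNING_PODS:-0}"
}

wait_for_pods() {
    want=$1   # configured replica count (3 in the HA configuration)
    while [ "$(count_running_pods)" -lt "$want" ]; do
        echo "Waiting for sva-vault statefulset running pods..."
        sleep 1
    done
}
```

With one of three cluster nodes locked, at most two server pods can run, so a loop of this shape never exits.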
Severity
--------
Minor - no impact on vault function unless additional failure conditions occur
Steps to Reproduce
------------------
1. Configure AIO-DX plus one worker, or standard controller with one worker. Apply and configure the vault application per StarlingX documentation.
StarlingX Vault Reference: https:/
2. Confirm that the vault-manager pod and the 3 vault server pods are running:
kubectl get pods -n vault
3. Identify the cluster node upon which vault-manager is running:
kubectl get pods -n vault sva-vault-manager-0 -o jsonpath=
4. Use the 'system host-lock' command to lock the cluster node where vault-manager is running.
5. Wait for the vault-manager pod to be rescheduled. Watch the vault-manager pod log to see that it remains in the initializing state, repeating "Waiting for sva-vault statefulset running pods":
kubectl logs -f -n vault sva-vault-manager-0
Expected Behavior
------------------
Since the vault cluster is already initialized, vault-manager does not need to wait for the number of running pods in the statefulset to equal the configured replica count.
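The expected gating rule can be sketched as follows (a hedged illustration; the function name and parameters are hypothetical, not taken from the vault-manager source): once the raft cluster has been initialized, a single running server pod is enough for vault-manager to proceed, and only an uninitialized cluster needs the full replica count.

```shell
# needed_pods REPLICAS INITIALIZED -> prints the required running-pod count
needed_pods() {
    replicas=$1     # configured statefulset replica count
    initialized=$2  # "true" once raft initialization has completed
    if [ "$initialized" = "true" ]; then
        echo 1
    else
        echo "$replicas"
    fi
}
```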
Actual Behavior
----------------
Vault-manager remains in the initializing state until all three platform nodes are unlocked.
Reproducibility
---------------
100% With 3 cluster nodes (two controllers and one worker)
System Configuration
-------
AIO-DX plus one worker
Standard configuration with one worker (2+1)
Branch/Pull Time/Commit
-------
starlingx master
Last Pass
---------
N/A, probably day 1 bug
Timestamp/Logs
--------------
N/A
Test Activity
-------------
Developer testing
Workaround
----------
- Evacuate vault-manager before locking the cluster node, or
- unlock all platform hosts, or
- use AIO-DX plus 2 workers, or
- use Standard controller plus 2 workers.
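The first workaround can be expressed as a short command sequence. This is a hypothetical helper, not a verified procedure: the namespace (vault) and pod name (sva-vault-manager-0) come from the report, while the cordon step is an assumption about how to force the reschedule onto another node before locking.

```shell
# evacuation_plan NODE -> prints the steps to run before locking NODE
evacuation_plan() {
    node=$1   # node currently hosting sva-vault-manager-0
    printf 'kubectl cordon %s\n' "$node"
    printf 'kubectl delete pod -n vault sva-vault-manager-0\n'
    printf 'system host-lock %s\n' "$node"
}
```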
Changed in starlingx:
assignee: nobody → Tae Park (tparkwr)
Changed in starlingx:
status: New → In Progress
tags: added: stx.9.0 stx.apps
Changed in starlingx:
importance: Undecided → Low
Reviewed: https://review.opendev.org/c/starlingx/vault-armada-app/+/890256
Committed: https://opendev.org/starlingx/vault-armada-app/commit/896008fb732dc1d1541564da109275a753b6e65c
Submitter: "Zuul (22348)"
Branch: master
commit 896008fb732dc1d1541564da109275a753b6e65c
Author: Tae Park <email address hidden>
Date: Tue Aug 1 17:29:01 2023 -0400
vault-manager wait for one server only when initialized
Modify the vault-manager initialization logic so that it waits for the number of active pods to equal the replica count only when the raft cluster is not yet initialized; once initialized, a single running server pod is sufficient.
TEST PLAN:
- In a 2 controller, 1 worker setup,
- Upload and apply vault
- Lock the host that vault-manager is running on
- Vault manager should restart
- Within the logs, there should not be a repetition of "Waiting for sva-vault statefulset running pods..."
- Vault Sanity test in AIO-SX
- Bashate of rendered init.sh
Closes-bug: 2029375
Signed-off-by: Tae Park <email address hidden>
Change-Id: I41990b87395a5d5364ef91c048f740d0f0675d6b