Backup & Restore: During AIO-DX restore, ingress validating webhook pod does not terminate
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Fix Released | Low | Joshua Kraitberg | |
Bug Description
Brief Description
-----------------
During restore, the step that kills the ingress validating webhook pod gets stuck and blocks the restore indefinitely.
During restore, etcd will be in an inconsistent state. If multiple nodes were configured when the backup was taken, pods assigned to the other nodes, e.g. controller-1, can only be removed with the '--force' flag.
To observe this, run 'kubectl get pods --all-namespaces'. The output shows pods running on other nodes even though those nodes have not been installed yet.
```
sysadmin@
NAME STATUS ROLES AGE VERSION
controller-0 Ready control-
controller-1 NotReady control-
```
```
sysadmin@
NAME READY STATUS RESTARTS AGE
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
...
ic-nginx-
...
```
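The check above can also be scripted. The following is a minimal sketch, not part of the fix: it assumes 'kubectl' is on the PATH with access to the cluster, and the helper names (not_ready_nodes, pods_on_nodes, kubectl_json, report) are hypothetical. It lists pods that are still bound to nodes whose Ready condition is not "True".

```python
import json
import subprocess


def not_ready_nodes(nodes_doc):
    """Return the names of nodes whose Ready condition is not "True"."""
    names = set()
    for node in nodes_doc.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.add(node["metadata"]["name"])
    return names


def pods_on_nodes(pods_doc, node_names):
    """Return (namespace, pod, node) tuples for pods bound to the given nodes."""
    found = []
    for pod in pods_doc.get("items", []):
        node = pod.get("spec", {}).get("nodeName")
        if node in node_names:
            meta = pod["metadata"]
            found.append((meta["namespace"], meta["name"], node))
    return found


def kubectl_json(*args):
    """Run kubectl with JSON output and parse it (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def report():
    """Print every pod that is bound to a node that is not Ready."""
    nodes = kubectl_json("get", "nodes")
    pods = kubectl_json("get", "pods", "--all-namespaces")
    for namespace, name, node in pods_on_nodes(pods, not_ready_nodes(nodes)):
        print(f"{namespace}/{name} is bound to {node}")
```

Calling report() on the restored controller-0 should list the pods assigned to controller-1 in the scenario described here. The parsing functions take plain dictionaries, so they can be exercised without a cluster.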
Severity
--------
Minor
Steps to Reproduce
------------------
Run a restore using a backup from AIO-DX.
Expected Behavior
------------------
Restore completes successfully.
Actual Behavior
----------------
Restore gets stuck forever because the pod cannot be killed.
Reproducibility
---------------
100%.
System Configuration
-------
Multi-node system.
Branch/Pull Time/Commit
-------
N/A.
Last Pass
---------
N/A.
Timestamp/Logs
--------------
2022-08-24 01:59:18,869 p=1986 u=sysadmin n=ansible | TASK [common/armada-helm : If on system restore mode, kill ingress validating webhook pod so it can be recreated]
Test Activity
-------------
Developer Testing
Workaround
----------
Add the '--force' flag when deleting pods.
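As an illustration of the workaround, here is a minimal sketch that builds the 'kubectl delete pod' command with '--force'. The function names and the dry_run parameter are hypothetical and this is not the change made in the playbook; the pod name and namespace must be supplied by the caller.

```python
import shlex
import subprocess


def force_delete_command(namespace, pod_name):
    """Build the kubectl command that force-deletes one pod."""
    return [
        "kubectl", "delete", "pod", pod_name,
        "--namespace", namespace,
        "--force",
    ]


def force_delete(namespace, pod_name, dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    cmd = force_delete_command(namespace, pod_name)
    print(shlex.join(cmd))
    if not dry_run:
        # Requires kubectl on the PATH and access to the cluster.
        subprocess.run(cmd, check=True)
    return cmd
```

With the default dry_run=True the command is only printed, which makes it easy to review before anything is deleted.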
Changed in starlingx:
assignee: nobody → Joshua Kraitberg (jkraitbe-wr)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.8.0 stx.update
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/855037