RVMC image pull failure over subcloud upgrade

Bug #1904898 reported by Tee Ngo
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Tee Ngo

Bug Description

Brief Description
-----------------
A bug in the remote install playbook caused the subcloud upgrade (20.06 -> 20.12) to fail.

Severity
--------
Major

Steps to Reproduce
------------------
Perform a system upgrade of a distributed cloud from 20.06 (stx4.0) to 20.12 (master).

Expected Behavior
-----------------
Both the system controller upgrade and the orchestrated subcloud upgrade complete successfully.

Actual Behavior
---------------
The orchestrated subcloud upgrade failed during the upgrade simplex step with the following install playbook error:

TASK [Save Redfish Virtual Media Controller logs if rvmc-subcloud11-lbfct is not ready] ***
fatal: [subcloud11 -> localhost]: FAILED! => {"changed": true, "cmd": ["kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "-n", "rvmc", "logs", "rvmc-subcloud11-lbfct"], "delta": "0:00:00.071850", "end": "2020-11-16 20:43:06.905374", "msg": "non-zero return code", "rc": 1, "start": "2020-11-16 20:43:06.833524", "stderr": "Error from server (BadRequest): container \"rvmc\" in pod \"rvmc-subcloud11-lbfct\" is waiting to start: trying and failing to pull image", "stderr_lines": ["Error from server (BadRequest): container \"rvmc\" in pod \"rvmc-subcloud11-lbfct\" is waiting to start: trying and failing to pull image"], "stdout": "", "stdout_lines": []}
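
The error above only surfaces when the playbook tries to fetch logs from a container that never started. As a hedged illustration (not part of the playbook), the same condition could be read from the pod's status instead. The sketch below assumes the input is the parsed output of `kubectl get pod <name> -o json`; the function name `image_pull_failures` is hypothetical, while `ErrImagePull` and `ImagePullBackOff` are the waiting reasons Kubernetes reports for failed pulls.

```python
# Waiting reasons Kubernetes reports when a container image cannot be pulled.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_failures(pod):
    """Return (container, reason, message) tuples for containers stuck on image pull.

    `pod` is assumed to be the parsed JSON of `kubectl get pod <name> -o json`.
    """
    failures = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        # A container that has not started carries a "waiting" state with a reason.
        waiting = status.get("state", {}).get("waiting")
        if waiting and waiting.get("reason") in IMAGE_PULL_REASONS:
            failures.append(
                (status["name"], waiting["reason"], waiting.get("message", ""))
            )
    return failures
```

A non-empty result would indicate that the pod is blocked on the image pull rather than on the remote install itself, which points toward registry credentials as a likely cause.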

The root cause is that the task that copies the default-registry-key secret from the kube-system namespace to the rvmc namespace failed silently. Without this secret, the rvmc image could not be pulled from the controller registry to launch the rvmc pod for the remote install of the new load in the subcloud.
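
To illustrate what "failing loudly" could look like for such a copy step, here is a minimal Python sketch. It is not the playbook's actual task or the fix that was merged; the helper names (`copy_secret`, `retarget_secret`, `SecretCopyError`) are hypothetical, and the kubeconfig path is taken from the failing command in the log above. The `run` parameter exists only so the logic can be exercised without a cluster.

```python
import json
import subprocess

# Kubeconfig path as it appears in the failing kubectl command in this report.
KUBECONFIG = "/etc/kubernetes/admin.conf"


class SecretCopyError(RuntimeError):
    """Raised when any step of the secret copy fails."""


def retarget_secret(manifest, dst_ns):
    """Build a minimal Secret manifest for dst_ns from an existing Secret.

    Server-populated metadata (uid, resourceVersion, and so on) is dropped so
    the object can be created fresh in the destination namespace.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": manifest["type"],
        "metadata": {"name": manifest["metadata"]["name"], "namespace": dst_ns},
        "data": manifest.get("data", {}),
    }


def copy_secret(name, src_ns, dst_ns, run=subprocess.run):
    """Copy a Secret between namespaces, raising on any non-zero exit code."""
    kubectl = ["kubectl", f"--kubeconfig={KUBECONFIG}"]

    # Step 1: read the source secret; a failure here must not be ignored.
    got = run(kubectl + ["-n", src_ns, "get", "secret", name, "-o", "json"],
              capture_output=True, text=True)
    if got.returncode != 0:
        raise SecretCopyError(f"cannot read {src_ns}/{name}: {got.stderr.strip()}")

    manifest = retarget_secret(json.loads(got.stdout), dst_ns)

    # Step 2: create or update the secret in the destination namespace.
    applied = run(kubectl + ["-n", dst_ns, "apply", "-f", "-"],
                  input=json.dumps(manifest), capture_output=True, text=True)
    if applied.returncode != 0:
        raise SecretCopyError(f"cannot create {dst_ns}/{name}: {applied.stderr.strip()}")
    return manifest
```

Under these assumptions, `copy_secret("default-registry-key", "kube-system", "rvmc")` would either leave the secret in the rvmc namespace or raise immediately, so a missing secret would be reported at the copy step rather than later as an image pull failure.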

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Nov. 17th, 2020 load

Last Pass
---------
Unsure whether a distributed cloud upgrade from 20.06 -> 20.12 has been officially verified by the test team before.

Timestamp/Logs
--------------
See above

Test Activity
-------------
Developer Testing

Workaround
----------
1. Manually pull the rvmc image.
2. Delete the existing upgrade strategy.
3. Create a new upgrade strategy and apply it.

Tee Ngo (teewrs)
Changed in starlingx:
assignee: nobody → Tee Ngo (teewrs)
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.5.0 stx.update
Tee Ngo (teewrs) wrote :
Changed in starlingx:
status: In Progress → Fix Released