Unlock failed during distributed cloud orchestrated upgrade
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Adriano Oliveira |
Bug Description
Brief Description
-----------------
During a distributed cloud orchestrated upgrade, the unlock of a worker node failed.
The command output indicates that the sriov_numvfs configuration might need more time to be applied.
Severity
--------
Major
Steps to Reproduce
------------------
Follow the upgrade procedure using upgrade orchestration.
The issue is seen when orchestration attempts to unlock worker-0.
Expected Behavior
------------------
Host unlock succeeds on every node during upgrade orchestration.
Actual Behavior
----------------
Unlock of worker-0 fails.
Reproducibility
---------------
Intermittent
System Configuration
--------------------
Distributed Cloud
Branch/Pull Time/Commit
-----------------------
stx4.0 as of "2020-06-
Last Pass
---------
Timestamp/Logs
--------------
Alarm ID | Reason Text | Entity ID | Severity | Time Stamp
---|---|---|---|---
900.203 | Software upgrade auto-apply failed | orchestration= | |
800.001 | Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details. | cluster= | |
200.001 | worker-0 was administratively locked to take it out-of-service. | host=worker-0 | warning | 2020-07-02T18:10:
750.006 | A configuration change requires a reapply of the oidc-auth-apps application. | k8s_application | |
750.006 | A configuration change requires a reapply of the platform-integ-apps application. | k8s_application | |
900.005 | System Upgrade in progress. | host=controller | minor | 2020-07-02T17:30:
[2020-11-16 23:55:01,419] 314 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-0'
[2020-11-16 23:55:07,956] 436 DEBUG MainThread ssh.expect :: Output:
Expecting number of interface sriov_numvfs=32. Please wait a few minutes for inventory update and retry host-unlock.
[2020-11-16 23:56:15,366] 314 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-0'
[2020-11-16 23:56:24,457] 436 DEBUG MainThread ssh.expect :: Output:
+------
| Property | Value |
+------
| action | none |
| administrative | locked |
| availability | online |
| bm_ip | None |
| bm_type | none |
| bm_username | None |
| boot_device | /dev/disk/
| capabilities | {u'stor_function': u'monitor'} |
| clock_synchroni
| config_applied | 1d1484c5-
| config_status | None |
| config_target | 1d1484c5-
| console | ttyS0,115200n8 |
| created_at | 2020-11-
| device_image_update | None |
| hostname | controller-0 |
| id | 1 |
| install_output | text |
| install_state | completed |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {} |
| mgmt_ip | abcd:204::2 |
| mgmt_mac | 3c:fd:fe:a0:16:78 |
| operational | disabled |
| personality | controller |
| reboot_needed | False |
| reserved | False |
| rootfs_device | /dev/disk/
| serialid | None |
| software_load | stx 4.0 |
| subfunction_avail | online |
| subfunction_oper | disabled |
| subfunctions | controller,
| task | Unlocking |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2020-11-
| uptime | 111 |
| uuid | 20167fc8-
| vim_progress_status | services-disabled |
+------
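The second attempt above was accepted: the returned host record shows `task | Unlocking`. The Python sketch below checks this programmatically. It assumes the `| Property | Value |` row layout seen in this log and treats `task == Unlocking` as the acceptance signal; both are assumptions drawn from this one log rather than documented CLI behavior, and the function names are hypothetical.

```python
from typing import Dict


def parse_property_table(text: str) -> Dict[str, str]:
    """Map Property -> Value from a '| Property | Value |' table (layout assumed from this log)."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Border lines ("+------") and non-table text are skipped.
        if not line.startswith("|"):
            continue
        cells = line.strip("|").split("|", 1)
        # Truncated rows with no value cell (e.g. "| clock_synchroni") are skipped.
        if len(cells) < 2:
            continue
        key, value = cells[0].strip(), cells[1].strip()
        if key == "Property":  # header row
            continue
        props[key] = value
    return props


def unlock_accepted(output: str) -> bool:
    """Assumed acceptance signal: the returned host record reports task 'Unlocking'."""
    return parse_property_table(output).get("task") == "Unlocking"
```

The rejected first attempt prints only a plain-text message with no table, so `unlock_accepted` returns `False` for it.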
Test Activity
-------------
Regression Testing
Workaround
----------
Wait a couple of minutes and try to unlock the node again.
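The following Python sketch scripts this workaround. It runs `system host-unlock` and, if the output contains the "Please wait a few minutes for inventory update" message from the log above, sleeps and retries up to a fixed number of attempts. The retry count, the 60-second delay, matching on the message text, and the helper names (`run_cli`, `unlock_with_retry`) are illustrative assumptions, not part of StarlingX.

```python
import subprocess
import time
from typing import Callable, List, Sequence, Tuple

# Substring of the rejection message seen in this bug's log. Matching on it
# is an assumption; the exact wording may differ between releases.
RETRY_HINT = "Please wait a few minutes for inventory update"


def run_cli(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def unlock_with_retry(
    host: str,
    global_args: Sequence[str] = (),
    attempts: int = 5,
    delay_s: float = 60.0,
    run: Callable[[List[str]], Tuple[int, str]] = run_cli,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry `system host-unlock` while the inventory-update message appears (hypothetical helper)."""
    cmd = ["system", *global_args, "host-unlock", host]
    for attempt in range(1, attempts + 1):
        code, output = run(cmd)
        if RETRY_HINT in output:
            # Unlock rejected because the sriov_numvfs inventory is not updated yet.
            if attempt == attempts:
                break
            sleep(delay_s)
            continue
        if code != 0:
            # Any other failure is outside the scope of this workaround.
            raise RuntimeError(f"host-unlock {host} failed: {output.strip()}")
        return output
    raise RuntimeError(f"host-unlock {host} still rejected after {attempts} attempts")
```

The log above passes `--os-endpoint-type internalURL --os-region-name RegionOne` before `host-unlock`. Those options can be supplied through `global_args`. Injecting `run` and `sleep` keeps the retry logic testable without a live system.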
Changed in starlingx: status: In Progress → Fix Released
The issue was initially seen on a worker node but was also reproduced on a controller node.