Comment 0 for bug 1914836

Adriano Oliveira (aoliveir) wrote :

Brief Description
-----------------
During a Distributed Cloud orchestrated upgrade, the unlock of a compute node failed.
The command output indicates that the sriov_numvfs configuration may need more time to be applied before the unlock is accepted.

Severity
--------
Major

Steps to Reproduce
------------------
Run the upgrade by following the upgrade orchestration procedure.
The issue occurs when orchestration attempts to unlock compute-0.

Expected Behavior
-----------------
No failure on host unlock on any node during upgrade orchestration.

Actual Behavior
---------------
Unlock of compute-0 fails.

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
stx4.0 as of "2020-06-27_18-35-20"

Last Pass
---------

Timestamp/Logs
--------------

+----------+-----------------------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
| Alarm ID | Reason Text                                                                       | Entity ID                                    | Severity | Time Stamp                 |
+----------+-----------------------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+
| 900.203  | Software upgrade auto-apply failed                                                | orchestration=sw-upgrade                     | critical | 2020-07-02T18:24:53.413091 |
| 800.001  | Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details.    | cluster=04a34a69-8c18-4494-a97f-5035682d7427 | warning  | 2020-07-02T18:11:32.397315 |
| 200.001  | compute-0 was administratively locked to take it out-of-service.                  | host=compute-0                               | warning  | 2020-07-02T18:10:53.991463 |
| 750.006  | A configuration change requires a reapply of the oidc-auth-apps application.      | k8s_application=oidc-auth-apps               | warning  | 2020-07-02T17:31:13.243285 |
| 750.006  | A configuration change requires a reapply of the platform-integ-apps application. | k8s_application=platform-integ-apps          | warning  | 2020-07-02T17:31:13.059851 |
| 900.005  | System Upgrade in progress.                                                       | host=controller                              | minor    | 2020-07-02T17:30:08.041908 |
+----------+-----------------------------------------------------------------------------------+----------------------------------------------+----------+----------------------------+

[2020-11-16 23:55:01,419] 314 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-0'
[2020-11-16 23:55:07,956] 436 DEBUG MainThread ssh.expect :: Output:
Expecting number of interface sriov_numvfs=32. Please wait a few minutes for inventory update and retry host-unlock.

[2020-11-16 23:56:15,366] 314 DEBUG MainThread ssh.send :: Send 'system --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-0'
[2020-11-16 23:56:24,457] 436 DEBUG MainThread ssh.expect :: Output:
+-----------------------+--------------------------------------------+
| Property              | Value                                      |
+-----------------------+--------------------------------------------+
| action                | none                                       |
| administrative        | locked                                     |
| availability          | online                                     |
| bm_ip                 | None                                       |
| bm_type               | none                                       |
| bm_username           | None                                       |
| boot_device           | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| capabilities          | {u'stor_function': u'monitor'}             |
| clock_synchronization | ntp                                        |
| config_applied        | 1d1484c5-dd15-49b3-ab87-0ed0fc4c4a3d       |
| config_status         | None                                       |
| config_target         | 1d1484c5-dd15-49b3-ab87-0ed0fc4c4a3d       |
| console               | ttyS0,115200n8                             |
| created_at            | 2020-11-16T14:59:44.949842+00:00           |
| device_image_update   | None                                       |
| hostname              | controller-0                               |
| id                    | 1                                          |
| install_output        | text                                       |
| install_state         | completed                                  |
| install_state_info    | None                                       |
| inv_state             | inventoried                                |
| invprovision          | provisioned                                |
| location              | {}                                         |
| mgmt_ip               | abcd:204::2                                |
| mgmt_mac              | 3c:fd:fe:a0:16:78                          |
| operational           | disabled                                   |
| personality           | controller                                 |
| reboot_needed         | False                                      |
| reserved              | False                                      |
| rootfs_device         | /dev/disk/by-path/pci-0000:00:1f.2-ata-5.0 |
| serialid              | None                                       |
| software_load         | 20.06                                      |
| subfunction_avail     | online                                     |
| subfunction_oper      | disabled                                   |
| subfunctions          | controller,worker,lowlatency               |
| task                  | Unlocking                                  |
| tboot                 | false                                      |
| ttys_dcd              | None                                       |
| updated_at            | 2020-11-16T23:55:38.864171+00:00           |
| uptime                | 111                                        |
| uuid                  | 20167fc8-c125-4b72-aace-fc10bb8de147       |
| vim_progress_status   | services-disabled                          |
+-----------------------+--------------------------------------------+

Test Activity
-------------
Regression Testing

Workaround
----------
Wait a couple of minutes and try to unlock the node again.
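
The manual workaround could be scripted roughly as follows. This is only a sketch, assuming Python 3.7+ and a shell where the `system` CLI is on the PATH with credentials already loaded. The function names (`run_cli`, `unlock_with_retry`), the attempt count and the 120-second delay are illustrative assumptions and are not taken from the orchestration code; only the `system host-unlock` command and the "retry host-unlock" message text come from the log above.

```python
import subprocess
import time

# Substring of the rejection message seen in the log output above.
RETRY_HINT = "retry host-unlock"


def run_cli(args):
    """Run a command and return its stdout and stderr as one string."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.stdout + result.stderr


def unlock_with_retry(host, run=run_cli, attempts=5, delay_s=120,
                      sleep=time.sleep):
    """Issue host-unlock, retrying while the inventory-update hint is returned.

    Returns the output of the first attempt that does not contain the hint.
    Raises TimeoutError if every attempt is rejected with the hint.
    `attempts` and `delay_s` are guesses, not values from the bug report.
    """
    cmd = ["system", "host-unlock", host]
    for attempt in range(1, attempts + 1):
        output = run(cmd)
        if RETRY_HINT not in output:
            return output
        if attempt < attempts:
            # Give the inventory update time to report sriov_numvfs.
            sleep(delay_s)
    raise TimeoutError(
        f"{host}: unlock still rejected after {attempts} attempts")
```

Calling `unlock_with_retry("compute-0")` would then stand in for the manual wait-and-retry. The `run` and `sleep` parameters exist only so the loop can be exercised without a live system.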