Backup & Restore: controller failed to unlock - Manifest apply timeout
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Heitor Matsui |
Bug Description
Brief Description
-----------------
After running the Ansible restore playbook, controller-0 failed to unlock.
Severity
--------
Major: System/Feature is usable but degraded
Steps to Reproduce
------------------
- Install a simplex (AIO-SX) system with WRCP 22.06
- Run the backup Ansible playbook from controller-0
- Install a clean WRCP image on the system with wipedisk=false
- Run the restore Ansible playbook locally with the backup file saved above
- Unlock controller-0
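For reference, the reproduction steps above can be sketched with the standard StarlingX playbook invocations. Playbook paths and extra-vars follow the StarlingX backup-and-restore documentation for this era of releases; the passwords and backup filename are placeholders, so verify them against your deployment.

```shell
# Sketch of the reproduction flow (paths/variables per StarlingX docs;
# <sysadmin_pw>, <admin_pw> and the backup archive name are placeholders).

# 1) On controller-0, take the platform backup:
ansible-playbook /usr/share/ansible/stx-ansible/playbooks/backup.yml \
    -e "ansible_become_pass=<sysadmin_pw> admin_password=<admin_pw>"

# 2) After reinstalling with wipedisk=false, restore from the saved archive:
ansible-playbook /usr/share/ansible/stx-ansible/playbooks/restore_platform.yml \
    -e "initial_backup_dir=/home/sysadmin \
        backup_filename=<backup_archive>.tgz \
        ansible_become_pass=<sysadmin_pw> admin_password=<admin_pw>"

# 3) Unlock the controller (the step that fails in this report):
system host-unlock controller-0
```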
Expected Behavior
------------------
controller-0 unlocks successfully after the restore.
Actual Behavior
----------------
controller-0 failed to unlock after the restore.
Reproducibility
---------------
Reproducible
System Configuration
--------------------
AIO-SX
Branch/Pull Time/Commit
-----------------------
master 2022-06-22
Last Pass
---------
N/A
Timestamp/Logs
--------------
[sysadmin@
+----+-
| id | hostname | personality | administrative | operational | availability |
+----+-
| 1 | controller-0 | controller | locked | disabled | online |
+----+-
[sysadmin@
+------
| Property | Value |
+------
| action | none |
| administrative | locked |
| availability | online |
| bm_ip | None |
| bm_type | none |
| bm_username | None |
| boot_device | /dev/disk/
| capabilities | {u'is_max_
| | u'monitor', u'Personality': u'Controller-
| clock_synchroni
| config_applied | 43208253-
| config_status | None |
| config_target | 43208253-
| console | ttyS0,115200n8 |
| created_at | 2022-06-
| device_image_update | None |
| hostname | controller-0 |
| id | 1 |
| install_output | text |
| install_state | None |
| install_state_info | None |
| inv_state | inventoried |
| invprovision | provisioned |
| location | {} |
| max_cpu_mhz_allowed | 2600 |
| max_cpu_
| mgmt_ip | 192.168.204.2 |
| mgmt_mac | 00:00:00:00:00:00 |
| operational | disabled |
| personality | controller |
| reboot_needed | False |
| reserved | False |
| rootfs_device | /dev/disk/
| serialid | None |
| software_load | 22.06 |
| subfunction_avail | offline |
| subfunction_oper | disabled |
| subfunctions | controller,worker |
| task | Manifest apply timeout ; Unlock to retry |
| tboot | false |
| ttys_dcd | None |
| updated_at | 2022-06-
| uptime | 4346 |
| uuid | 12991d90-
| vim_progress_status | services-enabled |
+------
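The `task` field in the `host-show` output above is the quickest indicator of this failure mode. A small sketch of pulling that field out of a saved capture (the `capture` variable below is a hypothetical minimal excerpt of the table, not full output):

```shell
# Extract the "task" field from a saved `system host-show` capture to spot
# the manifest-apply failure. The sample capture is a minimal excerpt.
capture='| task | Manifest apply timeout ; Unlock to retry |
| tboot | false |'

# Split on "|", take the value column, trim surrounding spaces.
task=$(printf '%s\n' "$capture" | awk -F'|' '/\| task / {gsub(/^ +| +$/, "", $3); print $3}')
echo "$task"
```

On a live system the same field is available directly via `system host-show controller-0`.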
[sysadmin@
+------
| application | version | manifest name | manifest file | status | progress |
+------
| cert-manager | 22.06-37 | cert-manager-
| nginx-ingress-
| oidc-auth-apps | 22.06-71 | oidc-auth-
| platform-integ-apps | 22.06-55 | platform-
+------
[sysadmin@
cluster:
id: 8a14acce-
health: HEALTH_WARN
13 slow ops, oldest one blocked for 4259 sec, mon.controller-0 has slow ops
services:
mon: 1 daemons, quorum controller-0 (age 72m)
mgr: controller-
mds: 1 up:standby
osd: 1 osds: 0 up, 0 in
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 0 B used, 0 B / 0 B avail
pgs:
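The `ceph -s` output above shows the storage side of the problem: one OSD is known to the cluster but none came up after the restore (`osd: 1 osds: 0 up, 0 in`). A sketch of checking that summary line from captured output (parsing only; on a live node you would feed it the real `ceph -s` text):

```shell
# Parse the "osd:" summary line from `ceph -s` output and flag OSDs that
# failed to rejoin after the restore. The sample line is copied from above.
osd_line='osd: 1 osds: 0 up, 0 in'

total=$(echo "$osd_line" | sed -E 's/.* ([0-9]+) osds.*/\1/')
up=$(echo "$osd_line" | sed -E 's/.* ([0-9]+) up.*/\1/')

if [ "$up" -lt "$total" ]; then
  echo "WARN: only $up of $total OSDs up"   # prints "WARN: only 0 of 1 OSDs up"
fi
```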
After running the restore Ansible playbook, the controller ended up in the offline state; nevertheless, the bootstrap finished without error, as the Ansible logs show:
2022-06-23 10:32:44,559 p=11433 u=sysadmin | TASK [restore-
2022-06-23 10:32:44,559 p=11433 u=sysadmin | Thursday 23 June 2022 10:32:44 +0000 (0:00:00.273) 0:32:15.257 *********
2022-06-23 10:32:47,011 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (30 retries left).
2022-06-23 10:32:59,263 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (29 retries left).
2022-06-23 10:33:11,431 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (28 retries left).
2022-06-23 10:33:23,608 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (27 retries left).
2022-06-23 10:33:35,808 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (26 retries left).
2022-06-23 10:33:48,009 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (25 retries left).
2022-06-23 10:34:00,189 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (24 retries left).
2022-06-23 10:34:12,546 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (23 retries left).
2022-06-23 10:34:24,724 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (22 retries left).
2022-06-23 10:34:36,882 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (21 retries left).
2022-06-23 10:34:49,085 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (20 retries left).
2022-06-23 10:35:01,415 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (19 retries left).
2022-06-23 10:35:13,592 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (18 retries left).
2022-06-23 10:35:25,830 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (17 retries left).
2022-06-23 10:35:38,003 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (16 retries left).
2022-06-23 10:35:50,238 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (15 retries left).
2022-06-23 10:36:02,408 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (14 retries left).
2022-06-23 10:36:14,581 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (13 retries left).
2022-06-23 10:36:26,860 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (12 retries left).
2022-06-23 10:36:39,041 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (11 retries left).
2022-06-23 10:36:51,246 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (10 retries left).
2022-06-23 10:37:03,555 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (9 retries left).
2022-06-23 10:37:15,759 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (8 retries left).
2022-06-23 10:37:27,965 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (7 retries left).
2022-06-23 10:37:40,168 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (6 retries left).
2022-06-23 10:37:52,412 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (5 retries left).
2022-06-23 10:38:04,693 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (4 retries left).
2022-06-23 10:38:17,020 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (3 retries left).
2022-06-23 10:38:29,234 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (2 retries left).
2022-06-23 10:38:41,418 p=11433 u=sysadmin | FAILED - RETRYING: Check controller-0 is in online state (1 retries left).
2022-06-23 10:38:53,632 p=11433 u=sysadmin | changed: [localhost]
mtcAgent.log
[mtcAgent.log entries from 2022-06-23 truncated in the original report]
Test Activity
-------------
Regression Testing
Workaround
----------
N/A
description: updated
Changed in starlingx:
assignee: nobody → Heitor Matsui (heitormatsui)
description: updated
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.update
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/847619