Backup and Restore: upon execution of 'config_controller --restore-system', controller-0 was rebooted unexpectedly
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Wei Zhou |
Bug Description
Brief Description
-----------------
During system restore, when executing CLI 'sudo config_controller --restore-system <backup-
[16%]ERROR:
...
*** Host watchdog (hostwd) not receiving messages from PMON ***
...
*** Initiating reboot ***
Severity
--------
Major
Steps to Reproduce
------------------
1. Perform a system backup according to the customer documentation, without backing up any Cinder volumes:
sudo config_controller --backup <output_
Save the generated backup files outside the lab under test.
2. Perform a system restore:
1) Wipe the disks on all nodes except the storage nodes.
2) Power off all nodes except the storage nodes.
3) Reinstall controller-0 with the same version of the system software.
4) Restore the system data:
sudo config_controller --restore-system <backup-
Expected Behavior
------------------
The CLI completes without any issues and the system does not reboot.
Actual Behavior
----------------
The CLI printed several error messages, and the node rebooted when the restore had nearly finished (step 22 of 24):
Restoring system (this will take several minutes):
Step 1 of 24 [# ] [4%]
Step 2 of 24 [### ] [8%]
Step 3 of 24 [##### ] [12%]
Step 4 of 24 [####### ] [16%]ERROR:
Traceback (most recent call last):
File "/usr/lib64/
value = str(config.
File "/usr/lib64/
raise NoOptionError(
NoOptionError: No option 'subfunction' in section: 'platform_conf'
INFO:controller
INFO:controller
Step 5 of 24 [######### ] [20%]
Step 6 of 24 [########### ] [25%]
Step 7 of 24 [############# ] [29%]
Step 8 of 24 [############### ] [33%]
Step 9 of 24 [################ ] [37%]
Step 10 of 24 [################## ] [41%]INFO:
Step 11 of 24 [######
INFO:controller
Step 22 of 24 [######
*** Host Watchdog declaring system unhealthy ***
*** Initiating reboot ***
*** Host watchdog (hostwd) not receiving messages from PMON ***
*** Host Watchdog declaring system unhealthy ***
*** Initiating reboot ***
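The step 4 traceback ends in `NoOptionError: No option 'subfunction' in section: 'platform_conf'`, the exception Python's config parser raises when a section exists but lacks the requested option. The minimal sketch below reproduces that exact message with Python 3's `configparser`. The file contents and the `read_subfunction` helper are hypothetical, modeled only on the section and option names in the error, and this is not the `config_controller` code. Because the restore continued past step 4, this error may not be what triggered the watchdog reboot.

```python
import configparser

# Hypothetical file contents: only the section name 'platform_conf' and the
# option name 'subfunction' come from the error message; the rest is assumed.
PLATFORM_CONF = """\
[platform_conf]
nodetype = controller
"""


def read_subfunction(text, fallback=None):
    """Return the 'subfunction' option from the 'platform_conf' section.

    read_subfunction is a hypothetical helper for illustration. With no
    fallback, a missing option raises configparser.NoOptionError, mirroring
    the failure in the log; with a fallback, the default is returned instead.
    """
    config = configparser.RawConfigParser()
    config.read_string(text)
    if fallback is None:
        # Strict lookup: raises NoOptionError if the option is absent.
        return str(config.get("platform_conf", "subfunction"))
    # Tolerant lookup: returns the fallback if the option is absent.
    return str(config.get("platform_conf", "subfunction", fallback=fallback))


if __name__ == "__main__":
    try:
        read_subfunction(PLATFORM_CONF)
    except configparser.NoOptionError as err:
        print(err)  # No option 'subfunction' in section: 'platform_conf'
    print(read_subfunction(PLATFORM_CONF, fallback="controller"))  # controller
```

Passing a fallback (or checking `config.has_option()` first) is one common way to make such a lookup tolerate a missing key; whether that is appropriate depends on whether `subfunction` is actually required at that point in the restore.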
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Dedicated storage
Branch/Pull Time/Commit
-----------------------
master as of build-date-time
StarlingX_
Timestamp/Logs
--------------
Thu Dec 13 18:07:23 UTC 2018
Changed in starlingx:
assignee: nobody → Wei Zhou (wzhou007)
tags: added: stx.2019.03 stx.config
Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
tags: added: stx.2019.05; removed: stx.2019.03
tags: added: stx.2.0; removed: stx.2019.05
Retested on controller-0 with the same test steps plus a temporary code change for another bug (https://bugs.launchpad.net/starlingx/+bug/1808417); the problem did not occur.
Therefore, marking this bug as a duplicate of https://bugs.launchpad.net/starlingx/+bug/1808417.