Backup and Restore: upon execution of 'config_controller --restore-system', controller-0 rebooted unexpectedly

Bug #1808422 reported by mhg
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Wei Zhou

Bug Description

Brief Description
-----------------
During a system restore, after running the CLI command 'sudo config_controller --restore-system <backup-file_system.tgz>', the system printed the following messages and then rebooted:
[16%]ERROR:root:Failed to read platform.conf
...
*** Host watchdog (hostwd) not receiving messages from PMON ***
...
*** Initiating reboot ***
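The "Failed to read platform.conf" error comes from a ConfigParser NoOptionError: the 'platform_conf' section exists, but it has no 'subfunction' key at that point in the restore. The log under "Actual Behavior" also shows that the restore kept going past step 4 after this error was logged. The minimal Python 3 sketch below reproduces that exception. It assumes that platform.conf is a header-less key=value file and that the reader prepends a [platform_conf] section header before parsing. The file contents and the load_subfunction() helper are hypothetical, and this is not the tsconfig code or the fix for this bug.

```python
import configparser

# Hypothetical platform.conf contents (assumption): flat key=value lines with
# no section header, and no "subfunction" key, mimicking the failure above.
PLATFORM_CONF_TEXT = """\
nodetype=controller
"""


def load_subfunction(conf_text, default=None):
    """Read 'subfunction' from platform.conf-style text (hypothetical helper).

    A [platform_conf] header is prepended so configparser can parse the
    header-less file; this mirrors the section name in the traceback but is
    an assumption about how the real reader works.
    """
    config = configparser.RawConfigParser()
    config.optionxform = str  # keep key names exactly as written
    config.read_string("[platform_conf]\n" + conf_text)
    try:
        return config.get("platform_conf", "subfunction")
    except configparser.NoOptionError:
        # Same exception type as in the traceback: the section exists,
        # but the option does not. Return the caller's default instead.
        return default


print(load_subfunction(PLATFORM_CONF_TEXT))  # no subfunction key -> None
print(load_subfunction("nodetype=controller\nsubfunction=controller\n"))
```

Catching NoOptionError separately from other parse errors keeps "key not written yet" distinct from "file is corrupt", which is the distinction this log message blurs.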

Severity
--------
Major

Steps to Reproduce
------------------
1. Perform a system backup according to the customer documentation, without backing up any Cinder volumes:
      sudo config_controller --backup <output_backup_files>
    Save the generated backup files outside the lab under test.

2. Perform a system restore:
   1) Wipe the disks on all nodes except the storage nodes.
   2) Power off all nodes except the storage nodes.
   3) Reinstall controller-0 with the same version of the system software.
   4) Restore the system data:
           sudo config_controller --restore-system <backup-file_system.tgz>
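The two CLI invocations in these steps can be wrapped in a small dry-run helper, so a retest always uses the same flags. This is only a sketch. It uses just the '--backup' and '--restore-system' flags shown above, the function names and the "mybackup" file names are hypothetical, and steps 2.1 to 2.3 (wiping disks, powering off nodes, reinstalling controller-0) are manual and are not scripted here.

```python
import shlex
import subprocess


def config_controller_cmd(*args):
    """Build an argv list for config_controller, run via sudo as in the steps."""
    return ["sudo", "config_controller", *args]


def backup_cmd(backup_name):
    # Step 1: sudo config_controller --backup <output_backup_files>
    return config_controller_cmd("--backup", backup_name)


def restore_cmd(system_tgz):
    # Step 2.4: sudo config_controller --restore-system <backup-file_system.tgz>
    return config_controller_cmd("--restore-system", system_tgz)


def run(argv, dry_run=True):
    """Print the shell-quoted command; execute it only when dry_run is False."""
    print(shlex.join(argv))  # shlex.join needs Python 3.8+
    if not dry_run:
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return argv


# Dry run: prints both commands without executing anything.
run(backup_cmd("mybackup"))
run(restore_cmd("mybackup_system.tgz"))
```

Defaulting to dry_run=True means the helper only prints unless a caller explicitly opts in to executing on a lab controller.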

Expected Behavior
------------------
The CLI completes without any issues and the system does not reboot.

Actual Behavior
----------------
The CLI printed several error messages, and the node rebooted when the restore had almost finished (at step 23 of 24):
Restoring system (this will take several minutes):
Step 1 of 24 [# ] [4%]Step 2 of 24 [### ] [8%]Step 3 of 24 [##### ] [12%]Step 4 of 24 [####### ] [16%]ERROR:root:Failed to read platform.conf
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/tsconfig/tsconfig.py", line 84, in _load
    value = str(config.get('platform_conf', 'subfunction'))
  File "/usr/lib64/python2.7/ConfigParser.py", line 618, in get
    raise NoOptionError(option, section)
NoOptionError: No option 'subfunction' in section: 'platform_conf'
INFO:controllerconfig.backup_restore:File config/server-cert.pem is not in archive.
INFO:controllerconfig.backup_restore:File config/iptables.rules is not in archive.
Step 5 of 24 [######### ] [20%]Step 6 of 24 [########### ] [25%]Step 7 of 24 [############# ] [29%]Step 8 of 24 [############### ] [33%]Step 9 of 24 [################ ] [37%]Step 10 of 24 [################## ] [41%]INFO:controllerconfig.backup_restore:File cinder/saveconfig.json is not in archive.
Step 11 of 24 [#################### ] [45%]Step 12 of 24 [###################### ] [50%]Step 13 of 24 [######################## ] [54%]Step 14 of 24 [########################## ] [58%]Step 15 of 24 [############################ ] [62%]Step 16 of 24 [############################## ] [66%]Step 17 of 24 [############################### ] [70%]Step 18 of 24 [################################# ] [75%]Step 19 of 24 [################################### ] [79%]Step 20 of 24 [##################################### ] [83%]INFO:controllerconfig.backup_restore:File patch-vault is not in archive.
INFO:controllerconfig.backup_restore:File config/ceph-config is not in archive.
Step 22 of 24 [######################################### ] [91%]Step 23 of 24 [########################################### ] [95%]*** Host watchdog (hostwd) not receiving messages from PMON ***
*** Host Watchdog declaring system unhealthy ***
*** Initiating reboot ***
*** Host watchdog (hostwd) not receiving messages from PMON ***
*** Host Watchdog declaring system unhealthy ***
*** Initiating reboot ***
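Because the progress bar overwrites itself on one console line, it is easy to misread how far the restore got. The captured output above shows the last step reached (23 of 24), has no "Step 21 of 24" line, and ends with the host watchdog message. The helper below is a hypothetical triage sketch that pulls those facts out of a captured log with a regular expression. It relies only on the "Step N of M" and hostwd text visible in this log, not on any StarlingX tool.

```python
import re

# Patterns taken from the console output captured in this report.
STEP_RE = re.compile(r"Step (\d+) of (\d+)")
WATCHDOG_MARKER = "Host watchdog (hostwd) not receiving messages from PMON"


def summarize_restore_log(text):
    """Return (last_step, total_steps, missing_steps, watchdog_reboot).

    missing_steps lists step numbers between 1 and the last step that never
    appear in the captured text; watchdog_reboot is True if the hostwd
    message is present.
    """
    watchdog = WATCHDOG_MARKER in text
    steps = [(int(n), int(t)) for n, t in STEP_RE.findall(text)]
    if not steps:
        return None, None, [], watchdog
    last_step, total = steps[-1]
    seen = {n for n, _ in steps}
    missing = [n for n in range(1, last_step + 1) if n not in seen]
    return last_step, total, missing, watchdog


# Synthetic sample shaped like the log above (steps 1-20, 22, 23, then hostwd).
sample = "".join(f"Step {i} of 24 [#] " for i in list(range(1, 21)) + [22, 23])
sample += "*** Host watchdog (hostwd) not receiving messages from PMON ***"
print(summarize_restore_log(sample))  # (23, 24, [21], True)
```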

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Dedicated storage

Branch/Pull Time/Commit
-----------------------
master as of build-date-time
StarlingX_Upstream_build as of 2018-12-07_20-18-00

Timestamp/Logs
--------------
Thu Dec 13 18:07:23 UTC 2018

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Wei Zhou (wzhou007)
mhg (marvinhg) wrote :

Retested on controller-0 with the same test steps, this time with a temporary code change for another bug (https://bugs.launchpad.net/starlingx/+bug/1808417) applied; the problem did not occur.

Marking this bug as a duplicate of https://bugs.launchpad.net/starlingx/+bug/1808417.

Ghada Khalil (gkhalil)
tags: added: stx.2019.03 stx.config
Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
Ghada Khalil (gkhalil) wrote :
Changed in starlingx:
status: Triaged → Fix Released
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05