controller-0 fails to recover prior to configuration following bootstrap

Bug #1854367 reported by Matt Peters on 2019-11-28
This bug affects 1 person
Affects: StarlingX
Importance: Medium
Assigned to: Tee Ngo

Bug Description

Brief Description
-----------------
Rebooting the first controller after the Ansible bootstrap playbook has been applied, but before the host is configured and unlocked, leaves the system unrecoverable. The controller is missing critical configuration that it needs to recover its local services.

Severity
--------
Major

Steps to Reproduce
------------------
1) Bootstrap the system using Ansible
2) Reboot the controller (sudo reboot)

Expected Behavior
------------------
The controller should be able to recover to the bootstrap state following the reboot. All host services should recover and should be ready to accept the required configuration to unlock it.

A reboot at this phase of the installation may be required, for example when applying a patch that requires a host reboot.

Actual Behavior
----------------
The controller services are in a failed state because the Puppet manifest is only partially applied and the network configuration on the loopback interface is missing.
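To confirm the missing loopback configuration after the reboot, it can help to check whether the address that DRBD complains about later in the logs (192.168.204.2) is actually assigned on the host. The following is a minimal diagnostic sketch, not part of StarlingX. It assumes Python 3 on the controller and relies on the fact that binding a socket only succeeds for an address the kernel owns, unless non-local binding is enabled (for example via the Linux net.ipv4.ip_nonlocal_bind sysctl). The function name is_local_address is hypothetical.

```python
import errno
import socket


def is_local_address(ip: str) -> bool:
    """Return True if `ip` is assigned to an interface on this host.

    A bind() that fails with EADDRNOTAVAIL means the kernel does not own
    the address. This assumes non-local binding is disabled (the default).
    """
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0: let the kernel pick any free port
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise  # any other error is unexpected; surface it
    return True


# 192.168.204.2 is the address the DRBD error in this report says is missing.
for addr in ("127.0.0.1", "192.168.204.2"):
    state = "configured" if is_local_address(addr) else "NOT configured"
    print(f"{addr}: {state}")
```

On a controller that has hit this bug, the second line would be expected to report the address as not configured. This would match the drbdadm error below.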

Reproducibility
---------------
100% Reproducible

System Configuration
--------------------
All configurations.

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2019-11-06_10-52-51"

Last Pass
---------
Never. It seems this scenario has never been tested.

Timestamp/Logs
--------------
From failed puppet apply:
2019-11-27T09:41:59.090 Debug: 2019-11-27 09:41:59 +0000 Exec[apply-network-config](provider=posix): Executing 'apply_network_config.sh'
2019-11-27T09:41:59.092 Debug: 2019-11-27 09:41:59 +0000 Executing: 'apply_network_config.sh'
2019-11-27T09:41:59.109 Error: 2019-11-27 09:41:59 +0000 apply_network_config.sh returned 1 instead of one of [0]

2019-11-27T09:42:12.862 Debug: 2019-11-27 09:42:12 +0000 Executing: 'drbdadm adjust drbd-extension'
2019-11-27T09:42:12.864 Notice: 2019-11-27 09:42:12 +0000 /Stage[main]/Platform::Drbd::Extension/Platform::Drbd::Filesystem[drbd-extension]/Drbd::Resource[drbd-extension]/Drbd::Resource::Enable[drbd-extension]/Drbd::Resource::Up[drbd-extension]/Exec[reuse existing DRBD resource drbd-extension]/returns: /etc/drbd.d/drbd-extension.res:44: in resource drbd-extension, on controller-0:
2019-11-27T09:42:12.869 Notice: 2019-11-27 09:42:12 +0000 /Stage[main]/Platform::Drbd::Extension/Platform::Drbd::Filesystem[drbd-extension]/Drbd::Resource[drbd-extension]/Drbd::Resource::Enable[drbd-extension]/Drbd::Resource::Up[drbd-extension]/Exec[reuse existing DRBD resource drbd-extension]/returns: IP 192.168.204.2 not found on this host.
2019-11-27T09:42:12.872 Error: 2019-11-27 09:42:12 +0000 drbdadm adjust drbd-extension returned 10 instead of one of [0]
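When triaging a long Puppet log from a failed reboot, it can be useful to pull out every Exec that returned an unexpected exit code, as in the two Error lines above. The sketch below is a hypothetical helper, not a StarlingX tool. It assumes the log lines follow exactly the format shown in this report, and it uses those two lines as sample input. The log file location is not given in the report, so the helper takes the log text directly.

```python
import re
from typing import List, Tuple

# Matches Puppet error lines of the form shown in this report:
#   "<ts> Error: <date> <time> <tz> <command> returned <N> instead of one of [0]"
FAILED_EXEC = re.compile(
    r"Error: \S+ \S+ [+-]\d{4} (?P<cmd>.+?) returned (?P<rc>\d+) "
    r"instead of one of \[[\d, ]+\]"
)


def failed_commands(log_text: str) -> List[Tuple[str, int]]:
    """Return (command, exit_code) for each failed Exec found in the log text."""
    return [(m.group("cmd"), int(m.group("rc"))) for m in FAILED_EXEC.finditer(log_text)]


SAMPLE = """\
2019-11-27T09:41:59.109 Error: 2019-11-27 09:41:59 +0000 apply_network_config.sh returned 1 instead of one of [0]
2019-11-27T09:42:12.872 Error: 2019-11-27 09:42:12 +0000 drbdadm adjust drbd-extension returned 10 instead of one of [0]
"""

for cmd, rc in failed_commands(SAMPLE):
    print(f"{cmd}: exit {rc}")
# prints:
# apply_network_config.sh: exit 1
# drbdadm adjust drbd-extension: exit 10
```

The DRBD failure is most likely a consequence of the earlier apply_network_config.sh failure rather than an independent problem. Listing the failures in log order makes that ordering easy to see.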

Test Activity
-------------
Developer Testing - Patching

Ghada Khalil (gkhalil) wrote:

Recommending for stx.4.0 -- ansible config robustness item

description: updated
Changed in starlingx:
importance: Undecided → Low
status: New → Triaged
assignee: nobody → Tee Ngo (teewrs)
tags: added: stx.4.0 stx.config
Changed in starlingx:
importance: Low → Medium