Backup & Restore HTTPS: helm repo add command failed due to lighttpd.service after active controller restore
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Ovidiu Poncea |
Bug Description
Brief Description
-----------------
The storage lab was converted from http to https, and then a backup was performed. After the active controller was restored and unlocked, it failed to become active.
The “helm repo add” command failed because lighttpd.service was not up. When we tried to bring it up manually, it complained that file etc/ssl/
The “helm repo add” command is failing:
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
2019-09-
controller-
controller-
● lighttpd.service - Lightning Fast Webserver With Light System Requirements
Loaded: loaded (/usr/lib/
Active: failed (Result: exit-code) since Wed 2019-09-25 18:28:37 UTC; 9s ago
Process: 135381 ExecStart=
Main PID: 135381 (code=exited, status=255)
controller-
2019-09-25 18:28:54: (configfile.c.59) Warning: please add "mod_openssl" to server.modules list in lighttpd.conf. A future release of lighttpd 1.4.x *will not* automatically load mod_openssl and lighttpd *will not* use SSL/TLS where your lighttpd.conf contains ssl.* directives
2019-09-25 18:28:54: (mod_openssl.c.434) SSL: BIO_read_
2019-09-25 18:28:54: (server.c.1191) Initialization of plugins failed. Going down.
controller-
ls: cannot access /etc/ssl/
controller-
openstack registry-cert.crt registry-cert.key registry-
controller-
This lab was converted from http to https; it was not initially installed as an https lab. When https is enabled, the certificate “server-cert.pem” is generated in /etc/ssl/private/. Although this certificate file is backed up, it is not restored during platform restore.
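The logs above show lighttpd going down during plugin initialization when its certificate cannot be read, so a quick check before unlocking is to list the `ssl.pemfile` paths in the lighttpd configuration and confirm each file exists. The following is a minimal Python sketch, not part of StarlingX: the helper name `missing_pem_files` and the config path `/etc/lighttpd/lighttpd.conf` are assumptions, and it only recognizes simple double-quoted `ssl.pemfile = "..."` assignments (it does not follow `include` directives).

```python
import os
import re
from typing import Callable, List

# Matches simple lighttpd assignments such as:
#   ssl.pemfile = "/etc/ssl/private/server-cert.pem"
# Commented-out lines (starting with '#') are not matched.
_PEMFILE_RE = re.compile(r'^[ \t]*ssl\.pemfile[ \t]*=[ \t]*"([^"]+)"', re.MULTILINE)


def missing_pem_files(conf_text: str,
                      exists: Callable[[str], bool] = os.path.isfile) -> List[str]:
    """Return the ssl.pemfile paths referenced in conf_text that do not exist."""
    return [path for path in _PEMFILE_RE.findall(conf_text) if not exists(path)]


if __name__ == "__main__":
    conf_path = "/etc/lighttpd/lighttpd.conf"  # assumed location of the config
    if os.path.isfile(conf_path):
        with open(conf_path) as f:
            for path in missing_pem_files(f.read()):
                print(f"lighttpd references a missing certificate: {path}")
```

On a restored controller in the state described here, this would be expected to report /etc/ssl/private/server-cert.pem. The `exists` parameter is only there so the check can be exercised without touching the filesystem.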
The code that needs to be modified is in stx/ansible-
- block:
    - name: Restore certificate and key files
      command: >-
        tar -C /etc/ssl/private -xpf {{ target_backup_dir }}/{{ backup_filename }} --transform=
        '
      args:
        warn: false
  when: mode == 'restore'
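To illustrate what this restore task is meant to achieve, here is a hedged Python sketch using the standard `tarfile` module. It copies the regular files stored under an assumed `etc/ssl/private/` prefix in the backup archive into a destination directory, dropping the leading directories and keeping the archived permission bits, roughly like the `tar -xpf ... --transform=` call above. The exact transform expression and member pattern are truncated in this report, so the prefix and the helper name `restore_ssl_private_files` are assumptions, not the playbook's actual logic.

```python
import os
import tarfile
from typing import List


def restore_ssl_private_files(archive_path: str, dest_dir: str,
                              member_prefix: str = "etc/ssl/private/") -> List[str]:
    """Copy regular files stored under member_prefix into dest_dir.

    Leading directories are stripped (only the base name is kept), and the
    archived permission bits are reapplied, similar to `tar -p`.
    """
    restored = []
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as tf:  # auto-detects compression
        for member in tf.getmembers():
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            name = name.lstrip("/")
            if not member.isfile() or not name.startswith(member_prefix):
                continue
            # Writing to dest_dir/basename avoids path traversal from member names.
            target = os.path.join(dest_dir, os.path.basename(name))
            src = tf.extractfile(member)
            with src, open(target, "wb") as out:
                out.write(src.read())
            os.chmod(target, member.mode & 0o777)  # keep permission bits
            restored.append(target)
    return restored
```

For example, calling `restore_ssl_private_files(path_to_backup_archive, "/etc/ssl/private")` (archive path hypothetical) would put `server-cert.pem` back where lighttpd expects it. Copying member contents instead of calling `extract()` keeps the flattening explicit and sidesteps unsafe member paths.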
It seems that when the lab is initially installed as an https lab, we don’t see this issue (we were able to unlock controller-0 but failed to unlock the other nodes; see LP-1844828). Does that mean the certificate “server-cert.pem” is installed automatically in /etc/ssl/private without needing to be restored?
Severity
--------
Critical: Controller-0 failed to become active after unlock
Steps to Reproduce
------------------
1. Create an environment for the Ansible remote host
2. Bring up a regular system with storage
3. Back up the system using Ansible remotely
4. Re-install the controller with the same load
5. Restore the system using Ansible remotely
6. Unlock the active controller
Expected Behavior
------------------
Controller-0 should become active after unlock
Actual Behavior
----------------
Controller-0 failed to become active after unlock
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Regular system with storage
Branch/Pull Time/Commit
-----------------------
BUILD_ID=
Test Activity
-------------
Feature Testing
description: updated
Changed in starlingx:
assignee: nobody → Ovidiu Poncea (ovidiu.poncea)
tags: added: stx.retestneeded
summary:
- Backup & Restore: helm repo add command failed due to lighttpd.service after active controller restore
+ Backup & Restore HTTPS: helm repo add command failed due to lighttpd.service after active controller restore
tags: removed: stx.retestneeded
Reviewed: https://review.opendev.org/685390
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=9930bafd71b61f42c3d6d5956cb96f3de1aa7bf2
Submitter: Zuul
Branch: master
commit 9930bafd71b61f42c3d6d5956cb96f3de1aa7bf2
Author: Wei Zhou <email address hidden>
Date: Fri Sep 27 11:22:57 2019 -0400
Backup & Restore: Failed to unlock controller-0 after platform restore
This issue happened only when https is enabled.
Controller-0 failed to unlock because lighttpd.service was not up, which
caused the "helm repo add" command to fail when applying the controller
manifest. Because https is enabled, the lighttpd service needs to access
the server-cert.pem certificate to start. Although this certificate
is backed up, it is not restored during platform restore.
Change-Id: I0d7915bc95064974675614be8eb4b15cb091f684
Closes-Bug: 1845504
Signed-off-by: Wei Zhou <email address hidden>