2019-09-26 14:02:11 |
Senthil Mukundakumar |
description |
Brief Description
-----------------
The storage lab was converted from HTTP to HTTPS and a backup was then performed. After the active controller was restored and unlocked, it failed to become active.
The “helm repo add” command failed because lighttpd.service was not running. When we tried to start it manually, it complained that the file /etc/ssl/private/server-cert.pem did not exist.
The “helm repo add” command is failing:
2019-09-25T15:56:56.456 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm/File[/opt/platform/helm_charts]/ensure: created
2019-09-25T15:56:56.458 Debug: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm/File[/opt/platform/helm_charts]: The container Class[Platform::Helm] will propagate my refresh event
2019-09-25T15:56:56.460 Debug: 2019-09-25 15:56:56 +0000 Exec[restart lighttpd for helm](provider=posix): Executing 'systemctl restart lighttpd.service'
2019-09-25T15:56:56.462 Debug: 2019-09-25 15:56:56 +0000 Executing: 'systemctl restart lighttpd.service'
2019-09-25T15:56:56.543 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm/Exec[restart lighttpd for helm]/returns: executed successfully
2019-09-25T15:56:56.545 Debug: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm/Exec[restart lighttpd for helm]: The container Class[Platform::Helm] will propagate my refresh event
2019-09-25T15:56:56.547 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/File[/www/pages/helm_charts/starlingx]/ensure: created
2019-09-25T15:56:56.549 Debug: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/File[/www/pages/helm_charts/starlingx]: The container Platform::Helm::Repository[starlingx] will propagate my refresh event
2019-09-25T15:56:56.551 Debug: 2019-09-25 15:56:56 +0000 Exec[Generate index: /www/pages/helm_charts/starlingx](provider=posix): Executing 'helm repo index /www/pages/helm_charts/starlingx'
2019-09-25T15:56:56.554 Debug: 2019-09-25 15:56:56 +0000 Executing with uid=www gid=www: 'helm repo index /www/pages/helm_charts/starlingx'
2019-09-25T15:56:56.598 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/Exec[Generate index: /www/pages/helm_charts/starlingx]/returns: executed successfully
2019-09-25T15:56:56.600 Debug: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/Exec[Generate index: /www/pages/helm_charts/starlingx]: The container Platform::Helm::Repository[starlingx] will propagate my refresh event
2019-09-25T15:56:56.602 Debug: 2019-09-25 15:56:56 +0000 Exec[Adding StarlingX helm repo: starlingx](provider=posix): Executing 'helm repo add starlingx http://127.0.0.1:8080/helm_charts/starlingx'
2019-09-25T15:56:56.604 Debug: 2019-09-25 15:56:56 +0000 Executing with uid=sysadmin gid=sys_protected: 'helm repo add starlingx http://127.0.0.1:8080/helm_charts/starlingx'
2019-09-25T15:56:56.644 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/Exec[Adding StarlingX helm repo: starlingx]/returns: Error: Looks like "http://127.0.0.1:8080/helm_charts/starlingx" is not a valid chart repository or cannot be reached: Get http://127.0.0.1:8080/helm_charts/starlingx/index.yaml: dial tcp 127.0.0.1:8080: connect: connection refused
2019-09-25T15:56:56.646 Error: 2019-09-25 15:56:56 +0000 helm repo add starlingx http://127.0.0.1:8080/helm_charts/starlingx returned 1 instead of one of [0]
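The "connection refused" error in the log means nothing was listening on 127.0.0.1:8080, even though puppet reported the lighttpd restart as executed successfully. A minimal sketch of a probe that could confirm this independently of helm is shown here. The function name `port_is_open` is hypothetical, and the demo uses a throwaway local listener rather than the real lighttpd port so that it can run anywhere.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and unreachable hosts.
        return False


# Self-contained demo: open a listener on an ephemeral port, probe it,
# then close it and probe again.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]

open_while_listening = port_is_open("127.0.0.1", demo_port)
listener.close()
open_after_close = port_is_open("127.0.0.1", demo_port)

print(open_while_listening, open_after_close)
```

On the affected controller, the equivalent call would be `port_is_open("127.0.0.1", 8080)`, which would be expected to return False while lighttpd is down.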
controller-0:/var/log/puppet# systemctl restart lighttpd.service
controller-0:/var/log/puppet# systemctl status lighttpd.service
● lighttpd.service - Lightning Fast Webserver With Light System Requirements
Loaded: loaded (/usr/lib/systemd/system/lighttpd.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2019-09-25 18:28:37 UTC; 9s ago
Process: 135381 ExecStart=/usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf (code=exited, status=255)
Main PID: 135381 (code=exited, status=255)
controller-0:/var/log/puppet# /usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf
2019-09-25 18:28:54: (configfile.c.59) Warning: please add "mod_openssl" to server.modules list in lighttpd.conf. A future release of lighttpd 1.4.x *will not* automatically load mod_openssl and lighttpd *will not* use SSL/TLS where your lighttpd.conf contains ssl.* directives
2019-09-25 18:28:54: (mod_openssl.c.434) SSL: BIO_read_filename('/etc/ssl/private/server-cert.pem') failed
2019-09-25 18:28:54: (server.c.1191) Initialization of plugins failed. Going down.
controller-0:/var/log/puppet# ls /etc/ssl/private/server-cert.pem
ls: cannot access /etc/ssl/private/server-cert.pem: No such file or directory
controller-0:/var/log/puppet# ls /etc/ssl/private/
openstack registry-cert.crt registry-cert.key registry-cert-pkcs1.key self-signed-server-cert.pem
controller-0:/var/log/puppet# ls /opt/platform/
This lab was converted from HTTP to HTTPS; it was not initially installed as an HTTPS lab. When HTTPS is enabled, the certificate “server-cert.pem” is generated in /etc/ssl/private/. Although this certificate file is backed up, it is not restored during platform restore.
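A missing certificate of this kind could be caught before the unlock with a small check. The sketch below scans lighttpd configuration text for `ssl.pemfile` entries and reports any that do not exist on disk. The helper name `missing_pemfiles`, the `root` parameter, and the sample configuration text are assumptions made for illustration; the actual contents of /etc/lighttpd/lighttpd.conf on the lab are not shown in this report.

```python
import os
import re
import tempfile

# Matches lines such as:  ssl.pemfile = "/etc/ssl/private/server-cert.pem"
PEMFILE_RE = re.compile(r'^\s*ssl\.pemfile\s*=\s*"([^"]+)"', re.MULTILINE)


def missing_pemfiles(conf_text, root="/"):
    """Return the ssl.pemfile paths in conf_text that do not exist under root."""
    missing = []
    for path in PEMFILE_RE.findall(conf_text):
        candidate = os.path.join(root, path.lstrip("/"))
        if not os.path.isfile(candidate):
            missing.append(path)
    return missing


# Hypothetical configuration fragment, used only for the demo.
sample_conf = (
    'ssl.engine = "enable"\n'
    'ssl.pemfile = "/etc/ssl/private/server-cert.pem"\n'
)

# Demo against a temporary directory standing in for the filesystem root.
with tempfile.TemporaryDirectory() as fake_root:
    os.makedirs(os.path.join(fake_root, "etc/ssl/private"))
    before = missing_pemfiles(sample_conf, fake_root)
    open(os.path.join(fake_root, "etc/ssl/private/server-cert.pem"), "w").close()
    after = missing_pemfiles(sample_conf, fake_root)

print(before, after)
```

On a real controller the function would be called with the contents of the lighttpd configuration file and the default `root="/"`.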
The code that needs to be modified is in stx/ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/bringup-essential-services/tasks/setup_registry_certificate_and_keys.yml:
- block:
  - name: Restore certificate and key files
    command: >-
      tar -C /etc/ssl/private -xpf {{ target_backup_dir }}/{{ backup_filename }} --transform='s,.*/,,'
      'etc/ssl/private/registry-cert*'
    args:
      warn: false
  when: mode == 'restore'
Proposed change: replace the pattern 'etc/ssl/private/registry-cert*' with 'etc/ssl/private/*cert*'.
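The effect of widening the pattern can be illustrated with a short sketch. It uses Python's `fnmatch` as a stand-in for tar's member matching, which is an assumption: tar's own wildcard handling depends on its options and is not reproduced here. The member list is taken from the directory listing in this report, plus the missing server-cert.pem.

```python
from fnmatch import fnmatchcase

# Archive member names as they would appear under etc/ssl/private in the backup.
# server-cert.pem is the file reported missing after restore.
members = [
    "etc/ssl/private/registry-cert.crt",
    "etc/ssl/private/registry-cert.key",
    "etc/ssl/private/registry-cert-pkcs1.key",
    "etc/ssl/private/self-signed-server-cert.pem",
    "etc/ssl/private/server-cert.pem",
]


def matching(pattern, names):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


current = matching("etc/ssl/private/registry-cert*", members)
proposed = matching("etc/ssl/private/*cert*", members)

print("current pattern restores: ", current)
print("proposed pattern restores:", proposed)
```

With the current pattern only the three registry-cert files are selected; the wider pattern also selects server-cert.pem and self-signed-server-cert.pem.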
It seems that when the lab is initially installed as an HTTPS lab, we don’t see this issue (we were able to unlock controller-0 but failed to unlock the other nodes; see LP-1844828). Does that mean the certificate “server-cert.pem” is installed automatically in /etc/ssl/private without needing to be restored?
Severity
--------
Critical: Controller-0 failed to become active after unlock
Steps to Reproduce
------------------
1. Create an environment for the Ansible remote host
2. Bring up the regular system with storage
3. Back up the system using Ansible remotely
4. Re-install the controller with the same load
5. Restore the system using Ansible remotely
6. Unlock the active controller
Expected Behavior
------------------
Controller-0 should become active after unlock
Actual Behavior
----------------
Controller-0 failed to become active after unlock
Reproducibility
---------------
Reproducible
System Configuration
--------------------
Regular system with storage
Branch/Pull Time/Commit
-----------------------
BUILD_ID="2019-09-24_15-36-38"
Test Activity
-------------
Feature Testing |
|