StarlingX

Activity log for bug #1845504

Date	Who	What changed	Old value	New value	Message
2019-09-26 13:49:09	Senthil Mukundakumar	bug			added bug
2019-09-26 14:02:11	Senthil Mukundakumar	description

Old value:

Brief Description
-----------------
The storage lab was converted from http to https and a backup operation was performed. After the active controller was restored and unlocked, it failed to become active.

The “helm repo add” command failed because lighttpd.service was not up. When we tried to bring it up manually, it complained that the file /etc/ssl/private/server-cert.pem didn’t exist.

controller-0:/var/log/puppet# systemctl restart lighttpd.service
controller-0:/var/log/puppet# systemctl status lighttpd.service
● lighttpd.service - Lightning Fast Webserver With Light System Requirements
   Loaded: loaded (/usr/lib/systemd/system/lighttpd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-09-25 18:28:37 UTC; 9s ago
  Process: 135381 ExecStart=/usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf (code=exited, status=255)
 Main PID: 135381 (code=exited, status=255)

controller-0:/var/log/puppet# /usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf
2019-09-25 18:28:54: (configfile.c.59) Warning: please add "mod_openssl" to server.modules list in lighttpd.conf. A future release of lighttpd 1.4.x will not automatically load mod_openssl and lighttpd will not use SSL/TLS where your lighttpd.conf contains ssl.* directives
2019-09-25 18:28:54: (mod_openssl.c.434) SSL: BIO_read_filename('/etc/ssl/private/server-cert.pem') failed
2019-09-25 18:28:54: (server.c.1191) Initialization of plugins failed. Going down.

controller-0:/var/log/puppet# ls /etc/ssl/private/server-cert.pem
ls: cannot access /etc/ssl/private/server-cert.pem: No such file or directory
controller-0:/var/log/puppet# ls /etc/ssl/private/
openstack registry-cert.crt registry-cert.key registry-cert-pkcs1.key self-signed-server-cert.pem
controller-0:/var/log/puppet# ls /opt/platform/

This lab was converted from http to https; it was not initially installed as an https lab. When https is enabled, the certificate “server-cert.pem” is generated in /etc/ssl/private/. Even though this certificate file is backed up, it is not restored during platform restore.

The code that needs to be modified is in stx/ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/bringup-essential-services/tasks/setup_registry_certificate_and_keys.yml:

- block:
  - name: Restore certificate and key files
    command: >-
      tar -C /etc/ssl/private -xpf {{ target_backup_dir }}/{{ backup_filename }}
      --transform='s,./,,' 'etc/ssl/private/registry-cert' (change to) -> 'etc/ssl/private/cert'
    args:
      warn: false
  when: mode == 'restore'

It seems that when the lab is initially installed as an https lab, we don’t see this issue (we were able to unlock controller-0 but failed to unlock other nodes, LP-1844828). Does that mean the certificate “server-cert.pem” is installed automatically in /etc/ssl/private without needing to be restored?

Severity
--------
Provide the severity of the defect.
Critical: Controller-0 failed to become active after unlock

Steps to Reproduce
------------------
1. Create an environment for the ansible remote host
2. Bring up the regular system with storage
3. Back up the system using ansible remotely
4. Re-install the controller with the same load
5. Restore the system using ansible remotely
6. Unlock the active controller

Expected Behavior
------------------
Controller-0 should become active after unlock

Actual Behavior
----------------
Controller-0 failed to become active after unlock

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Regular system with storage

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2019-09-24_15-36-38"

Test Activity
-------------
Feature Testing

New value:

Brief Description
-----------------
The storage lab was converted from http to https and a backup operation was performed. After the active controller was restored and unlocked, it failed to become active.

The “helm repo add” command failed because lighttpd.service was not up. When we tried to bring it up manually, it complained that the file /etc/ssl/private/server-cert.pem didn’t exist.

The “helm repo add” command is failing:

2019-09-25T15:56:56.456 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm/File[/opt/platform/helm_charts]/ensure: created
2019-09-25T15:56:56.458 Debug: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm/File[/opt/platform/helm_charts]: The container Class[Platform::Helm] will propagate my refresh event
2019-09-25T15:56:56.460 Debug: 2019-09-25 15:56:56 +0000 Exec[restart lighttpd for helm](provider=posix): Executing 'systemctl restart lighttpd.service'
2019-09-25T15:56:56.462 Debug: 2019-09-25 15:56:56 +0000 Executing: 'systemctl restart lighttpd.service'
2019-09-25T15:56:56.543 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm/Exec[restart lighttpd for helm]/returns: executed successfully
2019-09-25T15:56:56.545 Debug: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm/Exec[restart lighttpd for helm]: The container Class[Platform::Helm] will propagate my refresh event
2019-09-25T15:56:56.547 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/File[/www/pages/helm_charts/starlingx]/ensure: created
2019-09-25T15:56:56.549 Debug: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/File[/www/pages/helm_charts/starlingx]: The container Platform::Helm::Repository[starlingx] will propagate my refresh event
2019-09-25T15:56:56.551 Debug: 2019-09-25 15:56:56 +0000 Exec[Generate index: /www/pages/helm_charts/starlingx](provider=posix): Executing 'helm repo index /www/pages/helm_charts/starlingx'
2019-09-25T15:56:56.554 Debug: 2019-09-25 15:56:56 +0000 Executing with uid=www gid=www: 'helm repo index /www/pages/helm_charts/starlingx'
2019-09-25T15:56:56.598 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/Exec[Generate index: /www/pages/helm_charts/starlingx]/returns: executed successfully
2019-09-25T15:56:56.600 Debug: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/Exec[Generate index: /www/pages/helm_charts/starlingx]: The container Platform::Helm::Repository[starlingx] will propagate my refresh event
2019-09-25T15:56:56.602 Debug: 2019-09-25 15:56:56 +0000 Exec[Adding StarlingX helm repo: starlingx](provider=posix): Executing 'helm repo add starlingx http://127.0.0.1:8080/helm_charts/starlingx'
2019-09-25T15:56:56.604 Debug: 2019-09-25 15:56:56 +0000 Executing with uid=sysadmin gid=sys_protected: 'helm repo add starlingx http://127.0.0.1:8080/helm_charts/starlingx'
2019-09-25T15:56:56.644 Notice: 2019-09-25 15:56:56 +0000 /Stage[main]/Platform::Helm::Repositories/Platform::Helm::Repository[starlingx]/Exec[Adding StarlingX helm repo: starlingx]/returns: Error: Looks like "http://127.0.0.1:8080/helm_charts/starlingx" is not a valid chart repository or cannot be reached: Get http://127.0.0.1:8080/helm_charts/starlingx/index.yaml: dial tcp 127.0.0.1:8080: connect: connection refused
2019-09-25T15:56:56.646 Error: 2019-09-25 15:56:56 +0000 helm repo add starlingx http://127.0.0.1:8080/helm_charts/starlingx returned 1 instead of one of [0]

controller-0:/var/log/puppet# systemctl restart lighttpd.service
controller-0:/var/log/puppet# systemctl status lighttpd.service
● lighttpd.service - Lightning Fast Webserver With Light System Requirements
   Loaded: loaded (/usr/lib/systemd/system/lighttpd.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-09-25 18:28:37 UTC; 9s ago
  Process: 135381 ExecStart=/usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf (code=exited, status=255)
 Main PID: 135381 (code=exited, status=255)

controller-0:/var/log/puppet# /usr/sbin/lighttpd -D -f /etc/lighttpd/lighttpd.conf
2019-09-25 18:28:54: (configfile.c.59) Warning: please add "mod_openssl" to server.modules list in lighttpd.conf. A future release of lighttpd 1.4.x will not automatically load mod_openssl and lighttpd will not use SSL/TLS where your lighttpd.conf contains ssl.* directives
2019-09-25 18:28:54: (mod_openssl.c.434) SSL: BIO_read_filename('/etc/ssl/private/server-cert.pem') failed
2019-09-25 18:28:54: (server.c.1191) Initialization of plugins failed. Going down.

controller-0:/var/log/puppet# ls /etc/ssl/private/server-cert.pem
ls: cannot access /etc/ssl/private/server-cert.pem: No such file or directory
controller-0:/var/log/puppet# ls /etc/ssl/private/
openstack registry-cert.crt registry-cert.key registry-cert-pkcs1.key self-signed-server-cert.pem
controller-0:/var/log/puppet# ls /opt/platform/

This lab was converted from http to https; it was not initially installed as an https lab. When https is enabled, the certificate “server-cert.pem” is generated in /etc/ssl/private/. Even though this certificate file is backed up, it is not restored during platform restore.

The code that needs to be modified is in stx/ansible-playbooks/playbookconfig/src/playbooks/roles/bootstrap/bringup-essential-services/tasks/setup_registry_certificate_and_keys.yml:

- block:
  - name: Restore certificate and key files
    command: >-
      tar -C /etc/ssl/private -xpf {{ target_backup_dir }}/{{ backup_filename }}
      --transform='s,./,,' 'etc/ssl/private/registry-cert' (change to) -> 'etc/ssl/private/cert'
    args:
      warn: false
  when: mode == 'restore'

It seems that when the lab is initially installed as an https lab, we don’t see this issue (we were able to unlock controller-0 but failed to unlock other nodes, LP-1844828). Does that mean the certificate “server-cert.pem” is installed automatically in /etc/ssl/private without needing to be restored?

Severity
--------
Provide the severity of the defect.
Critical: Controller-0 failed to become active after unlock

Steps to Reproduce
------------------
1. Create an environment for the ansible remote host
2. Bring up the regular system with storage
3. Back up the system using ansible remotely
4. Re-install the controller with the same load
5. Restore the system using ansible remotely
6. Unlock the active controller

Expected Behavior
------------------
Controller-0 should become active after unlock

Actual Behavior
----------------
Controller-0 failed to become active after unlock

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Regular system with storage

Branch/Pull Time/Commit
-----------------------
BUILD_ID="2019-09-24_15-36-38"

Test Activity
-------------
Feature Testing
2019-09-26 14:14:58	Frank Miller	bug			added subscriber Wei Zhou
2019-09-26 14:15:09	Frank Miller	starlingx: assignee		Ovidiu Poncea (ovidiu.poncea)
2019-09-27 17:10:27	OpenStack Infra	starlingx: status	New	Fix Released
2019-10-04 02:22:25	Yang Liu	tags		stx.retestneeded
2019-10-08 20:46:24	Ghada Khalil	tags	stx.retestneeded	stx.3.0 stx.retestneeded stx.update
2019-10-08 20:46:41	Ghada Khalil	bug			added subscriber Bill Zvonar
2019-10-08 20:48:15	Ghada Khalil	starlingx: importance	Undecided	High
2019-10-30 19:44:42	Yang Liu	summary	Backup & Restore: helm repo add command failed due to lighttpd.service after active controller restore	Backup & Restore HTTPS: helm repo add command failed due to lighttpd.service after active controller restore
2019-10-31 15:48:45	Senthil Mukundakumar	tags	stx.3.0 stx.retestneeded stx.update	stx.3.0 stx.update
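
The description in this log attributes the failure to a restore step that only extracts the registry-cert files from the backup, leaving server-cert.pem behind. The following is a minimal sketch, not the StarlingX fix, of how one might check which certificate files in a backup archive a narrow selection would miss. The helper names (`list_members`, `select_members`, `missed_by_restore`) are hypothetical, the glob patterns are assumptions made for illustration (the report shows bare path prefixes without wildcards), and the member list only mirrors file names mentioned in the report.

```python
import fnmatch
import tarfile


def list_members(archive_path):
    """Return the member names of a tar archive (compression auto-detected)."""
    with tarfile.open(archive_path, "r:*") as archive:
        return archive.getnames()


def select_members(member_names, pattern):
    """Return the member names matching a shell-style pattern.

    A leading "./" is dropped so "./etc/..." and "etc/..." compare equal.
    """
    normalized = [n[2:] if n.startswith("./") else n for n in member_names]
    return sorted(n for n in normalized if fnmatch.fnmatch(n, pattern))


def missed_by_restore(member_names, restored_pattern, wanted_pattern):
    """Return members that match wanted_pattern but not restored_pattern."""
    restored = set(select_members(member_names, restored_pattern))
    wanted = set(select_members(member_names, wanted_pattern))
    return sorted(wanted - restored)


if __name__ == "__main__":
    # Illustrative names only (assumption): taken from files mentioned in
    # the bug description, not from a real backup listing.
    members = [
        "etc/ssl/private/registry-cert.crt",
        "etc/ssl/private/registry-cert.key",
        "etc/ssl/private/registry-cert-pkcs1.key",
        "etc/ssl/private/self-signed-server-cert.pem",
        "etc/ssl/private/server-cert.pem",
    ]
    # Both patterns are assumptions chosen to contrast a narrow selection
    # with a broader one.
    for name in missed_by_restore(
        members,
        "etc/ssl/private/registry-cert*",
        "etc/ssl/private/*cert*",
    ):
        print(name)
```

With a real backup, `list_members(path_to_backup)` could supply the member names in place of the hard-coded list.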