HTTPS certificate in TPM mode is not synced when the standby controller comes online

Bug #1848235 reported by Anujeyan Manokeran
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
StarlingX | Invalid | Medium | Andy |

Bug Description

Brief Description
-----------------
   As a TPM fault-injection test scenario, an HTTPS certificate was installed in TPM mode while the standby controller (controller-1) was offline. As expected for this scenario, the alarm "TPM configuration failed or device not found" was raised for controller-1. The alarm "Service group web-services degraded; lighttpd(disabled, failed)" was also raised, and sm-dump reported the same state; this was not expected for the scenario. Neither alarm cleared after controller-1 came back online. Both cleared only when the HTTPS certificate was reinstalled in TPM mode after controller-1 was online.

 When this was discussed with Andy Ning, he said that a reinstall should not be required: the TPM certificate should be synchronized and the alarms cleared when controller-1 is online. More details are in the attached email. The test scenario is described below.

   $ sudo sm-dump
Password:
Sorry, try again.
Password:

-Service_Groups------------------------------------------------------------------------
oam-services standby standby
controller-services standby standby
cloud-services standby standby
patching-services standby standby
directory-services active active
web-services active go-active degraded
storage-services active active
storage-monitoring-services standby standby
vim-services standby standby
---------------------------------------------------------------------------------------

-Services------------------------------------------------------------------------------
oam-ip enabled-standby disabled
management-ip enabled-standby disabled
drbd-pg enabled-standby enabled-standby
drbd-rabbit enabled-standby enabled-standby
drbd-platform enabled-standby enabled-standby
pg-fs enabled-standby disabled
rabbit-fs enabled-standby disabled
nfs-mgmt enabled-standby disabled
platform-fs enabled-standby disabled
postgres enabled-standby disabled
rabbit enabled-standby disabled
platform-export-fs enabled-standby disabled
platform-nfs-ip enabled-standby disabled
sysinv-inv enabled-standby disabled
sysinv-conductor enabled-standby disabled
mtc-agent enabled-standby disabled
hw-mon enabled-standby disabled
dnsmasq enabled-standby disabled
fm-mgr enabled-standby disabled
keystone enabled-standby disabled
open-ldap enabled-active enabled-active
snmp enabled-standby disabled
lighttpd enabled-active disabled failed
horizon enabled-active enabling-throttle
patch-alarm-manager enabled-standby disabled
mgr-restful-plugin enabled-active enabled-active
ceph-manager enabled-standby disabled
vim enabled-standby disabled
vim-api enabled-standby disabled
vim-webserver enabled-standby disabled
haproxy enabled-standby disabled
pxeboot-ip enabled-standby disabled
drbd-extension enabled-standby enabled-standby
extension-fs enabled-standby disabled
extension-export-fs enabled-standby disabled
etcd enabled-standby disabled
drbd-etcd enabled-standby enabled-standby
etcd-fs enabled-standby disabled
barbican-api enabled-standby disabled
barbican-keystone-listener enabled-standby disabled
barbican-worker enabled-standby disabled
cluster-host-ip enabled-standby disabled
docker-distribution enabled-standby disabled
dockerdistribution-fs enabled-standby disabled
drbd-dockerdistribution enabled-standby enabled-standby
helmrepository-fs enabled-standby disabled
registry-token-server enabled-standby disabled
------------------------------------------------------------------------------
controller-1:~$ logout
Connection to controller-1 closed.
[sysadmin@controller-0 ~(keystone_admin)]$ system show
+----------------------+-----------------------------------------------------+
| Property | Value |
+----------------------+-----------------------------------------------------+
| contact | None |
| created_at | 2019-10-09T16:11:53.727576+00:00 |
| description | yow-cgcs-wildcat-63_66: setup by deployment manager |
| https_enabled | True |
| location | None |
| name | yow-cgcs-wildcat-63-66 |
| region_name | RegionOne |
| sdn_enabled | False |
| security_feature | spectre_meltdown_v1 |
| service_project_name | services |
| software_version | 19.10 |
| system_mode | duplex |
| system_type | Standard |
| timezone | UTC |
| updated_at | 2019-10-09T19:02:31.200648+00:00 |
| uuid | c3cdb0ad-2e5a-42b8-845c-5efc6cb9c07f |
| vswitch_type | none |
+----------------------+-----------------------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ system certificate-list
+--------------------------------------+----------+---------------------------+
| uuid | certtype | expiry_date |
+--------------------------------------+----------+---------------------------+
| c2acaadf-f071-4ea7-98d9-9bcb06197e43 | tpm_mode | 2020-10-08T18:56:10+00:00 |
| eb06faec-dcba-4a30-b687-f6fca34d3493 | ssl_ca | 2021-06-05T20:28:20+00:00 |
+--------------------------------------+----------+---------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list
+----------+---------------------------------------------------------------------------------------+--------------------------+----------+------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+---------------------------------------------------------------------------------------+--------------------------+----------+------------------+
| 400.001 | Service group web-services degraded; lighttpd(disabled, failed) | service_domain= | major | 2019-10-09T20:14 |
| | | controller.service_group | | :59.129449 |
| | | =web-services.host= | | |
| | | controller-1 | | |
| | | | | |
| 400.002 | Service group web-services loss of redundancy; expected 2 active members but only 1 | service_domain= | major | 2019-10-09T20:14 |
| | active member available | controller.service_group | | :35.873420 |
| | | =web-services | | |
| | | | | |
| 500.100 | TPM configuration failed or device not found. | host=controller-1 | major | 2019-10-09T20:12 |
| | | | | :02.871772 |
| | | | | |
+----------+---------------------------------------------------------------------------------------+--------------------------+----------+------------------+

Severity
--------
Major

Steps to Reproduce
------------------
1. Power off the standby controller (controller-1).
2. Install the HTTPS certificate in TPM mode:
sudo https-certificate-install -c server-with-key.pem --tpm

3. Verify the alarms. The alarms listed in the description are raised.
4. Power on controller-1 and wait for it to come online.
5. Verify the alarms. The alarms listed in the description are still present, and sm-dump shows the same state.
6. Reinstall the certificate in TPM mode with server-with-key.pem. The certificate is then installed on both controllers and the alarms are no longer seen.
sudo https-certificate-install -c server-with-key.pem --tpm
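
The alarm checks in steps 3 and 5 could be scripted. The following is a minimal Python sketch, assuming the output of `fm alarm-list` has been captured as text in the table layout shown in this report. The names `parse_alarm_ids`, `tpm_alarms_present`, and `SAMPLE` are illustrative only and are not part of any StarlingX tooling; `SAMPLE` is an abbreviated copy of the table above.

```python
# Abbreviated copy of the `fm alarm-list` table from this report (illustrative input).
SAMPLE = """
+----------+-----------------------------------------------------------------+--------------------------+----------+------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-----------------------------------------------------------------+--------------------------+----------+------------------+
| 400.001 | Service group web-services degraded; lighttpd(disabled, failed) | service_domain= | major | 2019-10-09T20:14 |
| | | controller.service_group | | :59.129449 |
| 400.002 | Service group web-services loss of redundancy | service_domain= | major | 2019-10-09T20:14 |
| 500.100 | TPM configuration failed or device not found. | host=controller-1 | major | 2019-10-09T20:12 |
| | | | | :02.871772 |
+----------+-----------------------------------------------------------------+--------------------------+----------+------------------+
"""


def parse_alarm_ids(table_text):
    """Return the set of alarm IDs found in alarm-list style table text."""
    alarm_ids = set()
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        first = cells[0]
        # Wrapped continuation rows have an empty first cell; skip them and the header.
        if first and first != "Alarm ID":
            alarm_ids.add(first)
    return alarm_ids


def tpm_alarms_present(table_text, watched=("500.100", "400.001")):
    """Return the watched alarm IDs that are still raised, sorted."""
    return sorted(parse_alarm_ids(table_text) & set(watched))


if __name__ == "__main__":
    print(tpm_alarms_present(SAMPLE))
```

An empty list returned after step 6 would correspond to "alarms no longer seen"; a non-empty list after step 5 corresponds to the behavior reported here.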

Expected Behavior
------------------
The TPM certificate should be installed automatically when controller-1 is back online.
Actual Behavior
----------------
The TPM certificate needs to be reinstalled after controller-1 is online.
Reproducibility
---------------
Always reproducible
System Configuration
--------------------
AIO-DX system
Branch/Pull Time/Commit
-----------------------
BUILD_DATE= 2019-10-08 20:02:1

Last Pass
---------

Timestamp/Logs
--------------
2019-10-09T20:14
Test Activity
-------------
Regression test

Anujeyan Manokeran (anujeyan) wrote :
Yang Liu (yliu12)
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

stx.3.0 / medium priority - fault scenario not handled properly.

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.3.0 stx.config
Changed in starlingx:
status: New → Triaged
assignee: nobody → Andy (andy.wrs)
Andy (andy.wrs) wrote :

This is not a supported scenario. To install a TPM certificate, both controllers must be enabled and unlocked.

For controller-1 to recover in this scenario, the TPM certificate can be reinstalled once controller-1 has booted and is in the enabled and unlocked state.
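
The precondition stated above (both controllers enabled and unlocked) could be checked before running the install. The following is a minimal Python sketch, assuming the output of `system host-list` has been captured as text and that the table has `hostname`, `personality`, `administrative`, and `operational` columns. The function name `controllers_ready` and the sample table are hypothetical and were not taken from this system.

```python
# Hypothetical `system host-list` output with controller-1 still down (illustrative only).
SAMPLE_DEGRADED = """
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | available    |
| 2  | controller-1 | controller  | unlocked       | disabled    | offline      |
+----+--------------+-------------+----------------+-------------+--------------+
"""


def controllers_ready(host_list_text):
    """Return (ready, not_ready_hosts) for the controllers in host-list style text.

    ready is True only if at least two controllers are listed and every one of
    them is both unlocked and enabled.
    """
    header = None
    states = {}
    for line in host_list_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first table row holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("personality") == "controller":
            states[row["hostname"]] = (row["administrative"], row["operational"])
    not_ready = sorted(
        name for name, state in states.items() if state != ("unlocked", "enabled")
    )
    return (len(states) >= 2 and not not_ready), not_ready


if __name__ == "__main__":
    print(controllers_ready(SAMPLE_DEGRADED))
```

A test case could skip or defer the `https-certificate-install ... --tpm` step while this check reports any controller as not ready.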

Ghada Khalil (gkhalil) wrote :

Marking as Invalid based on Andy's investigation above.
@Jeyan, please update the affected test-case accordingly.

Changed in starlingx:
status: Triaged → Invalid
Yang Liu (yliu12)
tags: removed: stx.retestneeded