HTTPS certificate in TPM mode is not synced when the standby controller comes online
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Invalid | Medium | Andy | | |
Bug Description
Brief Description
-----------------
As a TPM fault-injection test scenario, an HTTPS certificate was installed in TPM mode while the standby controller (c-1) was offline. As expected for this scenario, the alarm “TPM configuration failed or device not found” was raised for the standby controller (c-1). The alarm "Service group web-services degraded; lighttpd(disabled, failed)" was also raised, and sm-dump reported the same state; this was not expected for the test scenario. Neither alarm cleared after controller-1 came online. Both cleared only when the HTTPS certificate was reinstalled in TPM mode after the standby controller (c-1) was online.
When this was discussed with Andy Ning, he said that reinstalling should not be required: the TPM certificate should be synchronized and the alarms cleared when controller-1 comes online. More details are in the attached email. The test scenario is described below.
$ sudo sm-dump
Password:
Sorry, try again.
Password:
-Service_
oam-services standby standby
controller-services standby standby
cloud-services standby standby
patching-services standby standby
directory-services active active
web-services active go-active degraded
storage-services active active
storage-
vim-services standby standby
-------
-Services-
oam-ip enabled-standby disabled
management-ip enabled-standby disabled
drbd-pg enabled-standby enabled-standby
drbd-rabbit enabled-standby enabled-standby
drbd-platform enabled-standby enabled-standby
pg-fs enabled-standby disabled
rabbit-fs enabled-standby disabled
nfs-mgmt enabled-standby disabled
platform-fs enabled-standby disabled
postgres enabled-standby disabled
rabbit enabled-standby disabled
platform-export-fs enabled-standby disabled
platform-nfs-ip enabled-standby disabled
sysinv-inv enabled-standby disabled
sysinv-conductor enabled-standby disabled
mtc-agent enabled-standby disabled
hw-mon enabled-standby disabled
dnsmasq enabled-standby disabled
fm-mgr enabled-standby disabled
keystone enabled-standby disabled
open-ldap enabled-active enabled-active
snmp enabled-standby disabled
lighttpd enabled-active disabled failed
horizon enabled-active enabling-throttle
patch-alarm-manager enabled-standby disabled
mgr-restful-plugin enabled-active enabled-active
ceph-manager enabled-standby disabled
vim enabled-standby disabled
vim-api enabled-standby disabled
vim-webserver enabled-standby disabled
haproxy enabled-standby disabled
pxeboot-ip enabled-standby disabled
drbd-extension enabled-standby enabled-standby
extension-fs enabled-standby disabled
extension-export-fs enabled-standby disabled
etcd enabled-standby disabled
drbd-etcd enabled-standby enabled-standby
etcd-fs enabled-standby disabled
barbican-api enabled-standby disabled
barbican-
barbican-worker enabled-standby disabled
cluster-host-ip enabled-standby disabled
docker-distribution enabled-standby disabled
dockerdistribut
drbd-dockerdist
helmrepository-fs enabled-standby disabled
registry-
-------
controller-1:~$ logout
Connection to controller-1 closed.
[sysadmin@
+------
| Property | Value |
+------
| contact | None |
| created_at | 2019-10-
| description | yow-cgcs-
| https_enabled | True |
| location | None |
| name | yow-cgcs-
| region_name | RegionOne |
| sdn_enabled | False |
| security_feature | spectre_meltdown_v1 |
| service_
| software_version | 19.10 |
| system_mode | duplex |
| system_type | Standard |
| timezone | UTC |
| updated_at | 2019-10-
| uuid | c3cdb0ad-
| vswitch_type | none |
+------
[sysadmin@
+------
| uuid | certtype | expiry_date |
+------
| c2acaadf-
| eb06faec-
+------
[sysadmin@
+------
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------
| 400.001 | Service group web-services degraded; lighttpd(disabled, failed) | service_domain= | major | 2019-10-09T20:14 |
| | | controller.
| | | =web-services.host= | | |
| | | controller-1 | | |
| | | | | |
| 400.002 | Service group web-services loss of redundancy; expected 2 active members but only 1 | service_domain= | major | 2019-10-09T20:14 |
| | active member available | controller.
| | | =web-services | | |
| | | | | |
| 500.100 | TPM configuration failed or device not found. | host=controller-1 | major | 2019-10-09T20:12 |
| | | | | :02.871772 |
| | | | | |
+------
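The degraded state in the sm-dump excerpt above can be picked out mechanically. The following is a minimal sketch, assuming the whitespace-separated layout seen in this report (name, desired state, actual state, optional condition); real sm-dump output may use a different column layout, and the helper name find_unhealthy is hypothetical.

```python
def find_unhealthy(sm_dump_text):
    """Return rows that carry a fourth column such as "failed" or "degraded".

    Assumption: rows look like "<name> <desired> <actual> [<condition>]",
    as in the excerpt quoted in this report.
    """
    rows = []
    for line in sm_dump_text.splitlines():
        parts = line.split()
        # Skip separator/header lines such as "-Services-" or "-------".
        if len(parts) == 4 and not parts[0].startswith("-"):
            rows.append(tuple(parts))
    return rows


# Lines copied from the sm-dump excerpt in this report.
SAMPLE_SM_DUMP = """\
-Services-
web-services active go-active degraded
storage-services active active
open-ldap enabled-active enabled-active
lighttpd enabled-active disabled failed
horizon enabled-active enabling-throttle
"""

if __name__ == "__main__":
    for name, desired, actual, condition in find_unhealthy(SAMPLE_SM_DUMP):
        print(f"{name}: desired={desired} actual={actual} condition={condition}")
```

On the sample lines this reports web-services and lighttpd, which matches the 400.001 alarm text.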
Severity
--------
Major
Steps to Reproduce
------------------
1. Power off the standby controller (c-1).
2. Install the certificate in TPM mode:
sudo https-certifica
3. Verify the alarms. They appear as described above.
4. Power on c-1 and wait for controller-1 to come online.
5. Verify the alarms again. They are still present as described above, and sm-dump shows the same state.
6. Reinstall the TPM certificate with server-
sudo https-certifica
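The alarm checks in steps 3 and 5 could be automated by parsing the alarm table text. This is only a sketch, assuming the pipe-delimited table layout shown in the description; the function name active_alarm_ids and the WATCHED set are hypothetical, and the sample rows are copied from this report.

```python
def active_alarm_ids(table_text):
    """Extract alarm IDs from a pipe-delimited alarm table.

    Assumption: each alarm starts on a row whose first cell holds the ID,
    and wrapped continuation rows leave the first cell empty.
    """
    ids = []
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip border lines such as "+------"
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        first = cells[0]
        if first and first != "Alarm ID":
            ids.append(first)
    return ids


# Alarms that this report expects to clear once controller-1 is online.
WATCHED = {"400.001", "500.100"}

SAMPLE_TABLE = """\
+------
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------
| 400.001 | Service group web-services degraded; lighttpd(disabled, failed) | service_domain= | major | 2019-10-09T20:14 |
| | | controller-1 | | |
| 500.100 | TPM configuration failed or device not found. | host=controller-1 | major | 2019-10-09T20:12 |
+------
"""

if __name__ == "__main__":
    lingering = WATCHED & set(active_alarm_ids(SAMPLE_TABLE))
    print("alarms still present:", sorted(lingering))
```

Running the same check before and after step 4 shows whether the alarms persisted across the controller coming online.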
Expected Behavior
------------------
The TPM certificate should be installed automatically when controller-1 comes back online.
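The expected behavior can be expressed as a polling check: after controller-1 comes online, the watched alarms should disappear within some time limit without a reinstall. The sketch below is illustrative only; fetch_alarm_ids is a hypothetical callable supplied by the test harness, and the timeout and interval defaults are assumptions, not values from this report.

```python
import time


def wait_for_alarms_to_clear(fetch_alarm_ids, watched,
                             timeout_s=600.0, interval_s=10.0,
                             sleep=time.sleep, clock=time.monotonic):
    """Poll until none of the watched alarm IDs are active.

    fetch_alarm_ids: hypothetical callable returning the currently active
    alarm IDs (for example, parsed from the alarm table).
    Returns True if the alarms cleared before the timeout, else False.
    """
    deadline = clock() + timeout_s
    while True:
        remaining = set(watched) & set(fetch_alarm_ids())
        if not remaining:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated sequence: both alarms, then one, then none.
    responses = iter([["400.001", "500.100"], ["400.001"], []])
    cleared = wait_for_alarms_to_clear(
        lambda: next(responses), {"400.001", "500.100"},
        timeout_s=5.0, interval_s=0.0)
    print("alarms cleared:", cleared)
```

In the scenario reported here, such a check would time out, since the alarms only cleared after the certificate was reinstalled.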
Actual Behavior
----------------
The TPM certificate needs to be reinstalled manually after controller-1 comes online.
Reproducibility
---------------
Always reproducible
System Configuration
-------
AIO-DX system
Branch/Pull Time/Commit
-------
BUILD_DATE= 2019-10-08 20:02:1
Last Pass
---------
Timestamp/Logs
--------------
2019-10-09T20:14
Test Activity
-------------
Regression test
stx.3.0 / medium priority - fault scenario not handled properly.