sm services loss of redundancy alarms on install of storage system

Bug #1797567 reported by Maria Yousaf
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Bin Qian

Bug Description

Brief Description
-----------------
After installing a storage system, loss of redundancy alarms were raised for the SM service groups.

Severity
--------
Critical

Steps to Reproduce
------------------
Install a storage system.

Expected Behavior
------------------
The system should be alarm-free once the install completes.

Actual Behavior
----------------
The following alarms were seen:
+----------+--------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+----------+----------------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+--------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+----------+----------------------------+
| 400.002 | Service group controller-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=controller-services | major | 2018-10-11T05:26:25.835758 |
| 400.002 | Service group cloud-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=cloud-services | major | 2018-10-11T05:26:25.673766 |
| 400.002 | Service group vim-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=vim-services | major | 2018-10-11T05:26:25.512765 |
| 400.002 | Service group storage-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=storage-services | major | 2018-10-11T05:26:22.911771 |
| 400.002 | Service group storage-monitoring-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=storage-monitoring-services | major | 2018-10-11T05:26:22.667809 |
| 400.002 | Service group patching-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=patching-services | major | 2018-10-11T05:02:06.811585 |
| 400.002 | Service group oam-services loss of redundancy; expected 1 standby member but no standby members available | service_domain=controller.service_group=oam-services | major | 2018-10-11T05:02:06.730590 |
| 400.002 | Service group directory-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=directory-services | major | 2018-10-11T05:02:04.964594 |
| 400.002 | Service group web-services loss of redundancy; expected 2 active members but only 1 active member available | service_domain=controller.service_group=web-services | major | 2018-10-11T05:02:04.721579 |
+----------+--------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------+----------+----------------------------+
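
For triage, the Reason Text column above can be split into service group, role, and member counts. The following is only a sketch in Python and is not part of any StarlingX tooling: the `RedundancyLoss` type and the `parse_reason` function are hypothetical names, and the regular expression assumes only the two wording variants seen in the table ("no standby members available" and "only N active member available").

```python
import re
from typing import NamedTuple, Optional

# Assumption: matches only the two reason-text variants shown in the alarm table.
PATTERN = re.compile(
    r"Service group (?P<group>\S+) loss of redundancy; "
    r"expected (?P<expected>\d+) (?P<role>standby|active) members? "
    r"but (?:no|only (?P<available>\d+)) (?:standby|active) members? available"
)


class RedundancyLoss(NamedTuple):
    """Hypothetical record for one parsed 400.002 alarm reason."""
    group: str
    role: str
    expected: int
    available: int


def parse_reason(text: str) -> Optional[RedundancyLoss]:
    """Return the parsed fields, or None if the text does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # "no ... members available" carries no number, so treat it as 0.
    available = int(m["available"] or 0)
    return RedundancyLoss(m["group"], m["role"], int(m["expected"]), available)


if __name__ == "__main__":
    reason = (
        "Service group storage-services loss of redundancy; "
        "expected 2 active members but only 1 active member available"
    )
    print(parse_reason(reason))
```

Applied to the nine rows above, this separates the groups that are missing a standby member from those that are running with one active member instead of two.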

Reproducibility
---------------
Tried once.

System Configuration
--------------------
Storage system

Branch/Pull Time/Commit
-----------------------
master as of 2018-10-10_20-18-00

Timestamp/Logs
--------------
2018-10-11T05:26:25.512765

Tags: stx.2.0 stx.ha
Ghada Khalil (gkhalil) wrote :

Targeting stx.2019.03, since this issue affects stx master only and not the r/2018.10 release branch.

Changed in starlingx:
assignee: nobody → Bin Qian (bqian20)
importance: Undecided → High
tags: added: stx.2019.03 stx.ha
Changed in starlingx:
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-ha (master)

Fix proposed to branch: master
Review: https://review.openstack.org/610990

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-ha (master)

Reviewed: https://review.openstack.org/610990
Committed: https://git.openstack.org/cgit/openstack/stx-ha/commit/?id=5f9731c47eb1ca5e5a970a9400464b22a7bacca3
Submitter: Zuul
Branch: master

commit 5f9731c47eb1ca5e5a970a9400464b22a7bacca3
Author: Bin Qian <email address hidden>
Date: Tue Oct 16 08:59:04 2018 -0400

    Fix service groups on controller-1 stuck in initial

    Initialize fail-pending timer id.
    Also deregister the timer when program exits.

    Closes-Bug: 1797567

    Change-Id: Ief278dfff1185a6acea718b683da11934a192161
    Signed-off-by: Bin Qian <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05