AIO-DX: RR patch orchestration failed when openstack is applied

Bug #1893124 reported by Jim Gauld
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Jim Gauld

Bug Description

Brief Description
-----------------
During reboot-required (RR) patch orchestration, the strategy failed because platform alarms were present while controller-1 was being patched.

It was observed that, after the unlock, it takes about 5 minutes for the dbmon service to become enabled and for the alarms to clear. It was also noted that the VIM does not wait long enough after the host goes unlocked/enabled for the alarms to clear.
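To illustrate why a short fixed wait is not enough when alarms take roughly five minutes to clear, here is a minimal Python sketch of a poll-until-clear loop with a deadline. Everything in it is hypothetical and is not the VIM's actual code: the function name, the `ignore` set (a simplified stand-in for alarms tolerated under relaxed alarm restrictions), the placeholder alarm name, the fake clock, and the 30 s / 60 s / 600 s timing values.

```python
import time
from typing import Callable, FrozenSet, Iterable


def wait_for_alarms_to_clear(
    query_alarms: Callable[[], Iterable[str]],
    ignore: FrozenSet[str] = frozenset(),
    timeout_s: float = 600.0,
    poll_interval_s: float = 30.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no blocking alarms remain; return False once timeout_s elapses.

    `ignore` is a hypothetical stand-in for alarms that relaxed alarm
    restrictions would tolerate.
    """
    deadline = clock() + timeout_s
    while True:
        blocking = [a for a in query_alarms() if a not in ignore]
        if not blocking:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


class FakeClock:
    """Simulated time so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def make_query(clock: FakeClock, clears_at_s: float):
    """Hypothetical alarm source: one placeholder alarm that clears at clears_at_s."""
    return lambda: ["example-platform-alarm"] if clock.now < clears_at_s else []


# A 60 s budget gives up while the alarm is still raised...
short = FakeClock()
short_ok = wait_for_alarms_to_clear(make_query(short, 300.0), timeout_s=60.0,
                                    clock=short.time, sleep=short.sleep)

# ...whereas polling with a longer deadline sees the alarm clear at 300 s.
patient = FakeClock()
patient_ok = wait_for_alarms_to_clear(make_query(patient, 300.0), timeout_s=600.0,
                                      clock=patient.time, sleep=patient.sleep)

print(short_ok, patient_ok)  # False True
```

Injecting `clock` and `sleep` keeps the loop testable without real delays; in a real service the defaults (`time.monotonic`, `time.sleep`) would apply.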

Severity
--------
Major

Steps to Reproduce
------------------
1) Launch VMs
2) Apply the RR patch using orchestration (alarm restrictions: relaxed; VM action: stop and start)

Expected Behavior
------------------
The RR patch is applied successfully.

Actual Behavior
----------------
Patch orchestration failed: the sw-patch auto-apply was aborted because alarms from the platform were present.

Reproducibility
---------------
Yes, seen twice (2/2).

System Configuration
--------------------
AIO-DX (IPv4)
Lab: wcp78-79

Branch/Pull Time/Commit
-----------------------
N/A

Last Pass
---------
N/A

Timestamp/Logs
--------------
2020-07-29 20:37:35 alarms from platform are present

2020-07-29T20:37:35.000 controller-1 fmManager: info
{ "event_log_id" : "900.115", "reason_text" : "Software patch auto-apply failed, reason = alarms from platform are present", "entity_instance_id" : "orchestration=sw-patch", "severity" : "critical", "state" : "msg", "timestamp" : "2020-07-29 20:37:35.454444" }

2020-07-29T20:37:35.000 controller-1 fmManager: info
{ "event_log_id" : "900.101", "reason_text" : "Software patch auto-apply inprogress", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-wildcat-78-79.orchestration=sw-patch", "severity" : "major", "state" : "clear", "timestamp" : "2020-07-29 20:37:35.552285" }

2020-07-29T20:37:35.000 controller-1 fmManager: info
{ "event_log_id" : "900.121", "reason_text" : "Software patch auto-apply aborted", "entity_instance_id" : "orchestration=sw-patch", "severity" : "critical", "state" : "msg", "timestamp" : "2020-07-29 20:37:35.497654" }

2020-07-29T20:37:38.000 controller-1 fmManager: info
{ "event_log_id" : "900.103", "reason_text" : "Software patch auto-apply failed", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-wildcat-78-79.orchestration=sw-patch", "severity" : "critical", "state" : "set", "timestamp" : "2020-07-29 20:37:38.327694" }

2020-07-29T20:37:38.000 controller-1 fmManager: info
{ "event_log_id" : "275.001", "reason_text" : "Host controller-0 hypervisor is now unlocked-enabled", "entity_instance_id" : "host=controller-0.hypervisor=4383cbba-b902-4d6e-a255-52d3a98ef412", "severity" : "cr

Test Activity
-------------
Developer testing

Jim Gauld (jgauld)
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.5.0 stx.nfv
summary: - AIO-DX: RR patch orchestration failed
+ AIO-DX: RR patch orchestration failed when openstack is applied
Changed in starlingx:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote: Fix merged to nfv (master)

Reviewed: https://review.opendev.org/748232
Committed: https://git.openstack.org/cgit/starlingx/nfv/commit/?id=4c36f911c9e760829abb99176e4b915accfa3bf1
Submitter: Zuul
Branch: master

commit 4c36f911c9e760829abb99176e4b915accfa3bf1
Author: Jim Gauld <email address hidden>
Date: Wed Aug 26 09:16:27 2020 -0400

    Add wait for alarms to clear to SW patch strategy unlock hosts step

    This appends the WaitAlarmsClearStep after UnlockHostsStep on controller
    hosts when stx-openstack application is installed for SwPatchStrategy.
    This will periodically query alarms and allows the system to stabilize.
    If stx-openstack is not installed, this will do one minute wait with the
    existing SystemStabilizeStep.

    Change-Id: I6dbc4c6032a3bb9d160df79d46630a81960cbb37
    Closes-Bug: 1893124
    Signed-off-by: Jim Gauld <email address hidden>
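The step-selection rule described in the commit message can be sketched in Python as below. This is an illustrative model only: the step names mirror the classes named in the commit (UnlockHostsStep, WaitAlarmsClearStep, SystemStabilizeStep), but the `Step` data structure, the `controller_unlock_steps` function, and the 600 s alarm-wait timeout are hypothetical. The commit does not say whether any other steps remain in the stage alongside WaitAlarmsClearStep, so the sketch shows only the conditional it describes.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical upper bound for the alarm wait; the commit does not state one.
WAIT_ALARMS_CLEAR_TIMEOUT_S = 600
# The commit describes the fallback as a one-minute SystemStabilizeStep.
SYSTEM_STABILIZE_S = 60


@dataclass(frozen=True)
class Step:
    kind: str           # label only; mirrors the step class names in the commit
    timeout_s: int = 0  # 0 means "no explicit timeout" in this sketch


def controller_unlock_steps(openstack_installed: bool) -> List[Step]:
    """Steps following a controller unlock in a simplified SW patch strategy."""
    steps = [Step("UnlockHostsStep")]
    if openstack_installed:
        # stx-openstack applied: poll alarms until they clear (or time out).
        steps.append(Step("WaitAlarmsClearStep", WAIT_ALARMS_CLEAR_TIMEOUT_S))
    else:
        # stx-openstack not applied: keep the existing one-minute fixed wait.
        steps.append(Step("SystemStabilizeStep", SYSTEM_STABILIZE_S))
    return steps


for installed in (True, False):
    print(installed, [s.kind for s in controller_unlock_steps(installed)])
```

Polling only when stx-openstack is applied matches the bug's scope: the slow alarm clearing was seen with openstack applied, while other systems keep the shorter fixed stabilization wait.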

Changed in starlingx:
status: In Progress → Fix Released