Killing glance.api.pid process is not creating any kind of fm alarm-list

Bug #1798631 reported by Fernando Hernandez Gonzalez on 2018-10-18
This bug affects 1 person
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: haitao wang

Bug Description

Brief Description
-----------------
No alarm is raised when the process whose ID is stored in /var/run/resource-agents/glance-api.pid is killed. A new process is started and its ID is written to glance-api.pid, but no event/alarm is logged.
Severity
--------
Major

Steps to Reproduce
------------------
Identify the glance-api.pid process:
       Controller-# $ cat /var/run/resource-agents/glance-api.pid

Kill the process:
       Controller-# $ sudo kill -9 ####

Verify that the alarm entry is shown in the GUI and includes:
   alarm ID - 300.003
   severity - Major
   entity instance ID - service_domain=<domain_name>.service_group=<group_name>
   proposed repair action - Contact next level of support
   reason text - Service group failure; <list of affected services>. Service group degraded; <list of affected services>. Service group warning; <list of affected services>.

Make sure that glance-api.pid now holds a different PID:
    Controller-# $ cat /var/run/resource-agents/glance-api.pid
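
The PID check in the steps above could be automated roughly as follows. This is a minimal sketch, not part of the original test case: the helper names `read_pid` and `wait_for_new_pid` and the timeout and polling interval are assumptions made for illustration; only the pid file path comes from the steps.

```python
import time
from pathlib import Path

# Path taken from the steps above.
PID_FILE = Path("/var/run/resource-agents/glance-api.pid")


def read_pid(pid_file: Path) -> int:
    """Return the PID stored in pid_file."""
    return int(pid_file.read_text().strip())


def wait_for_new_pid(pid_file: Path, old_pid: int,
                     timeout: float = 30.0, interval: float = 0.5) -> int:
    """Poll pid_file until it holds a PID other than old_pid.

    The timeout and interval defaults are illustrative assumptions.
    Raises TimeoutError if the PID does not change in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            pid = read_pid(pid_file)
            if pid != old_pid:
                return pid
        except (FileNotFoundError, ValueError):
            # The file may be missing or empty while the service restarts.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{pid_file} still holds PID {old_pid}")
        time.sleep(interval)
```

On a controller, one would call `read_pid(PID_FILE)` before the kill and `wait_for_new_pid(PID_FILE, old_pid)` after it; reading the file may require the same privileges as the `cat` command above.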

Expected Behavior
------------------
Once the process whose ID is stored in glance-api.pid is killed, an alarm should be logged.

Actual Behavior
----------------
Once the process whose ID is stored in glance-api.pid is killed and a new process is assigned to it, no alarm/event is logged in the system.

Reproducibility
---------------
State if the issue is 100% reproducible

System Configuration
--------------------
Virtual Machine - Multinode - External Storage - 2 controllers + 2 computes + 2 storage nodes

Branch/Pull Time/Commit
-----------------------
NA

Timestamp/Logs
--------------
Logs attached

Ghada Khalil (gkhalil) on 2018-10-18
summary: - Killing glance.api.pid process is not creating any kind of fm alarm-list.
+ Killing glance.api.pid process is not creating any kind of fm alarm-list
Ghada Khalil (gkhalil) wrote :

Targeting stx.2019.03 - The process recovers; it's just the alarm that's missing. This is not severe enough to block the stx.2018.10 release.

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.ha
tags: added: stx.2019.03
removed: stx.2018.10
Changed in starlingx:
status: New → Triaged
assignee: nobody → Bruce Jones (brucej)
Numan Waheed (nwaheed) wrote :

This test case passed on 10.10.2018 on a 2-node bare-metal system using the stx.2018.10 build.

Hi Numan, many thanks. Please try with a virtual setup, since that is where the issue was found. It is good to know, however, that it passes on bare metal.

Thanks!

Ghada Khalil (gkhalil) wrote :

Numan provided the info above as a data point. The design prime assigned to this issue will need to try to reproduce it.

Bruce Jones (brucej) wrote :

Cindy, please assign an engineer to debug and fix this, thanks!

Changed in starlingx:
assignee: Bruce Jones (brucej) → Cindy Xie (xxie1)
haitao wang (hwang85) on 2018-10-26
Changed in starlingx:
assignee: Cindy Xie (xxie1) → haitao wang (hwang85)
haitao wang (hwang85) on 2018-11-07
Changed in starlingx:
status: Triaged → In Progress
haitao wang (hwang85) wrote :

Confirmed in the virtual environment. In simplex, no alarm is logged even the first time the process is killed.

Lin Shuicheng (shuicheng) wrote :

When the process is killed, sm re-enables the service per its settings. However, sm does not send an alarm log to fm, so no log/event is created in the fm alarm list. This is the expected behavior with the current code. We need to check with the SM/FM owner to determine whether this kind of log should be added.

Here is the log from sm.log:
sm.log:
2018-11-26T02:32:07.000 controller-0 sm: debug time[327342.281] log<626> INFO: sm[20372]: sm_service_fsm.c(1451): Service (glance-api) process failure, pid=25486, exit_code=-65533.
2018-11-26T02:32:07.000 controller-0 sm: debug time[327342.281] log<627> INFO: sm[20372]: sm_service_fsm.c(1032): Service (glance-api) received event (process-failure) was in the enabled-active state and is now in the disabled state.
2018-11-26T02:32:07.000 controller-0 sm: debug time[327342.631] log<628> INFO: sm[20372]: sm_service_action.c(98): Plugin (/usr/lib/ocf/resource.d/openstack/glance-api) has been changed, was=00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, now=33f45fb614c61bd0de9c792074242122317125e578f10e982142915f86dca947cb6436dfd5254033cb6c09300a66ad2fc9a0b19c03b8b3c0ca2ea17b955b072f.
2018-11-26T02:32:07.000 controller-0 sm: debug time[327342.631] log<629> INFO: sm[20372]: sm_service_enable.c(459): Started enable action (11134) for service (glance-api).
2018-11-26T02:32:07.000 controller-0 sm: debug time[327342.631] log<630> INFO: sm[20372]: sm_service_fsm.c(1032): Service (glance-api) received event (enable-throttle) was in the disabled state and is now in the enabling state.
2018-11-26T02:32:07.000 controller-0 sm: debug time[327342.668] log<631> INFO: sm[20372]: sm_service_enable.c(361): Action (enable) completed with result (success), state (unknown), status (), and condition () for service (glance-api), reason_text=, exit_code=0.
2018-11-26T02:32:07.000 controller-0 sm: debug time[327342.668] log<632> INFO: sm[20372]: sm_service_fsm.c(1032): Service (glance-api) received event (enable-success) was in the enabling state and is now in the enabled-active state.
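
To pull the state changes out of a longer sm.log, something like the following could be used. This is only a sketch: the regular expression is derived from the sm_service_fsm.c lines quoted above, and the function name `fsm_transitions` is made up for illustration.

```python
import re

# Pattern inferred from the sm_service_fsm.c lines quoted above;
# other sm.log line formats are simply skipped.
TRANSITION_RE = re.compile(
    r"Service \((?P<service>[^)]+)\) received event \((?P<event>[^)]+)\) "
    r"was in the (?P<old>\S+) state and is now in the (?P<new>\S+) state"
)


def fsm_transitions(lines):
    """Yield (service, event, old_state, new_state) for each state change."""
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match:
            yield match.group("service", "event", "old", "new")
```

Applied to the lines above, this should yield the process-failure, enable-throttle and enable-success transitions for glance-api in that order.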

sm-customer.log:
| 2018-11-26T02:32:07.400 | 327 | service-scn | glance-api | enabled-active | disabled | process (pid=25486) failed
| 2018-11-26T02:32:07.750 | 328 | service-scn | glance-api | disabled | enabling | enabled-active state requested
| 2018-11-26T02:32:07.787 | 329 | service-scn | glance-api | enabling | enabled-active | enable success
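
The sm-customer.log rows can also be used to measure how long the service was out of the enabled-active state. The sketch below assumes the columns are timestamp, sequence number, record type, service, old state, new state and reason; that layout is inferred from the three rows above, and the function names are hypothetical.

```python
from datetime import datetime


def parse_scn_rows(lines):
    """Parse pipe-delimited service-scn rows into tuples.

    Assumed column order (inferred from the rows above): timestamp,
    sequence number, record type, service, old state, new state, reason.
    """
    rows = []
    for line in lines:
        fields = [f.strip() for f in line.strip().strip("|").split("|")]
        if len(fields) != 7 or fields[2] != "service-scn":
            continue  # not a service state-change row
        ts, _seq, _kind, service, old, new, reason = fields
        when = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")
        rows.append((when, service, old, new, reason))
    return rows


def recovery_seconds(rows, service):
    """Seconds between leaving and re-entering enabled-active, or None."""
    down_at = None
    for when, svc, old, new, _reason in rows:
        if svc != service:
            continue
        if down_at is None and old == "enabled-active":
            down_at = when
        elif down_at is not None and new == "enabled-active":
            return (when - down_at).total_seconds()
    return None
```

For the three rows above, the interval runs from 02:32:07.400 to 02:32:07.787, which is consistent with the earlier comment that the process recovers and only the alarm is missing.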

haitao wang (hwang85) wrote :

In addition, the same behavior (no fm alarm-list entry is created) is observed when other processes (e.g. Neutron, Cinder, etc.) are killed. So if a fix is needed here, it would be a common feature to add for all of these services.

haitao wang (hwang85) wrote :

Confirmed with Eric MacDonald that it is not a bug and it is expected behavior.

Changed in starlingx:
status: In Progress → Invalid
Ken Young (kenyis) on 2019-01-18
tags: added: stx.2019.05
removed: stx.2019.03
Ken Young (kenyis) on 2019-04-05
tags: added: stx.2.0
removed: stx.2019.05