controller-0 degraded after unlocked

Bug #1857423 reported by Joe Chan
This bug affects 1 person
Affects    Status     Importance  Assigned to  Milestone
StarlingX  Won't Fix  Low         Unassigned

Bug Description

The controller-0 node status is degraded after the node is unlocked.

fm alarm-list shows the following:

controller-0 experienced a service-affecting failure. Auto-recovery in progress. Manual Lock and Unlock may be required if auto-recovery is unsuccessful.

Tags: stx.metal
Joe Chan (chens141) wrote :
Austin Sun (sunausti) wrote :

From mtclog:
2019-12-23T03:20:14.475 [101563.00142] controller-0 mtcAgent vim mtcVimApi.cpp ( 258) mtcVimApi_state_change : Info : controller-0 sending 'host' state change to vim (enabled)
2019-12-23T03:20:14.475 [101563.00143] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (1572) enable_handler : Info : controller-0 is ENABLED
2019-12-23T03:20:14.507 fmAlarmUtils.cpp(630): Sending FM clear alarm request: alarm_id (200.011), entity_id (host=controller-0)
2019-12-23T03:20:14.547 fmAlarmUtils.cpp(669): FM Response for clear alarm: (10), alarm_id (200.011), entity_id (host=controller-0)
2019-12-23T03:23:16.475 [101563.00144] controller-0 mtcAgent hbs nodeClass.cpp (5559) log_process_failure : Warn : controller-0 pmon: 'sm' process failed and is being auto recovered
2019-12-23T03:23:59.539 [101563.00145] controller-0 mtcAgent sig daemon_signal.cpp ( 106) daemon_signal_hdlr :Latncy: ... 9231.196695 msec - base level signal handler
2019-12-23T03:23:59.992 [101563.00146] controller-0 mtcAgent hbs nodeClass.cpp (5778) critical_process_failed :Error : controller-0 has critical 'sm' process failure
2019-12-23T03:23:59.992 fmAPI.cpp(490): Enqueue raise alarm request: UUID (c982f52b-530c-4342-a46a-80566b1c67b7) alarm id (200.022) instant id (host=controller-0.status=failed)
2019-12-23T03:23:59.992 [101563.00147] controller-0 mtcAgent hbs nodeClass.cpp (6188) allStateChange : Info : controller-0 unlocked-disabled-failed (seq:15)
2019-12-23T03:23:59.992 [101563.00148] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp ( 573) enable_handler :Error : controller-0 Main Enable FSM (from failed)
2019-12-23T03:23:59.992 [101563.00149] controller-0 mtcAgent msg mtcCtrlMsg.cpp ( 878) send_hbs_command : Info : controller-0 stop host service sent to controller-0 hbsAgent
2019-12-23T03:23:59.992 [101563.00150] controller-0 mtcAgent hbs nodeClass.cpp (1681) alarm_enabled_failure :Error : controller-0 critical enable failure
2019-12-23T03:23:59.992 [101563.00151] controller-0 mtcAgent alm mtcAlarm.cpp ( 417) mtcAlarm_critical :Error : controller-0 setting critical 'In-Service' failure alarm (200.004 )
2019-12-23T03:23:59.992 fmAPI.cpp(490): Enqueue raise alarm request: UUID (03a61d0f-5116-4ff6-a817-b5605cf73845) alarm id (200.004) instant id (host=controller-0)
2019-12-23T03:23:59.992 fmAPI.cpp(490): Enqueue raise alarm request: UUID (0967d088-2741-46a7-97bf-e0dd716dedc0) alarm id (200.022) instant id (host=controller-0.state=enabled)
2019-12-23T03:23:59.992 [101563.00152] controller-0 mtcAgent hbs nodeClass.cpp (6188) allStateChange : Info : controller-0 unlocked-enabled-degraded (seq:16)
2019-12-23T03:23:59.992 [101563.00153] controller-0 mtcAgent vim mtcVimApi.cpp ( 251) mtcVimApi_state_change :Error : controller-0 {"state-change": {"administrative":"unlocked","operational":"enabled","availability":"degraded","subfunction_oper":"disabled","subfunction_avail":"not-installed"},"hostname":"controller-0","uuid":"d961ea9d-f324-4c09-b87e-911dfd63fede","subfunctions":"controller","personality":"controller"}

pmon detected that the 'sm' process was abnormal and raised the 200.006 alarm, then mtcAgent ...
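To make the sequence in the log easier to follow, the host state transitions can be pulled out of the allStateChange lines. This is a minimal sketch: the regular expression is inferred only from the lines quoted above, so the real mtcAgent log format may differ, and the `state_changes` helper is a hypothetical name, not part of StarlingX.

```python
import re

# Assumed layout, inferred only from the allStateChange lines quoted above:
#   <timestamp> ... allStateChange : <level> : <host> <admin>-<oper>-<avail> (seq:<n>)
STATE_CHANGE = re.compile(
    r"^(?P<ts>\S+)\s.*\ballStateChange\s*:\s*\w+\s*:\s*"
    r"(?P<host>\S+)\s+(?P<state>[a-z-]+)\s+\(seq:(?P<seq>\d+)\)"
)


def state_changes(lines):
    """Return (timestamp, host, admin, oper, avail, seq) for each state change."""
    changes = []
    for line in lines:
        match = STATE_CHANGE.search(line)
        if match is None:
            continue
        # Split on the first two hyphens only, so a hyphenated availability
        # value would stay in one piece.
        admin, oper, avail = match.group("state").split("-", 2)
        changes.append(
            (match.group("ts"), match.group("host"), admin, oper, avail,
             int(match.group("seq")))
        )
    return changes


# Two lines copied from the log excerpt above.
SAMPLE = [
    "2019-12-23T03:23:59.992 [101563.00147] controller-0 mtcAgent hbs nodeClass.cpp (6188) allStateChange : Info : controller-0 unlocked-disabled-failed (seq:15)",
    "2019-12-23T03:23:59.992 [101563.00152] controller-0 mtcAgent hbs nodeClass.cpp (6188) allStateChange : Info : controller-0 unlocked-enabled-degraded (seq:16)",
]

for change in state_changes(SAMPLE):
    print(change)
```

On the two sample lines this shows controller-0 going from unlocked-disabled-failed (seq 15) to unlocked-enabled-degraded (seq 16) within the same millisecond timestamp.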

Ghada Khalil (gkhalil) wrote :

As per triage by Austin, this appears to be a stale alarm.
The current assumption is that this is a one-off occurrence, as sanity/regression tests don't report this issue.
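One way to look for stale-alarm candidates is to pair the raise and clear requests in the log and list the alarms that were raised but never cleared. The sketch below assumes the two message layouts seen in the log lines quoted in the earlier comment; `outstanding_alarms` is a hypothetical helper, not an FM API. Because that excerpt is truncated, an alarm listed here may still have been cleared later, so the result is only a starting point for comparison with fm alarm-list.

```python
import re

# Both patterns are inferred from the quoted log lines and may not cover
# every message variant the FM client library emits.
RAISE = re.compile(
    r"Enqueue raise alarm request: .*?alarm id \((?P<alarm>[\d.]+)\) "
    r"instant id \((?P<entity>[^)]*)\)"
)
CLEAR = re.compile(
    r"Sending FM clear alarm request: alarm_id \((?P<alarm>[\d.]+)\), "
    r"entity_id \((?P<entity>[^)]*)\)"
)


def outstanding_alarms(lines):
    """Return (alarm_id, entity) pairs raised in `lines` and not cleared afterwards."""
    active = {}  # dict keeps insertion order, so results follow the log order
    for line in lines:
        raised = RAISE.search(line)
        if raised:
            active[(raised.group("alarm"), raised.group("entity"))] = True
            continue
        cleared = CLEAR.search(line)
        if cleared:
            # Clearing an alarm that was never raised in this window is a no-op.
            active.pop((cleared.group("alarm"), cleared.group("entity")), None)
    return list(active)


# Lines copied from the log excerpt quoted in the earlier comment.
SAMPLE = [
    "2019-12-23T03:20:14.507 fmAlarmUtils.cpp(630): Sending FM clear alarm request: alarm_id (200.011), entity_id (host=controller-0)",
    "2019-12-23T03:23:59.992 fmAPI.cpp(490): Enqueue raise alarm request: UUID (c982f52b-530c-4342-a46a-80566b1c67b7) alarm id (200.022) instant id (host=controller-0.status=failed)",
    "2019-12-23T03:23:59.992 fmAPI.cpp(490): Enqueue raise alarm request: UUID (03a61d0f-5116-4ff6-a817-b5605cf73845) alarm id (200.004) instant id (host=controller-0)",
    "2019-12-23T03:23:59.992 fmAPI.cpp(490): Enqueue raise alarm request: UUID (0967d088-2741-46a7-97bf-e0dd716dedc0) alarm id (200.022) instant id (host=controller-0.state=enabled)",
]

for alarm_id, entity in outstanding_alarms(SAMPLE):
    print(alarm_id, entity)
```

Keying on the (alarm id, entity) pair rather than the alarm id alone matters here, because 200.022 is raised twice against different entity instances.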

Changed in starlingx:
importance: Undecided → Low
status: New → Triaged
tags: added: stx.metal
Ramaswamy Subramanian (rsubrama) wrote :

No progress on this bug for more than 2 years. Candidate for closure.

If there is no update, this issue is targeted to be closed as 'Won't Fix' in 2 weeks.

Ramaswamy Subramanian (rsubrama) wrote :

Changing the status to 'Won't Fix' as there is no activity.

Changed in starlingx:
status: Triaged → Won't Fix