stx: sm won't go standby

Bug #2003117 reported by Rafael Falcão
This bug affects 1 person
Affects     Status         Importance   Assigned to      Milestone
StarlingX   Fix Released   High         Rafael Falcão

Bug Description

Brief Description
-----------------
We are removing the guest-agent service from the platform because it is no longer needed. Some guest-agent related queries in create_sm_db in the ha repo are causing errors in sm. Because of those errors, sm won't go standby.
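
As a rough illustration of the failure mode, the sketch below checks a list of (service, plugin path) pairs and reports the services whose plugin file is missing on disk. The function name find_missing_plugins and the input format are hypothetical, chosen only for this example; they are not part of sm or create_sm_db, and the real sm database schema is not modeled here.

```python
import os
import tempfile


def find_missing_plugins(service_actions):
    """Return sorted service names whose plugin file does not exist.

    service_actions is an iterable of (service_name, plugin_path) pairs.
    This input format is an assumption made for illustration only.
    """
    return sorted(
        {name for name, path in service_actions if not os.path.isfile(path)}
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # One plugin that exists and one that does not, mimicking a service
        # that is still registered after its plugin file was removed.
        present = os.path.join(tmp, "examplePlugin")
        open(present, "w").close()
        actions = [
            ("example-service", present),
            ("guest-agent", os.path.join(tmp, "guestAgent")),
        ]
        print(find_missing_plugins(actions))
```

A service that is still registered but has no plugin file is the situation the error logs in this report describe.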

Severity
--------
Major

Steps to Reproduce
------------------
- Upgrade test (or)
- DX fresh install

Expected Behavior
------------------
No error logs related to guest-agent queries should appear in the sm log file, and sm should be able to go standby.

Actual Behavior
----------------
Error logs related to guest-agent queries appear in the sm log file, and sm is not able to go standby.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Found in DX

Timestamp/Logs
--------------
2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<454> ERROR: sm[81038]: sm_service_action.c(72): Failed to open file (/usr/lib/ocf/resource.d/platform/guestAgent).
2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<455> ERROR: sm[81038]: sm_service_action.c(702): Service (guest-agent) plugin (/usr/lib/ocf/resource.d/platform/guestAgent) access failed, error=No such file or directory.
2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<456> ERROR: sm[81038]: sm_service_audit.c(374): Failed to run audit-enabled action for service (guest-agent), error=FAILED.
2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<457> ERROR: sm[81038]: sm_service_unknown_state.c(27): Failed to audit service (guest-agent), error=FAILED
2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<458> ERROR: sm[81038]: sm_service_fsm.c(204): Service (guest-agent) unable to enter state (unknown), error=FAILED.
2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<459> ERROR: sm[81038]: sm_service_fsm.c(661): Failed to enter state (unknown) service (guest-agent), error=FAILED.
2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<460> ERROR: sm[81038]: sm_service_initial_state.c(58): Failed to set service (guest-agent) state (unknown), error=FAILED.
2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<461> ERROR: sm[81038]: sm_service_fsm.c(855): Service (guest-agent) unable to handle event (audit) in state (initial), error=FAILED.
2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<462> ERROR: sm[81038]: sm_service_engine.c(440): Event (audit) not handled for service (guest-agent).
2023-01-09T16:32:43.300 controller-1 sm: debug time[338.133] log<463> ERROR: sm[81038]: sm_service_action.c(702): Service (guest-agent) plugin (/usr/lib/ocf/resource.d/platform/guestAgent) access failed, error=No such file or directory.
2023-01-09T16:32:43.300 controller-1 sm: debug time[338.134] log<464> ERROR: sm[81038]: sm_service_audit.c(374): Failed to run audit-enabled action for service (guest-agent), error=FAILED.
2023-01-09T16:32:43.300 controller-1 sm: debug time[338.134] log<465> ERROR: sm[81038]: sm_service_unknown_state.c(27): Failed to audit service (guest-agent), error=FAILED
2023-01-09T16:32:43.300 controller-1 sm: debug time[338.134] log<466> ERROR: sm[81038]: sm_service_fsm.c(204): Service (guest-agent) unable to enter state (unknown), error=FAILED.
2023-01-09T16:32:43.300 controller-1 sm: debug time[338.134] log<467> ERROR: sm[81038]: sm_service_fsm.c(661): Failed to enter state (unknown) service (guest-agent), error=FAILED.
2023-01-09T16:32:43.300 controller-1 sm: debug time[338.134] log<468> ERROR: sm[81038]: sm_service_initial_state.c(58): Failed to set service (guest-agent) state (unknown), error=FAILED.
2023-01-09T16:32:43.300 controller-1 sm: debug time[338.134] log<469> ERROR: sm[81038]: sm_service_fsm.c(855): Service (guest-agent) unable to handle event (audit) in state (initial), error=FAILED.
2023-01-09T16:32:43.300 controller-1 sm: debug time[338.134] log<470> ERROR: sm[81038]: sm_service_engine.c(440): Event (audit) not handled for service (guest-agent).
2023-01-09T16:32:43.813 controller-1 sm: debug time[338.646] log<471> ERROR: sm[81038]: sm_service_action.c(702): Service (guest-agent) plugin (/usr/lib/ocf/resource.d/platform/guestAgent) access failed, error=No such file or directory.
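
To check a log for this symptom, a minimal sketch along the following lines could count ERROR entries per service. The regular expressions are derived only from the line format shown above and may not cover other sm log variants; the function name count_service_errors is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the ERROR lines in this report (an assumption,
# not an official description of the sm log format).
LOG_RE = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) sm: debug time\[[\d.]+\] log<\d+> "
    r"ERROR: sm\[\d+\]: (?P<source>[\w.]+)\((?P<line>\d+)\): (?P<message>.*)$"
)
# Service names appear as "Service (name)" or "service (name)" in messages.
SERVICE_RE = re.compile(r"[Ss]ervice \((?P<service>[^)]+)\)")


def count_service_errors(lines):
    """Count ERROR log lines per service name mentioned in the message."""
    counts = Counter()
    for raw in lines:
        entry = LOG_RE.match(raw.strip())
        if entry is None:
            continue  # not an ERROR line in the expected format
        service = SERVICE_RE.search(entry.group("message"))
        if service is not None:
            counts[service.group("service")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<454> "
        "ERROR: sm[81038]: sm_service_action.c(72): Failed to open file "
        "(/usr/lib/ocf/resource.d/platform/guestAgent).",
        "2023-01-09T16:32:42.786 controller-1 sm: debug time[337.619] log<456> "
        "ERROR: sm[81038]: sm_service_audit.c(374): Failed to run audit-enabled "
        "action for service (guest-agent), error=FAILED.",
    ]
    for service, count in count_service_errors(sample).items():
        print(service, count)
```

After the fix, a count of zero for guest-agent on a fresh duplex install would match the expected behavior described above.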

Test Activity
-------------
Developer Testing

Workaround
----------
N/A

Tags: stx.8.0 stx.ha
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ha (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ha/+/870859

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ha (master)

Reviewed: https://review.opendev.org/c/starlingx/ha/+/870859
Committed: https://opendev.org/starlingx/ha/commit/d312729809285e2901b3bf051b54cba37389a8ae
Submitter: "Zuul (22348)"
Branch: master

commit d312729809285e2901b3bf051b54cba37389a8ae
Author: Rafael Falcao <email address hidden>
Date: Tue Jan 17 16:29:12 2023 -0300

    Remove guest-agent related queries from sm database

    The guest-agent service is currently being activated
    in setups where stx-openstack is applied, but it has not
    been used since we moved to containerized openstack.
    Since this service is no longer used, we are removing
    the service and all related queries in the
    create_sm_db file.

    Test Plan:
    PASS: Perform a fresh install on a duplex environment and
    check that no error log related to guest-agent is appearing
    in the sm log file and no 400.02 alarm was raised on
    fm alarm-list.

    Closes-Bug: 2003117

    Signed-off-by: Rafael Falcao <email address hidden>
    Change-Id: I145bd8a45c12319facc4d1eff90b785a33a1d2c0

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → High
tags: added: stx.8.0 stx.ha
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Rafael Falcão (rafaelvfalc)