stx-openstack: Lock standby controller fails to disable services

Bug #2000483 reported by Thales Elero Cervi
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Rafael Falcão
Milestone: (none listed)

Bug Description

Brief Description
-----------------
When testing stx-openstack on Debian, locking a worker node currently fails.

Severity
--------
Major: Cannot lock an AIO stand-by controller (worker node) with stx-openstack applied

Steps to Reproduce
------------------
* Apply stx-openstack application
* Lock (system host-lock <host>) an AIO stand-by controller (worker node)

Expected Behavior
------------------
Stand-by controller (worker node) should lock successfully

Actual Behavior
----------------
Stand-by controller (worker node) fails to disable services (VIM) and fails to lock

Reproducibility
---------------
Reproducible with stx-openstack applied
Not reproducible without stx-openstack

System Configuration
--------------------
AIO-DX

Branch/Pull Time/Commit
-----------------------
master:
* starlingx/master/debian/monolithic/20221221T070000Z

Last Pass
---------
N/A on Debian

Timestamp/Logs
--------------
sysinv 2022-12-26 17:27:51.102 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 host check_lock_worker
sysinv 2022-12-26 17:27:51.128 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 action=lock ihost_val_prenotify: {'ihost_action': 'lock'} ihost_val: {'ihost_action': 'lock', 'task': 'Locking'}
sysinv 2022-12-26 17:27:51.128 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 host.update.ihost_val_prenotify {'ihost_action': 'lock'}
sysinv 2022-12-26 17:27:51.128 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 action_check action=lock, notify_vim=True notify_mtce=True rc=True
sysinv 2022-12-26 17:27:51.128 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 post action_check hostupdate action=lock notify_vim=True notify_mtc=True skip_notify_mtce=False
sysinv 2022-12-26 17:27:51.129 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 stage_action lock
sysinv 2022-12-26 17:27:51.129 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 _handle_lock_action
sysinv 2022-12-26 17:27:51.129 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 Action staged: lock
sysinv 2022-12-26 17:27:51.129 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 post action_stage hostupdate action=lock notify_vim=True notify_mtc=True skip_notify_mtce=True
sysinv 2022-12-26 17:27:51.130 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 2. delta_handle ['action']
sysinv 2022-12-26 17:27:51.130 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 post delta_handle hostupdate action=lock notify_vim=True notify_mtc=True skip_notify_mtce=True
sysinv 2022-12-26 17:27:51.130 75819 INFO sysinv.api.controllers.v1.host [-] update ihost_val_prenotify: {'ihost_action': 'lock'}
sysinv 2022-12-26 17:27:51.149 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 apply ihost_val {'ihost_action': 'lock', 'task': 'Locking'}
sysinv 2022-12-26 17:27:51.150 75819 INFO sysinv.api.controllers.v1.host [-] Notify VIM host action controller-1 action=lock
sysinv 2022-12-26 17:27:51.150 75819 WARNING sysinv.api.controllers.v1.vim_api [-] vim_host_action hostname=controller-1, action=lock
sysinv 2022-12-26 17:27:51.150 75819 WARNING sysinv.api.controllers.v1.vim_api [-] vim_host_action hostname=controller-1, action=lock api_cmd=http://---/nfvi-plugins/v1/hosts/79b92---9 headers={'Content-type': 'application/json', 'User-Agent': 'sysinv/1.0'} payload={'uuid': '79b92---9', 'hostname': 'controller-1', 'action': 'lock'}
sysinv 2022-12-26 17:27:51.166 74874 INFO sysinv.conductor.manager [-] Evaluating apps reapply {'type': 'lock', 'configure_required': False}
sysinv 2022-12-26 17:27:51.178 74874 INFO sysinv.conductor.manager [-] Apps reapply order: ['cert-manager', 'platform-integ-apps', 'stx-openstack']
sysinv 2022-12-26 17:27:51.179 74874 INFO sysinv.conductor.kube_app [-] lifecycle hook for application cert-manager (1.0-1) started {'mode': 'auto', 'lifecycle_type': 'check', 'operation': 'evaluate-reapply', 'extra': {'trigger': {'type': 'lock', 'configure_required': False}}}.
sysinv 2022-12-26 17:27:51.180 74874 INFO sysinv.conductor.kube_app [-] lifecycle hook for application platform-integ-apps (1.0-53) started {'mode': 'auto', 'lifecycle_type': 'check', 'operation': 'evaluate-reapply', 'extra': {'trigger': {'type': 'lock', 'configure_required': False}}}.
sysinv 2022-12-26 17:27:51.180 74874 INFO sysinv.conductor.kube_app [-] lifecycle hook for application stx-openstack (1.0-1.stx.4) started {'mode': 'auto', 'lifecycle_type': 'check', 'operation': 'evaluate-reapply', 'extra': {'trigger': {'type': 'lock', 'configure_required': False}}}.
sysinv 2022-12-26 17:27:51.296 75819 INFO sysinv.api.controllers.v1.host [-] Provisioned storage node(s) []
sysinv 2022-12-26 17:27:51.310 75819 INFO sysinv.api.controllers.v1.host [-] host controller-1 ihost_patch_end_2022-12-26-17-27-51 patch
sysinv 2022-12-26 17:27:51.400 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 ihost_patch_start_2022-12-26-17-27-51 patch
sysinv 2022-12-26 17:27:51.401 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 1. delta_handle ['action', 'vim_progress_status']
sysinv 2022-12-26 17:27:51.401 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 Pending update_vim_progress_status services-disable-failed
sysinv 2022-12-26 17:27:51.402 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 action=services-disable-failed ihost_val_prenotify: {} ihost_val: {}
sysinv 2022-12-26 17:27:51.402 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 action_check action=services-disable-failed, notify_vim=False notify_mtce=True rc=True
sysinv 2022-12-26 17:27:51.402 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 post action_check hostupdate action=services-disable-failed notify_vim=False notify_mtc=True skip_notify_mtce=False
sysinv 2022-12-26 17:27:51.403 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 stage_action services-disable-failed
sysinv 2022-12-26 17:27:51.403 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 handle_vim_services_disable_failed ihost_action=lock
sysinv 2022-12-26 17:27:51.403 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 Action staged: services-disable-failed
sysinv 2022-12-26 17:27:51.404 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 post action_stage hostupdate action=services-disable-failed notify_vim=False notify_mtc=True skip_notify_mtce=True
sysinv 2022-12-26 17:27:51.404 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 2. delta_handle ['action', 'vim_progress_status']
sysinv 2022-12-26 17:27:51.404 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 post delta_handle hostupdate action=services-disable-failed notify_vim=False notify_mtc=True skip_notify_mtce=True
sysinv 2022-12-26 17:27:51.404 75819 INFO sysinv.api.controllers.v1.host [-] update ihost_val_prenotify: {'ihost_action': '', 'task': '', 'vim_progress_status': ''}
sysinv 2022-12-26 17:27:51.421 75819 INFO sysinv.api.controllers.v1.host [-] controller-1 apply ihost_val {'ihost_action': '', 'task': '', 'vim_progress_status': 'services-disable-failed'}
sysinv 2022-12-26 17:27:51.526 75819 INFO sysinv.api.controllers.v1.host [-] Provisioned storage node(s) []
sysinv 2022-12-26 17:27:51.543 75819 INFO sysinv.api.controllers.v1.host [-] host controller-1 ihost_patch_end_2022-12-26-17-27-51 patch
sysinv 2022-12-26 17:27:57.048 74874 INFO sysinv.conductor.manager [-] Node(s) are in an unstable state. Defer audit.
sysinv 2022-12-26 17:28:37.330 75819 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost_patch_start_2022-12-26-17-28-37 patch
sysinv 2022-12-26 17:28:37.330 75819 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost_patch_end. No changes from mtce/1.0.
sysinv 2022-12-26 17:28:56.922 74874 INFO sysinv.conductor.manager [-] Audit clearing vim_progress_status=services-disable-failed..
sysinv 2022-12-26 17:28:57.036 74874 INFO sysinv.conductor.manager [-] Node(s) are in an unstable state. Defer audit.
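
The excerpt above can be scanned mechanically to see how quickly the VIM rejected the lock. The following is a minimal sketch, assuming the sysinv log line layout shown in this report (`sysinv <date> <time> <pid> <level> <logger> [-] <message>`); the helper names `parse_line` and `first_matching` are hypothetical and not part of sysinv.

```python
import re
from datetime import datetime

# Layout assumed from the log excerpt in this report, not from sysinv documentation.
LINE_RE = re.compile(
    r"^sysinv (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) \[-\] (?P<msg>.*)$"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_line(line):
    """Split one sysinv log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], TS_FORMAT)
    return fields


def first_matching(lines, needle):
    """Return the timestamp of the first parsed line whose message contains needle."""
    for line in lines:
        fields = parse_line(line)
        if fields is not None and needle in fields["msg"]:
            return fields["ts"]
    return None


# Two lines copied from the excerpt above.
sample = [
    "sysinv 2022-12-26 17:27:51.150 75819 INFO sysinv.api.controllers.v1.host [-] "
    "Notify VIM host action controller-1 action=lock",
    "sysinv 2022-12-26 17:27:51.401 75819 INFO sysinv.api.controllers.v1.host [-] "
    "controller-1 Pending update_vim_progress_status services-disable-failed",
]

notified = first_matching(sample, "Notify VIM host action")
failed = first_matching(sample, "services-disable-failed")
elapsed = (failed - notified).total_seconds()
print(f"VIM reported failure {elapsed:.3f}s after the lock notification")
```

On these two lines the gap is roughly a quarter of a second, which would be consistent with an immediate error from the VIM rather than a timeout, although the sysinv log alone does not confirm the cause.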

Test Activity
-------------
Developer Testing

Workaround
----------
None

Changed in starlingx:
assignee: nobody → Thales Elero Cervi (tcervi)
description: updated
Rafael Falcão (rafaelvfalc) wrote :

Found a useful log entry that may hint at what is happening:

2023-01-06T12:05:42.550 controller-0 VIM_Thread[1425779] ERROR Caught exception while trying to disable controller-1 guest services, error=[OpenStack Exception: method=PUT, url=http://localhost:2410/v1/hosts/4736a9da-97f0-4eea-ab96-729aee21701b/disable, headers={'Content-Type': 'application/json', 'User-Agent': 'vim/1.0'}, body={"uuid": "4736a9da-97f0-4eea-ab96-729aee21701b", "hostname": "controller-1"}, reason=].
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/nfv_plugins/nfvi_plugins/nfvi_guest_api.py", line 867, in disable_host_services
    future.result = (yield)
nfv_plugins.nfvi_plugins.openstack.exceptions.OpenStackException: [OpenStack Exception: method=PUT, url=http://localhost:2410/v1/hosts/4736a9da-97f0-4eea-ab96-729aee21701b/disable, headers={'Content-Type': 'application/json', 'User-Agent': 'vim/1.0'}, body={"uuid": "4736a9da-97f0-4eea-ab96-729aee21701b", "hostname": "controller-1"}, reason=]
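
For anyone who wants to check the endpoint by hand, the request described in that exception can be rebuilt with the Python standard library. This is a rough sketch, not the VIM's own client code: the method, URL, headers and body are copied from the log entry, while `build_disable_request` and `probe` are hypothetical helper names. The empty `reason=` field may mean nothing was answering on port 2410, but that reading is an assumption.

```python
import json
import urllib.error
import urllib.request


def build_disable_request(host_uuid, hostname, base_url="http://localhost:2410"):
    """Rebuild the PUT request shown in the VIM exception (values taken from the log)."""
    body = json.dumps({"uuid": host_uuid, "hostname": hostname}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/hosts/{host_uuid}/disable",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "User-Agent": "vim/1.0"},
    )


def probe(request, timeout=5.0):
    """Send the request and summarise the outcome instead of raising.

    This really issues the disable call, so it is only meant for a lab system.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The service answered, but with an error status.
        return f"HTTP {exc.code}"
    except urllib.error.URLError as exc:
        # Nothing answered at all (for example, connection refused).
        return f"unreachable: {exc.reason}"


req = build_disable_request("4736a9da-97f0-4eea-ab96-729aee21701b", "controller-1")
print(req.get_method(), req.full_url)
```

Calling `probe(req)` on the active controller would distinguish an HTTP error status from a service that is not listening at all.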

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nfv (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/nfv/+/870538

Changed in starlingx:
status: New → In Progress
Changed in starlingx:
assignee: Thales Elero Cervi (tcervi) → Rafael Falcão (rafaelvfalc)
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to nfv (master)

Reviewed: https://review.opendev.org/c/starlingx/nfv/+/870538
Committed: https://opendev.org/starlingx/nfv/commit/d5f304832f1db0cb412e85ef840fea13d23e6c7f
Submitter: "Zuul (22348)"
Branch: master

commit d5f304832f1db0cb412e85ef840fea13d23e6c7f
Author: Rafael Falcao <email address hidden>
Date: Mon Jan 16 09:54:07 2023 -0300

    Remove HostTask actions of guest related services

    Since we are currently deactivating all guest
    related services [1][2][3] we now need to remove
    the add, delete, enable and disable sections of
    those services from the HostTask state machine
    to be able to correctly perform tasks like
    lock and unlock of the host.

    [1] https://review.opendev.org/c/starlingx/stx-puppet/+/869474
    [2] https://review.opendev.org/c/starlingx/tools/+/870433
    [3] https://review.opendev.org/c/starlingx/nfv/+/869817

    Test Plan:
    PASS: Generate the debian image without the code that enable
    and disable the guest services during a lock/unlock
    PASS: Perform lock/unlock of all controllers in the system
    with stx-openstack applied.
    PASS: Perform a delete/add of a host in the system with
    stx-openstack applied.

    Closes-Bug: 2000483

    Signed-off-by: Rafael Falcao <email address hidden>
    Change-Id: Iaa2d0ff71a21fb6ee03ecaa53415859982070985
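
The idea behind the commit can be illustrated with a toy model of a task that runs its steps in order and aborts on the first failure. This is only a sketch under stated assumptions: `HostTask`, `run_task`, the step names and the handlers are hypothetical and do not reproduce the actual nfv-vim state machine.

```python
from dataclasses import dataclass, field


@dataclass
class HostTask:
    """Toy stand-in for a host task: a name plus an ordered list of step names."""
    name: str
    steps: list = field(default_factory=list)


def run_task(task, handlers):
    """Run each step in order and stop at the first one whose handler returns False."""
    for step in task.steps:
        if not handlers[step]():
            return f"{step}-failed"
    return "completed"


# Hypothetical handlers: the guest service is deactivated, so its step cannot succeed.
handlers = {
    "disable-container-services": lambda: True,
    "disable-guest-services": lambda: False,
    "notify-services-disabled": lambda: True,
}

# Before the fix: the lock task still contains the guest-services step.
lock_before = HostTask(
    "lock",
    ["disable-container-services", "disable-guest-services", "notify-services-disabled"],
)

# After the fix: the guest-services step is dropped from the task definition.
lock_after = HostTask(
    "lock",
    [step for step in lock_before.steps if "guest" not in step],
)

print(run_task(lock_before, handlers))
print(run_task(lock_after, handlers))
```

In this model the first task stops at the guest-services step while the second runs to completion, which mirrors the reported behavior: the lock failed only because a step depended on a service that was no longer running.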

Changed in starlingx:
status: In Progress → Fix Released