stx-openstack: nova-compute service and hypervisor stuck in an enable/disable loop

Bug #2015088 reported by Luan Nunes Utimura
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Luan Nunes Utimura
Milestone: (none listed)

Bug Description

Brief Description
-----------------
After applying stx-openstack and performing a lock/unlock on controller-0, both the nova-compute service and the hypervisor get stuck in an enable/disable loop.

Severity
--------
Major.

Steps to Reproduce
------------------
On AIO-SX:

1) Apply stx-openstack;
2) Lock/unlock controller-0;
3) After the unlock, verify the intermittency by:

   Watching the compute services and hypervisors:
   $ watch -d 'openstack compute service list --long; openstack hypervisor list --long'

   Following the NFV-related logs:
   $ tail -f /var/log/nfv*.log
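   For a scrollable record of the flapping, rather than the in-place view that `watch` gives, step 3 can also be scripted. The following is only a minimal sketch and not part of the original report: the function name `snapshot_nova_state` and the 10-second polling interval are illustrative assumptions. It relies on the same `openstack compute service list` command as above, with the standard `--service`, `-f value` and `-c` options.

```shell
# Hypothetical helper (not from the original report): print one timestamped
# line per nova-compute service with its Host, Status and State columns.
snapshot_nova_state() {
    now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
    openstack compute service list --service nova-compute \
        -f value -c Host -c Status -c State |
    while read -r host status state; do
        printf '%s %s status=%s state=%s\n' "$now" "$host" "$status" "$state"
    done
}

# Example usage (runs until interrupted):
#   while true; do snapshot_nova_state; sleep 10; done
```

   Redirecting the loop output to a file makes it easier to line the transitions up with the `fm event-list` entries afterwards.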

Expected Behavior
-----------------
After the unlock, both nova-compute service and hypervisor are up and running.

Actual Behavior
---------------
After the unlock, both the nova-compute service and the hypervisor are stuck in an enable/disable loop, changing state roughly once a minute.

Reproducibility
---------------
Reproducible on an AIO-SX.

System Configuration
--------------------
AIO-SX.

Branch/Pull Time/Commit
-----------------------
Master.

Last Pass
---------
N/A.

Timestamp/Logs
--------------

Output of `fm event-list`:

+----------------------------+-------+--------------+------------------------------------------------------+-------------------------------------------------------------------+----------+
| Time Stamp                 | State | Event Log ID | Reason Text                                          | Entity Instance ID                                                | Severity |
+----------------------------+-------+--------------+------------------------------------------------------+-------------------------------------------------------------------+----------+
| 2023-04-02T05:02:33.120261 | log   | 275.001      | Host controller-0 hypervisor is now unlocked-enabled | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
| 2023-04-02T05:01:17.480581 | log   | 275.001      | Host controller-0 hypervisor is now locked-disabled  | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
| 2023-04-02T05:01:11.290467 | log   | 275.001      | Host controller-0 hypervisor is now locked-enabled   | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
| 2023-04-02T04:59:50.014403 | log   | 275.001      | Host controller-0 hypervisor is now unlocked-enabled | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
| 2023-04-02T04:58:34.920412 | log   | 275.001      | Host controller-0 hypervisor is now locked-disabled  | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
| 2023-04-02T04:58:29.131584 | log   | 275.001      | Host controller-0 hypervisor is now locked-enabled   | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
| 2023-04-02T04:57:08.139377 | log   | 275.001      | Host controller-0 hypervisor is now unlocked-enabled | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
| 2023-04-02T04:55:53.755190 | log   | 275.001      | Host controller-0 hypervisor is now locked-disabled  | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
| 2023-04-02T04:55:47.549136 | log   | 275.001      | Host controller-0 hypervisor is now locked-enabled   | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
| 2023-04-02T04:54:27.669352 | log   | 275.001      | Host controller-0 hypervisor is now unlocked-enabled | host=controller-0.hypervisor=79a504d8-6769-4058-8e77-cbd9d9dcf45d | critical |
+----------------------------+-------+--------------+------------------------------------------------------+-------------------------------------------------------------------+----------+
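
To put numbers on the loop, the gaps between consecutive events can be computed from the time stamps in the listing. This is a minimal sketch, not part of the original report: `seconds_between` is a hypothetical helper, it drops the fractional seconds, treats both stamps as UTC, and assumes GNU `date` (for the `-d` option).

```shell
# Hypothetical helper: whole seconds between two "fm event-list" time stamps.
# Fractional seconds are dropped and both stamps are interpreted as UTC.
# Requires GNU date for the -d option.
seconds_between() {
    start=$(printf '%s' "$1" | sed 's/T/ /; s/\..*$//')
    end=$(printf '%s' "$2" | sed 's/T/ /; s/\..*$//')
    echo $(( $(date -u -d "$end" +%s) - $(date -u -d "$start" +%s) ))
}

# Oldest three transitions from the listing:
seconds_between 2023-04-02T04:54:27.669352 2023-04-02T04:55:47.549136  # unlocked-enabled -> locked-enabled
seconds_between 2023-04-02T04:55:47.549136 2023-04-02T04:55:53.755190  # locked-enabled -> locked-disabled
seconds_between 2023-04-02T04:55:53.755190 2023-04-02T04:57:08.139377  # locked-disabled -> unlocked-enabled
```

For these three pairs the gaps are 80, 6 and 75 seconds, so one full unlocked-enabled to unlocked-enabled cycle takes about 161 seconds in this log.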

Test Activity
-------------
Developer Testing.

Workaround
----------
The intermittency seems to stop after disabling the guest plugin for VIM:

$ sudo sed -i 's/guest_plugin_disabled=False/guest_plugin_disabled=True/' /etc/nfv/vim/config.ini
$ sudo sm-restart-safe service vim; sudo sm-restart-safe service vim-api
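
A slightly more defensive variant of the same workaround keeps a backup of the file and checks that the substitution actually took effect before the services are restarted. This is a sketch only: `disable_guest_plugin` is a hypothetical helper name, the `.bak` suffix is an arbitrary choice, and it assumes GNU `sed` (for `-i`) and that the option appears in the file exactly as in the `sed` expression above.

```shell
# Hypothetical helper: flip guest_plugin_disabled to True in the given file,
# keep a backup copy, and succeed only if the new value is present afterwards.
disable_guest_plugin() {
    conf="$1"
    cp -p "$conf" "$conf.bak" || return 1
    sed -i 's/guest_plugin_disabled=False/guest_plugin_disabled=True/' "$conf" || return 1
    grep -q 'guest_plugin_disabled=True' "$conf"
}

# Example usage on a live system (assumption: run as root):
#   disable_guest_plugin /etc/nfv/vim/config.ini &&
#       sm-restart-safe service vim && sm-restart-safe service vim-api
```

If the function returns non-zero, the option was not found in the expected form and the services are left untouched.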

Changed in starlingx:
assignee: nobody → Luan Nunes Utimura (lutimura)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/879359

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nfv (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/nfv/+/879545

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/879359
Committed: https://opendev.org/starlingx/stx-puppet/commit/c365ae5f8a248569e6eb0a4f7af8c5d5cafa0cc4
Submitter: "Zuul (22348)"
Branch: master

commit c365ae5f8a248569e6eb0a4f7af8c5d5cafa0cc4
Author: Luan Nunes Utimura <email address hidden>
Date: Mon Apr 3 11:35:51 2023 -0300

    Disable guest plugin loading in VIM

    Following the work previously done in [1] and [2] to deactivate
    guest-related services in VIM, since they were no longer being utilized
    and causing coredump issues in the platform, this commit changes the
    default value for the `guest_plugin_disable` config variable so
    that Puppet won't reinforce the guest plugin loading in VIM.

    As reported in [3], loading this plugin while having some of its
    services deactivated (or functionalities removed) has proven to be
    a problem when stx-openstack is applied, as both nova-compute service
    and hypervisor are caught in an enable/disable loop indefinitely after
    the first host lock/unlock with the application applied.

    [1] https://review.opendev.org/c/starlingx/nfv/+/869817
    [2] https://review.opendev.org/c/starlingx/nfv/+/870538
    [3] https://bugs.launchpad.net/starlingx/+bug/2015088

    Test Plan (on AIO-SX):
    PASS - Build puppet-nfv package
    PASS - Build and install ISO
    PASS - Upload and apply stx-openstack
    PASS - Verify that the `guest_plugin_disable` configuration variable
           remains `True` after the application is applied:
           $ grep 'guest_plugin_disable' /etc/nfv/vim/config.ini
    PASS - Lock and unlock controller-0
    PASS - Verify that both nova-compute service and hypervisor are no
           longer intermittent after the unlock

    Closes-Bug: 2015088

    Signed-off-by: Luan Nunes Utimura <email address hidden>
    Change-Id: Iaebc8cc37eabe7b2b685622a5772544b4bce21dc

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nfv (master)

Reviewed: https://review.opendev.org/c/starlingx/nfv/+/879545
Committed: https://opendev.org/starlingx/nfv/commit/db3f5525d81f5abddba9181fce655348202a818f
Submitter: "Zuul (22348)"
Branch: master

commit db3f5525d81f5abddba9181fce655348202a818f
Author: Luan Nunes Utimura <email address hidden>
Date: Mon Apr 3 11:27:07 2023 -0300

    NFVI: Default guest_plugin_disabled to True

    Following the work previously done in [1] and [2] to deactivate
    guest-related services in VIM, this commit changes the default value of
    config variable `guest_plugin_disabled` to `True` so that it reflects
    the change proposed in [3] (same modification but on Puppet's side).

    As reported in [3], loading this plugin while having some of its
    services deactivated (or functionalities removed) has proven to be
    a problem when stx-openstack is applied, as both nova-compute service
    and hypervisor are caught in an enable/disable loop indefinitely after
    the first host lock/unlock with the application applied.

    [1] https://review.opendev.org/c/starlingx/nfv/+/869817
    [2] https://review.opendev.org/c/starlingx/nfv/+/870538
    [3] https://review.opendev.org/c/starlingx/stx-puppet/+/879359

    Test Plan (on AIO-SX):
    PASS - Remove 'guest_plugin_disabled' from /etc/nfv/vim/config.ini,
           reload VIM services and verify that the guest plugin wasn't
           loaded by default:
           $ tail -f /var/log/nfv*.log

    Related-Bug: 2015088

    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/879359

    Signed-off-by: Luan Nunes Utimura <email address hidden>
    Change-Id: I7e254fe2db2a6bcc6b98a26cc712c5d03ef7ffad

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.config stx.nfv