STX Debian: DX VM evacuation failed after rebooting the active controller

Bug #2030883 reported by Danilo
This bug affects 1 person

Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Thales Elero Cervi
Milestone: (none)

Bug Description

Brief Description
-----------------
With VMs running on the DX active controller, reboot -f was issued on the active controller; the VMs failed to evacuate to the other host.

Severity
--------
Major

Steps to Reproduce
------------------
1. Boot VMs on the DX active controller
2. Run reboot -f on the active controller (see the command sketch below)
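
For reference, a minimal sketch of the reproduction flow using the OpenStack CLI; the image, flavor, and network names are placeholders and the exact commands used by the sanity suite may differ:

# Boot a test VM and confirm which controller it landed on (placeholder names)
openstack server create --image cirros --flavor m1.small --network tenant-net vm-evac-test
openstack server show vm-evac-test -c OS-EXT-SRV-ATTR:host

# On the active controller hosting the VM:
sudo reboot -f

# After the controller comes back, check whether the VM was evacuated:
nova migration-list
openstack server show vm-evac-test -c OS-EXT-SRV-ATTR:host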

Expected Behaviour
------------------
VMs are successfully evacuated to the other controller and the host recovers after the reboot

Actual Behaviour
----------------
VMs stay on the same host instead of being evacuated

Reproducibility
---------------
Reproducible 100%

System Configuration
--------------------
Two node system

Branch/Pull Time/Commit
-----------------------
STX master 2023-08-04_20-01-38
STX-O master 2023-08-08 15:47:08

Last Pass
---------
2023_07_04-01_19

Timestamp/Logs
--------------
[2023-08-08 15:32:08,890] 350 DEBUG MainThread ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne migration-list'
[2023-08-08 15:32:08,940] 552 DEBUG MainThread ssh.exec_cmd:: Expecting .*controller\-[01][:| ].*\$ in prompt
[2023-08-08 15:32:12,188] 472 DEBUG MainThread ssh.expect :: Output:
+----+--------------------------------------+--------------+--------------+----------------+--------------+---------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+----------------------------------+----------------------------------+
| Id | UUID | Source Node | Dest Node | Source Compute | Dest Compute | Dest Host | Status | Instance UUID | Old Flavor | New Flavor | Created At | Updated At | Type | Project ID | User ID |
+----+--------------------------------------+--------------+--------------+----------------+--------------+---------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+----------------------------------+----------------------------------+
| 8 | 771c1693-4198-462a-8d55-0323af578a55 | controller-0 | controller-1 | controller-0 | controller-1 | 192.168.206.3 | error | 67ef4e52-4059-4008-a5c9-39733e23fa15 | 28 | 28 | 2023-08-08T15:28:50.000000 | 2023-08-08T15:31:15.000000 | live-migration | b4a5f977138c4fef960c572b035f87a2 | bea9842577664d65832f142c4e733a13 |
| 7 | 404a5936-c992-4b5a-bfe5-1e839962ea3b | controller-1 | controller-0 | controller-1 | controller-0 | 192.168.206.2 | completed | e2e3689b-8d49-48c1-8b6b-50a4754dc011 | None | None | 2023-08-08T15:14:19.000000 | 2023-08-08T15:23:40.000000 | evacuation | b4a5f977138c4fef960c572b035f87a2 | cc41a25c2230409fbe4a246202ffc1bc |
| 6 | e3454455-6c89-46d2-a749-641b8b0cd53b | controller-1 | controller-0 | controller-1 | controller-0 | 192.168.206.2 | failed | e2e3689b-8d49-48c1-8b6b-50a4754dc011 | None | None | 2023-08-08T15:11:43.000000 | 2023-08-08T15:13:46.000000 | evacuation | b4a5f977138c4fef960c572b035f87a2 | cc41a25c2230409fbe4a246202ffc1bc |
| 5 | e49d10ee-0853-47d6-b05c-7f23271de42c | controller-1 | controller-0 | controller-1 | controller-0 | 192.168.206.2 | completed | 67ef4e52-4059-4008-a5c9-39733e23fa15 | None | None | 2023-08-08T15:11:43.000000 | 2023-08-08T15:23:40.000000 | evacuation | b4a5f977138c4fef960c572b035f87a2 | cc41a25c2230409fbe4a246202ffc1bc |
| 4 | 7540b04d-228b-40a2-aaf3-183b39bbd71c | controller-1 | controller-0 | controller-1 | controller-0 | 192.168.206.2 | completed | c4126def-b5d7-4c11-89c7-8b969351c0f2 | None | None | 2023-08-08T15:11:43.000000 | 2023-08-08T15:23:39.000000 | evacuation | b4a5f977138c4fef960c572b035f87a2 | cc41a25c2230409fbe4a246202ffc1bc |
| 3 | 35da2211-80d1-414a-8b04-5d5c450cafa2 | controller-1 | controller-0 | controller-1 | controller-0 | 192.168.206.2 | completed | 6b82a77c-b456-4bc8-956e-70f6d125372c | None | None | 2023-08-08T15:11:43.000000 | 2023-08-08T15:23:39.000000 | evacuation | b4a5f977138c4fef960c572b035f87a2 | cc41a25c2230409fbe4a246202ffc1bc |
+----+--------------------------------------+--------------+--------------+----------------+--------------+---------------+-----------+--------------------------------------+------------+------------+----------------------------+----------------------------+----------------+----------------------------------+----------------------------------+
]0;sysadmin@controller-0: ~sysadmin@controller-0:~$
[2023-08-08 15:32:12,189] 350 DEBUG MainThread ssh.send :: Send 'echo $?'
[2023-08-08 15:32:12,240] 472 DEBUG MainThread ssh.expect :: Output:
0
]0;sysadmin@controller-0: ~sysadmin@controller-0:~$
[2023-08-08 15:32:12,341] 22 DEBUG MainThread make_report.update_results:: ***Failure at test call: /home/cumulus/repositories/cgcs-wro-stx-sanity-duplex/CGCSAuto/keywords/vm_helper.py:1538: utils.exceptions.VMPostCheckFailed: Check failed post VM operation.

Test Activity
-------------
Sanity Testing

Workaround
----------
N/A

Jim Beacom (jbeacom)
Changed in starlingx:
assignee: nobody → Thales Elero Cervi (tcervi)
status: New → Triaged
Ghada Khalil (gkhalil) wrote :

Setting as high priority since this is reported from sanity

Changed in starlingx:
importance: Undecided → High
Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/903630
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/141f088a7e4b8c3571867386b01b03ba002ac196
Submitter: "Zuul (22348)"
Branch: master

commit 141f088a7e4b8c3571867386b01b03ba002ac196
Author: Thales Elero Cervi <email address hidden>
Date: Wed Dec 13 20:44:04 2023 -0300

    Add Neutron DHCP HA capabilities

    Aiming for the stx-openstack application hardening, this change
    implements the plugin logic to gather the number of available compute
    nodes and, for anything that is not an AIO-SX deployment, configure the
    neutron dhcp_agents_per_network count accordingly [1].
    When only two computes are available, this config is set to 2; whenever
    three or more compute nodes are available, it is set to 3.

    There are scenarios in which a compute node fails while hosting the DHCP
    agent for a given OpenStack network; in that case, VMs on that network
    won't receive a DHCPOFFER and therefore fail to obtain an IP address or
    respond to pings.

    Partial-Bug: 2030883

    [1] https://docs.openstack.org/neutron/latest/admin/config-dhcp-ha.html#enabling-dhcp-high-availability-by-default

    TEST PLAN:
    PASS - Build plugins (k8sapp_openstack)
    PASS - Build stx-openstack helm charts
    PASS - Apply stx-openstack and verify the Neutron config
           * Check dhcp_agents_per_network count:
           /etc/neutron/neutron.conf
           - AIO-SX = 1
           - AIO-DX = 2
    PASS - Execute stx-openstack sanity tests locally

    Change-Id: I6f32ac8a1706467be0be67920a3a750bb7c6b01e
    Signed-off-by: Thales Elero Cervi <email address hidden>
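
For context, the test plan above expects dhcp_agents_per_network to render as 2 on an AIO-DX system. A hypothetical way to confirm the value once stx-openstack is applied (the namespace and deployment name below are assumptions about the deployment layout) could be:

# Expected excerpt of /etc/neutron/neutron.conf on AIO-DX after the fix
[DEFAULT]
dhcp_agents_per_network = 2

# Possible spot-check inside the running neutron-server pod (names are assumptions):
kubectl -n openstack exec deploy/neutron-server -- grep dhcp_agents_per_network /etc/neutron/neutron.conf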

Changed in starlingx:
status: In Progress → Fix Released