AIO-DX: After host-swact, standby oam-services are in disabled-failed status

Bug #2043506 reported by Peng Peng
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Eliud Kyale

Bug Description

Brief Description
-----------------
On a DX system, after host-swact, oam-services on the standby controller is in disabled-failed status.
The active controller did not change.

Severity
--------
Major

Steps to Reproduce
------------------
Run host-swact on a DX system

TC-name:
test_swact.py::test_swact_controller_platform

Expected Behavior
------------------
host-swact succeeds

Actual Behavior
----------------
host-swact failed

Reproducibility
---------------
This is the first time this issue has been seen

System Configuration
--------------------
Two node system

Lab-name: SM_5-6

Branch/Pull Time/Commit
-----------------------
20231112T070059Z

Last Pass
---------
20231029T060059Z

Timestamp/Logs
--------------
[2023-11-14 15:56:15,320] 349 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list'
[2023-11-14 15:56:15,370] 551 DEBUG MainThread ssh.exec_cmd:: Expecting \[.*@controller\-[01] .*\(keystone_admin\)\]\$ in prompt
[2023-11-14 15:56:18,603] 471 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+-----------------------------+--------------+---------+
| uuid | service_group_name | hostname | state |
+--------------------------------------+-----------------------------+--------------+---------+
| d6f7f0c3-63a3-4266-b635-3477cbba587c | cloud-services | controller-0 | active |
| 5dfb31dd-a566-4018-b9a1-ade55b8f0512 | cloud-services | controller-1 | standby |
| e0becaf5-6417-4800-86fd-da902cc2ed56 | controller-services | controller-0 | active |
| 14da370c-0fec-47e5-8901-0aa5a1471303 | controller-services | controller-1 | standby |
| 5e48d411-c0cd-4d28-96a3-e8bcf01c3ed4 | directory-services | controller-0 | active |
| bd44d128-7724-459c-a7fe-04ec70d9e90c | directory-services | controller-1 | active |
| 86510df1-0942-4978-a73f-21275c3a7f9b | oam-services | controller-0 | active |
| 51f30fa9-694e-478a-91b3-5cffd8585f4c | oam-services | controller-1 | standby |
| f62a4764-e232-4cfa-b3ba-63f86c9cac5f | patching-services | controller-0 | active |
| 2b5a8a50-2bef-4803-8f70-93b34035871e | patching-services | controller-1 | standby |
| 6f463d23-40b2-4bbd-98eb-3572f6568016 | storage-monitoring-services | controller-0 | active |
| 581ccde4-d5f0-4515-afa8-291b47848e61 | storage-monitoring-services | controller-1 | standby |
| 6e01bc2d-853f-413c-a1c7-23bc8ade2080 | storage-services | controller-0 | active |
| c3796211-2506-482b-8934-2e5b1cb50b71 | storage-services | controller-1 | active |
| 7d8844ce-9387-4531-b13b-fa6f4db31e1c | vim-services | controller-0 | active |
| a8ae57b2-24df-4c55-b1ec-a130cb6c9907 | vim-services | controller-1 | standby |
| ec08880d-ff1e-430f-94c3-db7095dc82b8 | web-services | controller-0 | active |
| 89db6c85-b187-460e-afb3-933c4a8bc9de | web-services | controller-1 | active |
+--------------------------------------+-----------------------------+--------------+---------+
[sysadmin@controller-0 ~(keystone_admin)]$

[2023-11-14 15:56:18,831] 349 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-swact controller-0'

[2023-11-14 15:57:38,954] 349 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne servicegroup-list'
[2023-11-14 15:57:39,005] 551 DEBUG MainThread ssh.exec_cmd:: Expecting .*controller\-[01][:| ].*\$ in prompt
[2023-11-14 15:57:42,008] 471 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+-----------------------------+--------------+-----------------+
| uuid | service_group_name | hostname | state |
+--------------------------------------+-----------------------------+--------------+-----------------+
| d6f7f0c3-63a3-4266-b635-3477cbba587c | cloud-services | controller-0 | active |
| 5dfb31dd-a566-4018-b9a1-ade55b8f0512 | cloud-services | controller-1 | disabled |
| e0becaf5-6417-4800-86fd-da902cc2ed56 | controller-services | controller-0 | active |
| 14da370c-0fec-47e5-8901-0aa5a1471303 | controller-services | controller-1 | disabled-failed |
| 5e48d411-c0cd-4d28-96a3-e8bcf01c3ed4 | directory-services | controller-0 | active |
| bd44d128-7724-459c-a7fe-04ec70d9e90c | directory-services | controller-1 | active |
| 86510df1-0942-4978-a73f-21275c3a7f9b | oam-services | controller-0 | active-failed |
| 51f30fa9-694e-478a-91b3-5cffd8585f4c | oam-services | controller-1 | disabled-failed |
| f62a4764-e232-4cfa-b3ba-63f86c9cac5f | patching-services | controller-0 | standby |
| 2b5a8a50-2bef-4803-8f70-93b34035871e | patching-services | controller-1 | active |
| 6f463d23-40b2-4bbd-98eb-3572f6568016 | storage-monitoring-services | controller-0 | standby |
| 581ccde4-d5f0-4515-afa8-291b47848e61 | storage-monitoring-services | controller-1 | active |
| 6e01bc2d-853f-413c-a1c7-23bc8ade2080 | storage-services | controller-0 | active |
| c3796211-2506-482b-8934-2e5b1cb50b71 | storage-services | controller-1 | active |
| 7d8844ce-9387-4531-b13b-fa6f4db31e1c | vim-services | controller-0 | active |
| a8ae57b2-24df-4c55-b1ec-a130cb6c9907 | vim-services | controller-1 | disabled |
| ec08880d-ff1e-430f-94c3-db7095dc82b8 | web-services | controller-0 | active |
| 89db6c85-b187-460e-afb3-933c4a8bc9de | web-services | controller-1 | active |
+--------------------------------------+-----------------------------+--------------+-----------------+
sysadmin@controller-0:~$
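
The second servicegroup-list output above can be checked mechanically for failed service groups. The following is only a minimal sketch, not part of the test automation: the function names `parse_servicegroup_list` and `failed_groups` are hypothetical, and the sample is a subset of the rows captured above.

```python
def parse_servicegroup_list(output):
    """Parse the ASCII table printed by 'system servicegroup-list' into dicts."""
    rows = []
    header = None
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip table borders, shell prompts and log lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def failed_groups(rows):
    """Return (service group, host, state) for every state ending in '-failed'."""
    return [
        (row["service_group_name"], row["hostname"], row["state"])
        for row in rows
        if row["state"].endswith("-failed")
    ]


# Subset of the post-swact output from this report.
SAMPLE = """\
+--------------------------------------+-----------------------------+--------------+-----------------+
| uuid | service_group_name | hostname | state |
+--------------------------------------+-----------------------------+--------------+-----------------+
| 14da370c-0fec-47e5-8901-0aa5a1471303 | controller-services | controller-1 | disabled-failed |
| 86510df1-0942-4978-a73f-21275c3a7f9b | oam-services | controller-0 | active-failed |
| 51f30fa9-694e-478a-91b3-5cffd8585f4c | oam-services | controller-1 | disabled-failed |
| 2b5a8a50-2bef-4803-8f70-93b34035871e | patching-services | controller-1 | active |
+--------------------------------------+-----------------------------+--------------+-----------------+
"""

if __name__ == "__main__":
    for group, host, state in failed_groups(parse_servicegroup_list(SAMPLE)):
        print(f"{group} on {host}: {state}")
```

Matching on the `-failed` suffix catches both active-failed and disabled-failed, which are the two failed states that appear in the output above.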

Automation log:
http://128.224.186.235/auto_logs/sm_5_6/202311141540/case_8_test_swact_controller_platform/case_008_test_swact_controller_platform

collect log:

Test Activity
-------------
Sanity

Tags: stx.9.0 stx.ha
Peng Peng (ppeng) wrote :
Ghada Khalil (gkhalil) wrote :

The issue resulted in a yellow sanity, so marking as high priority. The HA team should start the investigation, given that the issue is related to swact.

tags: added: stx.9.0 stx.ha
Changed in starlingx:
importance: Undecided → High
Ghada Khalil (gkhalil)
summary: - DX after host-swact, standby controller-services in disabled-failed
- status
+ AIO-DX: After host-swact, standby controller-services are in disabled-
+ failed status
Peng Peng (ppeng)
summary: - AIO-DX: After host-swact, standby controller-services are in disabled-
- failed status
+ AIO-DX: After host-swact, standby oam-services are in disabled-failed
+ status
description: updated
Jim Beacom (jbeacom)
Changed in starlingx:
assignee: nobody → salma police (spolice)
Jim Beacom (jbeacom)
Changed in starlingx:
status: New → Triaged
Peng Peng (ppeng)
description: updated
Jim Beacom (jbeacom)
Changed in starlingx:
assignee: salma police (spolice) → Eliud Kyale (ekyale)
status: Triaged → In Progress
Eliud Kyale (ekyale) wrote :

Narrowed the root cause down to haproxy not starting on host-swact. haproxy is a critical service in the oam-services group.

Comparing logs from the second occurrence.

Eliud Kyale (ekyale) wrote :

haproxy.cfg requires DNS resolution, which means the haproxy service now depends on dnsmasq. A dependency between dnsmasq and haproxy needs to be added.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to config-files (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config-files/+/905271

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ha (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ha/+/905272

OpenStack Infra (hudson-openstack) wrote : Fix merged to ha (master)

Reviewed: https://review.opendev.org/c/starlingx/ha/+/905272
Committed: https://opendev.org/starlingx/ha/commit/0db57d60be9ba74866f22f9bebddf59a11a4897a
Submitter: "Zuul (22348)"
Branch: master

commit 0db57d60be9ba74866f22f9bebddf59a11a4897a
Author: Kyale, Eliud <email address hidden>
Date: Wed Jan 10 14:43:13 2024 -0500

    Add service dependancy haproxy dnsmasq

    haproxy uses dns resolution
    add service dependency to sm database
    to ensure that dnsmasq service is started before haproxy
    and dnsmasq is disabled after haproxy is disabled

    Test plan:

    PASS - AIO-SX: iso install
    PASS - AIO-SX: reboot test
    PASS - AIO-DX: iso install
    PASS - AIO-DX: swact test

    Closes-Bug: #2043506

    Change-Id: I494faebfe67843d34819f66a0a2fbd977657bb6b
    Signed-off-by: Kyale, Eliud <email address hidden>
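
The ordering stated in the commit message above (dnsmasq started before haproxy, dnsmasq disabled after haproxy) can be illustrated with a small toy model. This is only a sketch of the ordering rule, not the SM database change itself: the `DEPENDS_ON` map and both function names are hypothetical, and it assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: service -> set of services it depends on.
# Only the haproxy -> dnsmasq edge comes from this bug report.
DEPENDS_ON = {
    "haproxy": {"dnsmasq"},
    "dnsmasq": set(),
}


def enable_order(depends_on):
    """Order in which services are started: dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


def disable_order(depends_on):
    """Order in which services are stopped: dependents go down first."""
    return list(reversed(enable_order(depends_on)))


if __name__ == "__main__":
    print("enable: ", enable_order(DEPENDS_ON))
    print("disable:", disable_order(DEPENDS_ON))
```

With this map, dnsmasq is enabled before haproxy, so DNS resolution is available when haproxy reads its configuration, and the disable order is the reverse.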

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to config-files (master)

Reviewed: https://review.opendev.org/c/starlingx/config-files/+/905271
Committed: https://opendev.org/starlingx/config-files/commit/e2588f064a5534f9196b12e987b686259d085c35
Submitter: "Zuul (22348)"
Branch: master

commit e2588f064a5534f9196b12e987b686259d085c35
Author: Kyale, Eliud <email address hidden>
Date: Wed Jan 10 14:22:07 2024 -0500

    Improve logging of haproxy init.d script

    redirect stdout and stderr logs to haproxy.log
    add logger logs to user.log
    to assist in debugging haproxy issues

    Test plan:

    PASS - AIO-SX: iso install
    PASS - AIO-SX: reboot testing
    PASS - AIO-DX: iso install
    PASS - AIO-DX: swact

    Related-Bug: #2043506
    Change-Id: I9d65bc74132e4fae56da736b46bdf55946bf5bcd
    Signed-off-by: Kyale, Eliud <email address hidden>
