Switch Active controller might have some unnecessary delay

Bug #1895767 reported by zhipeng liu
This bug affects 1 person
Affects    | Status       | Importance | Assigned to    | Milestone
StarlingX  | Fix Released | Medium     | Eric MacDonald |

Bug Description

For duplex and multi-node setups, switching the active controller with the system host-swact command makes the swact procedure take a long time.
We ran performance tests on a virtual duplex setup and found that a swact takes around 15 + 40 seconds:
1) The 15 s comes from a wait timer on the maintenance (mtc) side, in the code below, and could be reduced.
2) The 40 s is spent disabling and enabling services. There is little room for improvement there, although the existing code could run faster if idle cores were assigned to this task.
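Because the two phases run back to back, the maintenance wait accounts for roughly a quarter of the measured swact time. A minimal C++ sketch of that arithmetic, using only the approximate figures reported above; the constant and function names are illustrative and do not come from the mtce source:

```cpp
#include <cassert>

// Approximate phase durations from this report (virtual duplex setup,
// seconds). These are measured figures, not constants from mtce.
constexpr int kMtcStatusDelaySecs = 15;    // wait after posting "Swact: Request"
constexpr int kServiceSwitchoverSecs = 40; // disabling and enabling services

// The two phases run sequentially, so the swact time is their sum.
constexpr int swactTotalSecs(int mtcDelaySecs, int serviceSecs) {
    return mtcDelaySecs + serviceSecs;
}

// Percentage of the total swact time spent in the maintenance wait.
double delaySharePct(int mtcDelaySecs, int serviceSecs) {
    const int total = swactTotalSecs(mtcDelaySecs, serviceSecs);
    assert(total > 0);
    return 100.0 * mtcDelaySecs / total;
}

// Compile-time checks of the before/after totals.
static_assert(swactTotalSecs(kMtcStatusDelaySecs, kServiceSwitchoverSecs) == 55,
              "15 s wait + 40 s service switchover");
static_assert(swactTotalSecs(0, kServiceSwitchoverSecs) == 40,
              "service switchover alone");
```

With these numbers, delaySharePct(15, 40) is about 27.3%, which is the upper bound on what removing the mtc wait alone can save.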

In mtcNodeHdlrs.cpp, could the delay below (MTC_TASK_UPDATE_DELAY/2, currently 15 s) be decreased?
That might improve swact performance.

        /* Start / Init Stage */
        case MTC_SWACT__START:
        {
            plog ("%s Administrative SWACT Requested\n", node_ptr->hostname.c_str() );

            /* Cleanup and init the swact timer - start fresh */
            if ( node_ptr->mtcSwact_timer.tid )
            {
                wlog ("%s Cancelling outstanding Swact timer\n", node_ptr->hostname.c_str());
                mtcTimer_stop ( node_ptr->mtcSwact_timer );
            }
            mtcTimer_init ( node_ptr->mtcSwact_timer );

            /* reset error / control Counters to zero */
            nodeLinkClass::smgrEvent.count = 0 ;
            nodeLinkClass::smgrEvent.fails = 0 ;
            nodeLinkClass::smgrEvent.cur_retries = 0 ;

            /* Empty the event message strings */
            nodeLinkClass::smgrEvent.payload = "" ;
            nodeLinkClass::smgrEvent.response = "" ;

            /* Post a user message 'Swact: Request' and
             * then delay to allow it to be displayed */
            mtcInvApi_force_task ( node_ptr, MTC_TASK_SWACT_REQUEST );
            mtcTimer_start ( node_ptr->mtcSwact_timer, mtcTimer_handler, (MTC_TASK_UPDATE_DELAY/2) );
            node_ptr->swactStage = MTC_SWACT__QUERY ;
            break ;
        }
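The START stage posts the status and then arms the swact timer, so the QUERY stage cannot act until that timer rings. The hypothetical model below is not mtce code: the enum, struct, and functions are invented for illustration, the timer is reduced to an expiry time checked on each handler pass, and passes are assumed to run once per second. It shows how the delay argument gates the transition.

```cpp
#include <cassert>

// Hypothetical, simplified model of the START -> QUERY transition.
// The real handler arms mtcSwact_timer via mtcTimer_start and is
// re-entered later; here the timer is just an expiry time.
enum class SwactStage { Start, Query, Done };

struct SwactState {
    SwactStage stage = SwactStage::Start;
    int timerExpiry = 0;       // time (s) at which the swact timer rings
    bool statusPosted = false; // stand-in for the host task field update
};

// One pass of the handler at time `now` (seconds). `startDelaySecs`
// stands in for MTC_TASK_UPDATE_DELAY/2.
void swactHandlerPass(SwactState& s, int now, int startDelaySecs) {
    switch (s.stage) {
    case SwactStage::Start:
        s.statusPosted = true;                // like mtcInvApi_force_task(...)
        s.timerExpiry = now + startDelaySecs; // like mtcTimer_start(...)
        s.stage = SwactStage::Query;
        break;
    case SwactStage::Query:
        if (now < s.timerExpiry) break;       // timer not rung yet: keep waiting
        // The swact query/request would be issued here.
        s.stage = SwactStage::Done;
        break;
    case SwactStage::Done:
        break;
    }
}

// Returns the second at which the QUERY stage acts, or -1 if it never does.
int secondsUntilQuery(int startDelaySecs) {
    assert(startDelaySecs >= 0);
    SwactState s;
    for (int now = 0; now <= 3600; ++now) {
        swactHandlerPass(s, now, startDelaySecs);
        if (s.stage == SwactStage::Done) return now;
    }
    return -1;
}
```

Under these assumptions, secondsUntilQuery(15) returns 15, while a zero delay lets the QUERY stage act on the very next pass (secondsUntilQuery(0) returns 1).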

Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to metal (master)

Fix proposed to branch: master
Review: https://review.opendev.org/752597

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to metal (master)

Reviewed: https://review.opendev.org/752597
Committed: https://git.openstack.org/cgit/starlingx/metal/commit/?id=16fcba39765a8abe56571da48f4b8590bc3d36eb
Submitter: Zuul
Branch: master

commit 16fcba39765a8abe56571da48f4b8590bc3d36eb
Author: Eric MacDonald <email address hidden>
Date: Thu Sep 17 21:30:32 2020 -0400

    Remove 15 second delay following Swact Request status update

    The Maintenance swact handler posts swact actions to the host
    controller's task status field. The Swact Request posting was
    followed by a 15 second wait so that Horizon would be
    displaying "Swact: Request" while the swact occurred.

    Unfortunately, this delayed the actual swact request for that
    entire wait period thereby adding 15 seconds to the overall
    manual swact operation.

    Since a faster swact is preferable to waiting for the status
    to display, this update removes that delay, at the risk that
    the "Swact: Request" status may not be displayed before the
    swact takes place.

    Change-Id: I635c896327dca2312efbe02dec67d3e920fa3e90
    Closes-Bug: 1895767
    Signed-off-by: Eric MacDonald <email address hidden>
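The trade-off accepted in this commit can be made concrete with a small model. The sketch below is a hypothetical illustration, not Horizon or mtce behavior: it assumes a client polls the task field at a fixed period and phase, and asks whether any poll lands while "Swact: Request" is still the current status. The function name and the poll parameters are invented for this example.

```cpp
#include <cassert>

// Hypothetical model: the "Swact: Request" status is written at postTime and
// stays until the next status overwrites it at overwriteTime. A client is
// assumed to poll every pollPeriod seconds starting at pollOffset; these are
// illustrative values, not Horizon's actual refresh settings.
bool statusObserved(int postTime, int overwriteTime,
                    int pollPeriod, int pollOffset) {
    assert(pollPeriod > 0 && overwriteTime >= postTime);
    // Advance to the first poll at or after the status was posted.
    int poll = pollOffset;
    while (poll < postTime) poll += pollPeriod;
    // The status is seen only if that poll lands before it is overwritten.
    return poll < overwriteTime;
}
```

For example, with a 15 s hold and a 10 s poll, statusObserved(0, 15, 10, 7) is true for any phase; with the hold removed, statusObserved(0, 1, 10, 7) is false, and the status is seen only when a poll happens to land in the brief gap before the next update.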

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium priority - optimization to reduce swact times

tags: added: stx.metal
tags: added: stx.5.0
Changed in starlingx:
importance: Undecided → Medium