VIM patch strategy unlock timeout sometimes occurs

Bug #1986972 reported by Al Bailey
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Al Bailey
Milestone: none listed

Bug Description

Brief Description
-----------------
In a large environment (many computes), if many hosts are patched in parallel, the unlock commands may time out because generating the hieradata takes longer than 45 seconds.

Severity
--------
Minor

Steps to Reproduce
------------------
Try to unlock 10 hosts in parallel as part of a VIM patch strategy.

Expected Behavior
------------------
All hosts should unlock successfully and the patch strategy should complete without reporting a failure.

Actual Behavior
----------------
VIM sometimes reports a failure, and the patch strategy must be run again to ensure any remaining hosts are patched.

Reproducibility
---------------
This scenario and environment are not common. However, it can be reproduced with 10 hosts in parallel.

System Configuration
--------------------
STD Duplex with many computes (and can include storage too)

Branch/Pull Time/Commit
-----------------------
This has always been like this. Let's say June 2022.

Last Pass
---------
It passes or fails intermittently, depending on the system load during hieradata generation.

Timestamp/Logs
--------------
Here is a sysinv-api REST operation that took 50 seconds to issue the 'unlock':
sysinv 2022-08-18 14:12:30.341 117610 INFO sysinv.api.hooks.auditor [req-7d8fe9a5-4ffe-461c-8fd1-1b4744730be7 e95a8d78b90946adafde8f6a12cd30a6 330b2d6b53c643d5abea49536b8ac3c6] fd00:32:64::2 "PATCH /v1/ihosts/0f178ff9-09c6-47b1-937e-417a7ef9b1fe HTTP/1.0" status: 200 len: 6331 time: 50.3158700466 POST: [{u'path': u'/administrative', u'value': u'unlocked', u'op': u'replace'} etc....
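
To check how often requests approach the timeout, the `time:` field can be pulled out of auditor log lines like the one above. The following is a minimal sketch, assuming every relevant line follows the same layout as the quoted entry; the regular expression and the `slow_requests` helper are illustrative names, not part of sysinv or the VIM.

```python
import re

# Matches the quoted request line and the "time: <seconds>" field of a
# sysinv.api.hooks.auditor entry. The layout is inferred from the single
# log line quoted in this report (an assumption, not a documented format).
AUDIT_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'status: (?P<status>\d+) len: \d+ time: (?P<seconds>[\d.]+)'
)


def slow_requests(lines, threshold_seconds=45.0):
    """Yield (method, path, seconds) for requests slower than the threshold."""
    for line in lines:
        match = AUDIT_RE.search(line)
        if match is None:
            continue  # not an auditor request line
        seconds = float(match.group("seconds"))
        if seconds > threshold_seconds:
            yield match.group("method"), match.group("path"), seconds


# The log line from this report, truncated after the "time:" field.
sample = (
    'sysinv 2022-08-18 14:12:30.341 117610 INFO sysinv.api.hooks.auditor '
    '[req-7d8fe9a5-4ffe-461c-8fd1-1b4744730be7 '
    'e95a8d78b90946adafde8f6a12cd30a6 330b2d6b53c643d5abea49536b8ac3c6] '
    'fd00:32:64::2 "PATCH /v1/ihosts/0f178ff9-09c6-47b1-937e-417a7ef9b1fe '
    'HTTP/1.0" status: 200 len: 6331 time: 50.3158700466'
)
```

Calling `list(slow_requests([sample]))` reports the PATCH request, since it exceeded 45 seconds; with `threshold_seconds=60.0` the same line is no longer flagged.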

Test Activity
-------------
Feature Testing

Workaround
----------
Run the strategy again

I am going to increase the sysinv API timeout in the VIM from 45 seconds to 60 seconds.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nfv (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/nfv/+/853677

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nfv (master)

Reviewed: https://review.opendev.org/c/starlingx/nfv/+/853677
Committed: https://opendev.org/starlingx/nfv/commit/1861fe9af8929c668c2ee50670173a6dee9cc673
Submitter: "Zuul (22348)"
Branch: master

commit 1861fe9af8929c668c2ee50670173a6dee9cc673
Author: Al Bailey <email address hidden>
Date: Thu Aug 18 15:35:17 2022 +0000

    Increase sysinv API timeout from 45 to 60 seconds

    When the VIM issues a sysinv command, it will terminate the thread
    and report a 'timed out' for the plugin activity if the command
    takes longer than 45 seconds.

    Under heavy load (10 hosts patching in parallel) sysinv can take longer
    than 45 seconds to generate the hieradata for an unlock, and therefore
    the patch strategy can fail.

    This fix bumps the value up to 60.

    Any future improvements will focus on speeding up the generation
    of sysinv hieradata.

    Closes-Bug: 1986972
    Signed-off-by: Al Bailey <email address hidden>
    Change-Id: I1dbb6f1e3c5529de9199dea8453f32e954da2f51
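
The behaviour described in the commit message, where a call that outlasts its timeout is reported as 'timed out', can be illustrated roughly as follows. This is a minimal sketch and not the VIM's implementation: `call_with_timeout` and `fake_unlock` are hypothetical names, and unlike the VIM this version does not terminate the worker thread, it only stops waiting for it.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_seconds, *args):
    """Run func(*args) in a worker thread.

    Returns ("ok", result) if the call finishes within timeout_seconds,
    otherwise ("timed out", None).
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return "ok", future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        return "timed out", None
    finally:
        # Stop waiting for the worker; this sketch cannot kill the thread,
        # so a slow call keeps running in the background until it returns.
        executor.shutdown(wait=False)


def fake_unlock(duration_seconds):
    """Hypothetical stand-in for a slow sysinv unlock request."""
    time.sleep(duration_seconds)
    return "unlocked"
```

With durations scaled down for illustration, `call_with_timeout(fake_unlock, 0.05, 0.3)` gives up before the call finishes, while the same call with a longer timeout such as `1.0` returns the result. This mirrors why raising the limit from 45 to 60 seconds lets a 50-second unlock succeed without making hieradata generation any faster.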

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Al Bailey (albailey1974)
importance: Undecided → Low
tags: added: stx.8.0 stx.nfv