sysinv errors when vim slow to reply

Bug #1893948 reported by Al Bailey
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Al Bailey
Milestone: (none listed)

Bug Description

Brief Description
-----------------
A periodic task in the sysinv conductor calls the VIM.
If the VIM is not running, or fails to respond within 5 seconds (the timeout of the API call), stack traces
are written to the sysinv log.

Because this is a periodic task, the next execution may succeed, so this is mostly a cosmetic (log) issue.

Severity
--------
Minor

Steps to Reproduce
------------------
Because this is a timing issue, or exposes a bug in another area, there is no reliable way to reproduce it.
The simplest method is to modify the VIM code and add a 5-second sleep in the API call being invoked.
This ensures the 5-second timeout elapses and produces the expected error.
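
The timing condition can also be illustrated outside the product code. The following is a minimal sketch, not sysinv or VIM code: slow_vim_query and query_with_timeout are hypothetical stand-ins, and it assumes the client hands back None when no reply arrives before the timeout, which is what the traceback under Timestamp/Logs suggests. Durations are scaled down from 5 seconds so it runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def slow_vim_query(delay_seconds):
    # Hypothetical stand-in for the VIM API handler with the injected sleep.
    time.sleep(delay_seconds)
    return {'sw-update-type': None}


def query_with_timeout(delay_seconds, timeout_seconds):
    # Hypothetical client: returns None when no reply arrives in time
    # (assumed behaviour, inferred from the 'NoneType' error in the log).
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_vim_query, delay_seconds)
    try:
        return future.result(timeout=timeout_seconds)
    except TimeoutError:
        return None
    finally:
        # Do not block on the still-sleeping worker thread.
        pool.shutdown(wait=False)


if __name__ == '__main__':
    # Scaled-down durations; the report uses a 5 second sleep and timeout.
    vim_resp = query_with_timeout(delay_seconds=0.5, timeout_seconds=0.1)
    try:
        vim_resp['sw-update-type']
    except TypeError as exc:
        print('reproduced: %s' % exc)
```

Run under Python 3, the TypeError text is worded differently from the Python 2.7 message in the log, but the cause is the same: a None response is subscripted.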

Expected Behavior
------------------
The expected behaviour is to fail the API query and try again on the next iteration.

Actual Behavior
----------------
The API query fails and is retried on the next iteration, but the logs get polluted with stack traces.
This also delays the install of oidc-auth-apps and platform-integ-apps.

Reproducibility
---------------
Seen Once

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
After June 22, 2020, when the following commit merged:
https://github.com/starlingx/config/commit/8180909098ddf36ea1f5e62e6ae8dcd5f89b2b73

Last Pass
---------
Passes all the time. The issue is random unless there are larger VIM problems.

Timestamp/Logs
--------------
2020-08-25 12:19:22.283 91971 ERROR sysinv.openstack.common.periodic_task Traceback (most recent call last):
2020-08-25 12:19:22.283 91971 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/periodic_task.py", line 180, in run_periodic_tasks
2020-08-25 12:19:22.283 91971 ERROR sysinv.openstack.common.periodic_task task(self, context)
2020-08-25 12:19:22.283 91971 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 5483, in _k8s_application_audit
2020-08-25 12:19:22.283 91971 ERROR sysinv.openstack.common.periodic_task if self._check_software_orchestration_in_progress():
2020-08-25 12:19:22.283 91971 ERROR sysinv.openstack.common.periodic_task File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 5440, in _check_software_orchestration_in_progress
2020-08-25 12:19:22.283 91971 ERROR sysinv.openstack.common.periodic_task if vim_resp['sw-update-type'] is not None and \
2020-08-25 12:19:22.283 91971 ERROR sysinv.openstack.common.periodic_task TypeError: 'NoneType' object has no attribute '__getitem__'
2020-08-25 12:19:22.283 91971 ERROR sysinv.openstack.common.periodic_task
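
The traceback shows vim_resp being subscripted while it is None. One way to avoid that is to check for a missing response before reading any keys. The sketch below is illustrative only and is not the merged fix: the function name is hypothetical, the remainder of the original condition is cut off in the traceback and is therefore left out, and returning False when there is no response is an assumption (a caller could equally choose to skip the audit until the VIM answers).

```python
import logging

LOG = logging.getLogger(__name__)


def software_orchestration_in_progress(vim_resp):
    # Hypothetical guard, not the actual sysinv change.
    if vim_resp is None:
        # VIM not running or too slow: log one line instead of a stack trace
        # and let the next periodic run try again.
        LOG.warning("No response from VIM; will retry on the next audit")
        return False
    # Only the part of the condition visible in the traceback is checked;
    # the rest of the original expression is not shown in the log.
    return vim_resp.get('sw-update-type') is not None
```

With this shape, a missing response produces a single warning line per audit run rather than a TypeError.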

Test Activity
-------------
Developer Testing

Workaround
----------
None needed.

Al Bailey (albailey1974)
Changed in starlingx:
assignee: nobody → Al Bailey (albailey1974)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/749506

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.config stx.distcloud
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.5.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/749506
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=1e773021753605fcafaec404a6b8215a6dd24e55
Submitter: Zuul
Branch: master

commit 1e773021753605fcafaec404a6b8215a6dd24e55
Author: albailey <email address hidden>
Date: Wed Sep 2 09:03:38 2020 -0500

    Handle failed VIM API responses in sysinv conductor k8s check

    If the VIM is not running or is slower than 5 seconds to respond
    to a sw-update query, the sysinv conductor logs get many
    stacktraces logged as the periodic task that checks this
    fails over and over.

    Now the check is more robust.

    Change-Id: I5a9b06d5852dc300cc43501cfa46bb1393701d99
    Closes-Bug: 1893948
    Signed-off-by: albailey <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released