kube creating vim kube upgrade strategy: HTTP Error 500: Internal Server Error

Bug #1979992 reported by Heitor Matsui
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
StarlingX
Fix Released
Medium
Heitor Matsui

Bug Description

Brief Description
-----------------
Kubernetes version upgrade failed in subclouds due to HTTP Error 500: Internal Server Error

Severity
--------
Minor: The kub version upgrade successful after restart of VIM process

Steps to Reproduce
------------------
1. Install DC system with 1000 subclouds
2. Upgrade Kub version 250 in parallel

Expected Behavior
------------------
Kub version upgrade successful.

Actual Behavior
----------------
Kub version upgrade not successful

Reproducibility
---------------
1/1

System Configuration
--------------------
DC with 1000 subclouds

Branch/Pull Time/Commit
-----------------------
2022-01-10_02-00-35

Last Pass
---------
N/A

Timestamp/Logs
--------------
From system controller orchestrator.log

2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread [req-478ddc23-bae4-47b4-a180-c6fe7c0b30bd - - - - -] (kubernetes) Failed! Stage: 3, State: kube creating vim kube upgrade strategy, Subcloud: subcloud695: HTTPError: HTTP Error 500: Internal Server Error
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread Traceback (most recent call last):
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib/python2.7/site-packages/dcmanager/orchestrator/orch_thread.py", line 507, in perform_state_action
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread next_state = state_operator.perform_state_action(strategy_step)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib/python2.7/site-packages/dcmanager/orchestrator/states/creating_vim_strategy.py", line 73, in perform_state_action
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread raise_error_if_missing=False)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib/python2.7/site-packages/dccommon/drivers/openstack/vim.py", line 131, in get_strategy
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread strategy_name=strategy_name)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/site-packages/nfv_client/openstack/sw_update.py", line 191, in get_strategies
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread response = rest_api.request(token_id, "GET", api_cmd, api_cmd_headers)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/site-packages/nfv_client/openstack/rest_api.py", line 38, in request
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread timeout=timeout_in_secs)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/site-packages/nfv_client/openstack/rest_api.py", line 38, in request
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread timeout=timeout_in_secs)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread return opener.open(url, data, timeout)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 437, in open
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread response = meth(req, response)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread 'http', request, response, code, msg, hdrs)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 475, in error
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread return self._call_chain(*args)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread result = func(*args)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 558, in http_error_default
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread HTTPError: HTTP Error 500: Internal Server Error
2022-05-01 06:52:36.728 447201 ERROR dcmanager.orchestrator.orch_thread

Subcloud nfv-vim.log

2022-04-26T03:42:13.069 controller-0 VIM_Thread[100783] ERROR vim.py.171 Alarm subsystem is not sane, exiting
2022-04-26T03:42:13.070 controller-0 VIM_Sw-Mgmt-Worker-0_Thread[101025] INFO _thread.py.232 Thread Sw-Mgmt-Worker-0: shutting down.
2022-04-26T03:42:13.071 controller-0 VIM_Sw-Mgmt-Worker-0_Thread[101025] INFO _thread.py.236 Thread Sw-Mgmt-Worker-0: shutdown.
2022-04-26T03:42:13.078 controller-0 VIM_Identity-Worker-0_Thread[101003] INFO _thread.py.232 Thread Identity-Worker-0: shutting down.
2022-04-26T03:42:13.078 controller-0 VIM_Identity-Worker-0_Thread[101003] INFO _thread.py.236 Thread Identity-Worker-0: shutdown.
2022-04-26T03:42:13.085 controller-0 VIM_Infrastructure-Worker-0_Thread[101010] INFO _thread.py.232 Thread Infrastructure-Worker-0: shutting down.
2022-04-26T03:42:13.085 controller-0 VIM_Infrastructure-Worker-0_Thread[101010] INFO _thread.py.236 Thread Infrastructure-Worker-0: shutdown.

++++++++++++++++++++++++++++++++++++

Reproducible when upgrading from kub version 1.20 to 1.21

system controller logs:
orchestrator.log

2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread [req-0661084f-0f7e-4f35-83e8-2d3369be80ad - - - - -] (kubernetes) Failed! Stage: 2, State: kube creating vim kube upgrade strategy, Subcloud: subcloud566: HTTPError: HTTP Error 500: Internal Server Error
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread Traceback (most recent call last):
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib/python2.7/site-packages/dcmanager/orchestrator/orch_thread.py", line 507, in perform_state_action
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread next_state = state_operator.perform_state_action(strategy_step)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib/python2.7/site-packages/dcmanager/orchestrator/states/creating_vim_strategy.py", line 73, in perform_state_action
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread raise_error_if_missing=False)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib/python2.7/site-packages/dccommon/drivers/openstack/vim.py", line 131, in get_strategy
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread strategy_name=strategy_name)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/site-packages/nfv_client/openstack/sw_update.py", line 191, in get_strategies
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread response = rest_api.request(token_id, "GET", api_cmd, api_cmd_headers)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/site-packages/nfv_client/openstack/rest_api.py", line 38, in request
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread timeout=timeout_in_secs)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 154, in urlopen
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread return opener.open(url, data, timeout)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 437, in open
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread response = meth(req, response)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread 'http', request, response, code, msg, hdrs)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 475, in error
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread return self._call_chain(*args)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread result = func(*args)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread File "/usr/lib64/python2.7/urllib2.py", line 558, in http_error_default
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
2022-05-04 20:56:00.917 196681 ERROR dcmanager.orchestrator.orch_thread HTTPError: HTTP Error 500: Internal Server Error

subcloud logs:

subcloud566 / nfv-vim.log
2022-05-04T12:22:32.510 controller-0 VIM_Thread[96722] ERROR _thread.py.100 Thread Alarm stalled, progress_marker=0, elapsed_secs=532.
2022-05-04T12:23:03.590 controller-0 VIM_Thread[96722] ERROR _thread.py.100 Thread Event-Log stalled, progress_marker=0, elapsed_secs=563.
2022-05-04T12:23:03.590 controller-0 VIM_Thread[96722] ERROR _thread.py.100 Thread Alarm stalled, progress_marker=0, elapsed_secs=563.
2022-05-04T12:23:35.016 controller-0 VIM_Thread[96722] ERROR _thread.py.100 Thread Event-Log stalled, progress_marker=0, elapsed_secs=594.
2022-05-04T12:23:35.016 controller-0 VIM_Thread[96722] ERROR _thread.py.100 Thread Alarm stalled, progress_marker=0, elapsed_secs=594.
2022-05-04T12:23:41.258 controller-0 VIM_Thread[96722] ERROR vim.py.171 Alarm subsystem is not sane, exiting
2022-05-04T12:23:41.259 controller-0 VIM_Sw-Mgmt-Worker-0_Thread[97116] INFO _thread.py.232 Thread Sw-Mgmt-Worker-0: shutting down.
2022-05-04T12:23:41.260 controller-0 VIM_Sw-Mgmt-Worker-0_Thread[97116] INFO _thread.py.236 Thread Sw-Mgmt-Worker-0: shutdown.
2022-05-04T12:23:41.263 controller-0 VIM_Identity-Worker-0_Thread[96984] INFO _thread.py.232 Thread Identity-Worker-0: shutting down.
2022-05-04T12:23:41.263 controller-0 VIM_Identity-Worker-0_Thread[96984] INFO _thread.py.236 Thread Identity-Worker-0: shutdown.
2022-05-04T12:23:41.270 controller-0 VIM_Infrastructure-Worker-0_Thread[97067] INFO _thread.py.232 Thread Infrastructure-Worker-0: shutting down.
2022-05-04T12:23:41.270 controller-0 VIM_Infrastructure-Worker-0_Thread[97067] INFO _thread.py.236 Thread Infrastructure-Worker-0: shutdown.

Test Activity
-------------
Feature Testing

Workaround
----------
Restart VIM process on subcloud

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nfv (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/nfv/+/847981

Changed in starlingx:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nfv (master)

Reviewed: https://review.opendev.org/c/starlingx/nfv/+/847981
Committed: https://opendev.org/starlingx/nfv/commit/fc1aff636def785b2c18c3db0db093208523fec6
Submitter: "Zuul (22348)"
Branch: master

commit fc1aff636def785b2c18c3db0db093208523fec6
Author: Heitor Matsui <email address hidden>
Date: Tue Jun 28 11:45:04 2022 -0300

    Detect and kill hung vim process

    There is a specific scenario when vim starts where threads keep
    stalled and then after 600s the subsystems are marked as not sane.
    In this situation vim is supposed to quit, but instead it stays
    alive but in a failed state.

    This commit adds an extra step in vim monitoring ocf script to
    test if vim rpc port is open and responding, and if not, vim
    is forcefully killed.

    Test Plan
    PASS: install/bootstrap/unlock AIO-SX
    PASS: get vim to a not-sane state and observe it being killed
          and restarted in this scenario

    Closes-bug: 1979992
    Change-Id: I55777a0858507634a362e58b9da281492809a03d
    Signed-off-by: Heitor Matsui <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.7.0 stx.nfv
Changed in starlingx:
assignee: nobody → Heitor Matsui (heitormatsui)
importance: Undecided → Medium
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.