fm-rest-api CrashLoopBackOff on standard, standard ext configurations

Bug #1951428 reported by Alexandru Dimofte
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Heitor Matsui

Bug Description

Brief Description
-----------------
The fm-rest-api pods go into CrashLoopBackOff on Standard and Standard Ext configurations. Because of this issue, the stx-openstack apply fails.

Severity
--------
<Critical: System/Feature is not usable due to the defect>

Steps to Reproduce
------------------
Install StarlingX build 20211118T032155Z on a Standard or Standard Ext configuration.

Expected Behavior
------------------
Stx install should work fine

Actual Behavior
----------------
stx-openstack apply fails because of fm-rest-api CrashLoopBackOff

 These pods were not ready=['fm-rest-api-d8f469447-l26v9', 'fm-rest-api-d8f469447-s7gp2']
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada [-] Chart deploy [openstack-fm-rest-api] failed: armada.exceptions.k8s_exceptions.KubernetesWatchTimeoutException: Timed out waiting for pods (namespace=openstack, labels=(release_group=osh-openstack-fm-rest-api)). These pods were not ready=['fm-rest-api-d8f469447-l26v9', 'fm-rest-api-d8f469447-s7gp2']
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada Traceback (most recent call last):
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 170, in handle_result
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada result = get_result()
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 181, in <lambda>
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada if (handle_result(chart, lambda: deploy_chart(chart, 1))):
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 159, in deploy_chart
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada concurrency)
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/chart_deploy.py", line 55, in execute
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada ch, cg_test_all_charts, prefix, known_releases)
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/chart_deploy.py", line 267, in _execute
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada chart_wait.wait(timer)
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/wait.py", line 142, in wait
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada wait.wait(timeout=timeout)
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/wait.py", line 302, in wait
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada modified = self._wait(deadline)
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "<decorator-gen-2>", line 2, in _wait
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 74, in retry_decorator
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada logger)
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/retry/api.py", line 33, in __retry_internal
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada return f()
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada File "/usr/local/lib/python3.6/dist-packages/armada/handlers/wait.py", line 372, in _wait
2021-11-18 11:02:23.703 682 ERROR armada.handlers.armada raise k8s_exceptions.KubernetesWatchTimeoutException(error)
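When triaging a log like the one above, it can help to pull the not-ready pod names out of the timeout message so they can be inspected individually. The following is a minimal sketch, not part of Armada: the function name `not_ready_pods` and the regular expression are hypothetical, and it assumes the message keeps the `These pods were not ready=[...]` form shown in this report.

```python
import ast
import re
from typing import List

# Assumption: the timeout message ends with a Python-style list literal,
# as in the log excerpt from this bug report.
NOT_READY_RE = re.compile(r"These pods were not ready=(\[[^\]]*\])")


def not_ready_pods(log_line: str) -> List[str]:
    """Return the pod names listed in an Armada wait-timeout log line."""
    match = NOT_READY_RE.search(log_line)
    if match is None:
        return []
    # literal_eval parses the list without executing arbitrary code.
    return list(ast.literal_eval(match.group(1)))


line = (
    "Timed out waiting for pods (namespace=openstack, "
    "labels=(release_group=osh-openstack-fm-rest-api)). "
    "These pods were not ready="
    "['fm-rest-api-d8f469447-l26v9', 'fm-rest-api-d8f469447-s7gp2']"
)
for pod in not_ready_pods(line):
    print(pod)
```

Each printed name can then be checked in the openstack namespace to see why the container keeps restarting.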

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
So far I have observed this only on Standard and Standard Ext bare-metal configurations.

Branch/Pull Time/Commit
-----------------------
20211118T032155Z

Last Pass
---------
20211117T032111Z

Timestamp/Logs
--------------
will be attached

Test Activity
-------------
Sanity

Workaround
----------
No workaround yet. We suspect this change is the cause: https://review.opendev.org/c/starlingx/fault/+/815381

Alexandru Dimofte (adimofte) wrote :
Ghada Khalil (gkhalil) wrote :

Assigning to Heitor Matsui since he has a similar LP, https://bugs.launchpad.net/starlingx/+bug/1951579, with reviews posted to address it.

Changed in starlingx:
assignee: nobody → Heitor Matsui (heitormatsui)
importance: Undecided → Critical
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

stx.6.0 / critical - this is causing a failure to install the stx-openstack application

Ghada Khalil (gkhalil)
tags: added: stx.6.0 stx.fault
Ghada Khalil (gkhalil) wrote :

Based on the latest sanity report with the 2021-November-23 load, this is still an issue:
http://lists.starlingx.io/pipermail/starlingx-discuss/2021-November/012440.html

So perhaps this is a different issue from https://bugs.launchpad.net/starlingx/+bug/1951579, which should be fixed in the above load.

@Heitor Matsui, please investigate further.

Heitor Matsui (heitormatsui) wrote :

I just checked the charts used in the sanity run (http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/flock/20211123T005754Z/outputs/helm-charts/helm-charts-stx-openstack-centos-stable-versioned.tgz), and they were using the fm-rest-api container image built on Nov 19th. Below is an excerpt from the stx-openstack.yaml in those charts, containing the fm-rest-api image overrides:

--- stx-openstack.yaml ---
    images:
      tags:
        fm_rest_api: docker.io/starlingx/stx-fm-rest-api:master-centos-stable-20211119T042917Z.0
        ks_user: docker.io/starlingx/stx-heat:master-centos-stable-20211119T042917Z.0
        ks_service: docker.io/starlingx/stx-heat:master-centos-stable-20211119T042917Z.0
        ks_endpoints: docker.io/starlingx/stx-heat:master-centos-stable-20211119T042917Z.0
        fm_db_sync: docker.io/starlingx/stx-fm-rest-api:master-centos-stable-20211119T042917Z.0
        db_init: docker.io/starlingx/stx-heat:master-centos-stable-20211119T042917Z.0
        db_drop: docker.io/starlingx/stx-heat:master-centos-stable-20211119T042917Z.0
-------------------------

Container images built from Nov 23rd (yesterday) onwards will contain the fix. Because the charts were using old images, I suggest we wait for the next sanity report.
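The check described above (is the image referenced by the chart older than the fix?) can be sketched by reading the build timestamp embedded in each image tag. This is an illustration only: the helper names `build_time` and `predates_fix` are hypothetical, and the cutoff of 2021-11-23 00:00 UTC is an assumption derived from the "Nov 23rd onwards" statement, not an exact build time.

```python
import re
from datetime import datetime, timezone

# Build timestamp as it appears in tags such as
# master-centos-stable-20211119T042917Z.0
TAG_TS_RE = re.compile(r"(\d{8}T\d{6}Z)")

# Assumption: images built on or after this instant contain the fix.
FIX_CUTOFF = datetime(2021, 11, 23, tzinfo=timezone.utc)


def build_time(image: str) -> datetime:
    """Extract the UTC build timestamp from an image reference."""
    match = TAG_TS_RE.search(image)
    if match is None:
        raise ValueError(f"no build timestamp in {image!r}")
    parsed = datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def predates_fix(image: str) -> bool:
    """True if the image was built before the assumed fix cutoff."""
    return build_time(image) < FIX_CUTOFF


# Two of the overrides from the stx-openstack.yaml excerpt in this report.
images = {
    "fm_rest_api": "docker.io/starlingx/stx-fm-rest-api:master-centos-stable-20211119T042917Z.0",
    "fm_db_sync": "docker.io/starlingx/stx-fm-rest-api:master-centos-stable-20211119T042917Z.0",
}
stale = sorted(name for name, image in images.items() if predates_fix(image))
print(stale)
```

Any override name listed in `stale` points at an image built before the assumed cutoff, which matches the observation that the Nov 23rd sanity run still used the Nov 19th images.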

Heitor Matsui (heitormatsui) wrote :
Ghada Khalil (gkhalil) wrote :

Thanks Heitor. I've marked this LP as Fix Released and a duplicate of https://bugs.launchpad.net/starlingx/+bug/1951579

Changed in starlingx:
status: In Progress → Fix Released