stx-openstack: Some pods are failing readiness/liveness probes

Bug #2030908 reported by Luan Nunes Utimura
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Lucas de Ataides Barreto

Bug Description

Brief Description
-----------------
It has been observed that some pods are *constantly* failing readiness/liveness probes, namely:

  * cinder-api;
  * glance-api;
  * heat-api.

Functionality may have been affected.

Severity
--------
Major.

Steps to Reproduce
------------------
Upload and apply stx-openstack, then check each pod's response to probes with:
  $ kubectl -n openstack describe pod/<pod_name>
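
The describe output is long, so it can help to keep only the probe-related event lines. A minimal sketch, assuming the event wording matches the logs in this report ("Liveness probe warning", "Readiness probe warning") or the usual kubelet "probe failed" messages; `probe_events` is a hypothetical helper name, not part of kubectl:

```shell
# Hypothetical helper: read `kubectl describe` output on stdin and keep
# only the lines that report a liveness, readiness, or startup probe event.
probe_events() {
  # `|| true` keeps the exit status at 0 when no probe events are present.
  grep -E '(Liveness|Readiness|Startup) probe (warning|failed)' || true
}

# Usage (the pod name is a placeholder, as in the command above):
#   kubectl -n openstack describe pod/<pod_name> | probe_events
```

An empty result suggests the pod is not currently reporting probe warnings or failures in its recent events.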

Expected Behavior
-----------------
Pods are responding to probes as expected.

Actual Behavior
---------------
Pods are failing to respond to probes.

Reproducibility
---------------
Reproducible.

System Configuration
--------------------
Seen on AIO-DX.
Possibly happening on all system setups.

Branch/Pull Time/Commit
-----------------------
StarlingX (master)
StarlingX OpenStack (master)

Last Pass
---------
N/A.

Timestamp/Logs
--------------
As an example, logs from glance-api:
```
  Warning ProbeWarning 18m (x11057 over 31h) kubelet Liveness probe warning: {"versions": [{"id": "v2.9", "status": "CURRENT", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.7", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.6", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.5", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.4", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.3", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.2", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.1", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.0", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}]}

  Warning ProbeWarning 3m35s (x11147 over 31h) kubelet Readiness probe warning: {"versions": [{"id": "v2.9", "status": "CURRENT", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.7", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.6", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.5", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.4", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.3", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.2", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.1", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}, {"id": "v2.0", "status": "SUPPORTED", "links": [{"rel": "self", "href": "http://glance.openstack.svc.cluster.local/v2/"}]}]}
```

Similar logs can be seen on cinder-api and heat-api.

Test Activity
-------------
Developer Testing.

Workaround
----------
N/A.

tags: added: stx.9.0 stx.distro.openstack
Changed in starlingx:
assignee: nobody → Lucas de Ataides Barreto (ldeataid)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-armada-app (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/910987
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/6a7c163cb0310fefc58e8efa7595d081568a985b
Submitter: "Zuul (22348)"
Branch: master

commit 6a7c163cb0310fefc58e8efa7595d081568a985b
Author: Lucas de Ataides <email address hidden>
Date: Mon Mar 4 12:14:48 2024 -0300

    Add missing initial delay in readiness probes

    After applying STX-Openstack, some API pods are failing to pass the
    readiness probe. This change adds an initialDelaySeconds value to the
    deployment files of the services that are having this warning.

    The liveness probe for these deployments already has an
    initialDelaySeconds of 30 seconds; adding this value to the readiness
    probe suppresses this issue.

    For the glance-api deployment, it was also required to add the
    initialDelaySeconds to the liveness probe.

    A review was proposed to the upstream openstack-helm repository [1], and
    is currently under review. If the change is merged, this patch can be
    ignored on future STX-Openstack development.

    [1] https://review.opendev.org/c/openstack/openstack-helm/+/911015

    Test Plan:
    PASS: Build openstack-helm package
    PASS: Build STX-Openstack application tarball
    PASS: Upload / apply STX-Openstack application
    PASS: Check the cinder-api, glance-api and heat-api pods and verify that
          there are no more readiness probe failures.

    Closes-Bug: 2030908

    Change-Id: If273f095ae0c589fa71faff7756d5f6861cfd264
    Signed-off-by: Lucas de Ataides <email address hidden>
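
To check on a running system whether the change described in the commit is in effect, the readiness-probe delay can be read back from each deployment. A minimal sketch, assuming the deployments are named after the pods listed in the report (cinder-api, glance-api, heat-api) and live in the `openstack` namespace; `readiness_delay` is a hypothetical helper name:

```shell
# Hypothetical helper: print the readiness-probe initialDelaySeconds of
# every container in the given deployment (blank if the field is unset).
readiness_delay() {
  kubectl -n openstack get deployment "$1" \
    -o jsonpath='{.spec.template.spec.containers[*].readinessProbe.initialDelaySeconds}'
  echo  # jsonpath output has no trailing newline
}

# Usage (deployment names are assumed to match the pod names in the report):
#   for d in cinder-api glance-api heat-api; do
#     printf '%s: ' "$d"; readiness_delay "$d"
#   done
```

Swapping `readinessProbe` for `livenessProbe` in the JSONPath expression gives the corresponding liveness value, which is relevant for glance-api since the commit also adds the delay there.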

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium