pod with hugepages stays in error after upgrade

Bug #1943113 reported by Daniel Safta
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Daniel Safta

Bug Description

Brief Description
Following a simplex upgrade with managed SR-IOV pods, some pods may be left in an error state:

default huge-pages-example-deployment-6757cb845d-gx8mb 0/1 OutOfhugepages-1Gi 0 149m

default huge-pages-example-deployment-6757cb845d-vs5nr 1/1 Running 0 41m

Severity

Minor: System/Feature is usable with minor issue.

The managed pod would have a replica running; this is just to clean up any errored managed pods following a simplex upgrade.

Expected Behavior

default huge-pages-example-deployment-6757cb845d-vs5nr 1/1 Running 0 41m

Actual Behavior

default huge-pages-example-deployment-6757cb845d-gx8mb 0/1 OutOfhugepages-1Gi 0 149m

default huge-pages-example-deployment-6757cb845d-vs5nr 1/1 Running 0 41m

Reproducibility

Reproducible

System Configuration

DC Simplex Subcloud

Branch/Pull Time/Commit

-

Last Pass

-

Timestamp/Logs

-

Test Activity

Feature Testing: Upgrades

Workaround

Delete the pod in the error state, which already has a Running replica:

kubectl delete pod -n <namespace> huge-pages-example-deployment-6757cb845d-gx8mb
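
If several pods are affected, the workaround can be applied in one pass. The following is a minimal sketch, not the k8s-pod-recovery implementation: the function names filter_outofhugepages and delete_outofhugepages_pods are hypothetical, and it assumes the default column layout of `kubectl get pods --all-namespaces --no-headers` (NAMESPACE NAME READY STATUS RESTARTS AGE).

```shell
#!/bin/bash
# Sketch only: both helpers below are hypothetical, not part of StarlingX.

# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin
# and prints "<namespace> <pod>" for every pod whose STATUS column
# starts with OutOfhugepages (for example OutOfhugepages-1Gi).
filter_outofhugepages() {
    awk '$4 ~ /^OutOfhugepages/ { print $1, $2 }'
}

# Deletes each matching pod. Pods in Running state are never selected,
# so the healthy replica of the deployment is left untouched.
delete_outofhugepages_pods() {
    kubectl get pods --all-namespaces --no-headers \
        | filter_outofhugepages \
        | while read -r ns pod; do
              kubectl delete pod -n "$ns" "$pod"
          done
}
```

Running filter_outofhugepages on its own first (piping the `kubectl get pods` output into it) shows which pods would be removed before delete_outofhugepages_pods is called.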

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/807557
Committed: https://opendev.org/starlingx/integ/commit/3b397cd14b9032c6ec1d0161bfa8b93de07dd8e4
Submitter: "Zuul (22348)"
Branch: master

commit 3b397cd14b9032c6ec1d0161bfa8b93de07dd8e4
Author: Daniel Safta <email address hidden>
Date: Mon Sep 6 09:07:34 2021 +0000

    Clear pods in OutOfhugepages* state

    Following an upgrade, some pods using
    hugepages will still be in Running
    state, but will have a replica that stays in
    OutOfhugepages state.

    k8s-pod-recovery can detect those pods and
    delete them.

    Closes-bug: 1943113
    Signed-off-by: Daniel Safta <email address hidden>
    Change-Id: Idba510cabd66cd8b796563e3e6efa9baa5b4401e

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Daniel Safta (dsafta)
importance: Undecided → Low
tags: added: stx.6.0 stx.containers
tags: added: stx.update
Changed in starlingx:
importance: Low → Medium