BnR: SX - restore succeeded but many pods are evicted

Bug #2043491 reported by Joshua Kraitberg
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Joshua Kraitberg
Milestone: (none)

Bug Description

Brief Description
-----------------
After a successful run of the restore playbook, many pods from every namespace are evicted with no alarms raised. The system is able to reschedule the pods afterward, and operation is green.

Eviction occurred during the restore playbook, before the unlock.

Severity
--------
Minor

Steps to Reproduce
------------------
Run optimized restore

Expected Behavior
------------------
No evicted pods

Actual Behavior
----------------
Evicted pods

Reproducibility
---------------
Unknown

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
11-11-2023

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Automated Testing

Workaround
----------
Manually clean up the evicted pods.
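
The manual clean-up could be scripted along the following lines. This is a minimal sketch, not part of the fix: it assumes the pod list was captured with `kubectl get pods --all-namespaces -o json`, and it relies on Kubernetes marking evicted pods with phase `Failed` and reason `Evicted`. The function names `find_evicted_pods` and `delete_commands` are hypothetical helpers introduced for illustration.

```python
import json


def find_evicted_pods(pod_list_json):
    """Return (namespace, name) pairs for pods that were evicted.

    Expects the JSON text printed by
    `kubectl get pods --all-namespaces -o json`.
    """
    pods = json.loads(pod_list_json).get("items", [])
    evicted = []
    for pod in pods:
        status = pod.get("status", {})
        # Evicted pods remain in the API as Failed pods with reason "Evicted".
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            meta = pod["metadata"]
            evicted.append((meta["namespace"], meta["name"]))
    return evicted


def delete_commands(evicted):
    """Build one `kubectl delete pod` argument list per evicted pod."""
    return [["kubectl", "delete", "pod", "-n", ns, name] for ns, name in evicted]
```

Each returned argument list can be reviewed first and then passed to `subprocess.run`; building the commands separately from running them keeps the deletion step easy to inspect.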

Changed in starlingx:
status: New → In Progress
Changed in starlingx:
assignee: nobody → Joshua Kraitberg (jkraitbe-wr)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/900849
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/6b3566a358bf07677e8449a9f4f334aaad09aa34
Submitter: "Zuul (22348)"
Branch: master

commit 6b3566a358bf07677e8449a9f4f334aaad09aa34
Author: Joshua Kraitberg <email address hidden>
Date: Mon Nov 13 22:40:09 2023 -0500

    Fix: LV sizes not restored

    This change did not work as stated in test plan:
    https://review.opendev.org/c/starlingx/ansible-playbooks/+/873377

    LV sizes were not being restored pre-unlock.

    To restore LV sizes, the original sizes are added to the runtime
    configuration after being pulled from controller0 puppet hieradata,
    which is currently not being used when doing the puppet apply step.

    TEST PLAN
    PASS: Optimized upgrade on AIO-SX, stx6 to stx8
    PASS: Optimized upgrade on AIO-SX subcloud, stx6 to stx8
    PASS: Optimized restore on AIO-SX, stx8
    ** For all tests confirm LV's are sized correctly before and after
    unlock
    ** Before all tests increase the size of all partitions:
    - system host-fs-modify controller-0 kubelet=22
    - system controllerfs-modify docker-distribution=21
    - etc.

    Closes-Bug: 2043491
    Signed-off-by: Joshua Kraitberg <email address hidden>
    Change-Id: Ic3fcc341371b165db0f7d0564dd34e561a470ae7
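
The test plan in the commit above asks for LV sizes to be confirmed before and after unlock. One way to automate that comparison is sketched below; it is an illustration, not the procedure used in the review. It assumes the sizes were captured on the host with `lvs --noheadings --units g --nosuffix -o lv_name,lv_size`, and the helper names `parse_lvs` and `shrunk_volumes` are hypothetical.

```python
def parse_lvs(output):
    """Parse two-column `lvs` output (lv_name, lv_size) into {name: size}."""
    sizes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        name, size = fields
        sizes[name] = float(size)
    return sizes


def shrunk_volumes(before, after, tolerance=0.01):
    """Return names of LVs that are missing or smaller in `after`."""
    return sorted(
        name
        for name, size in before.items()
        if after.get(name, 0.0) < size - tolerance
    )
```

An empty result from `shrunk_volumes` means every LV recorded before the restore is at least as large afterward; any name in the list is a volume whose size was not restored.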

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.9.0 stx.update