Bootstrap playbook execution fails with K8s 1.25

Bug #2052300 reported by Saba Touheed Mujawar
Affects    Status        Importance  Assigned to           Milestone
StarlingX  Fix Released  Medium      Saba Touheed Mujawar

Bug Description

Brief Description
-----------------
During a fresh install of a simplex system with k8s 1.25.3, the bootstrap playbook fails because the helm-controller pod is in the "CrashLoopBackOff" state.

Severity
--------
Major

Steps to Reproduce
------------------
Install the ISO with k8s 1.25.3.

Expected Behavior
------------------
No failures during bootstrap playbook.

Actual Behavior
----------------
Playbook execution stops because of a helm-controller pod error: the helm-controller deployment never becomes Available (see the log below).
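
To confirm the pod state described above, the following is a minimal Python sketch, not part of the original report. It assumes the pod list comes from `kubectl get pods -o json` and that the standard Kubernetes pod status layout (`status.containerStatuses[].state.waiting.reason`) applies. The function names are hypothetical; the namespace and kubeconfig path are the ones that appear in the log.

```python
import json
import subprocess


def crashlooping_containers(pod_list: dict) -> list:
    """Return (pod name, container name) pairs waiting in CrashLoopBackOff."""
    found = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            # "waiting" is only present while the container is not running.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                found.append((pod_name, status["name"]))
    return found


def fetch_pods(namespace: str = "flux-helm") -> dict:
    """Fetch the pod list as parsed JSON (requires kubectl on the host)."""
    result = subprocess.run(
        ["kubectl", "--kubeconfig=/etc/kubernetes/admin.conf",
         "get", "pods", "--namespace", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

On an affected controller, `crashlooping_containers(fetch_pods())` would be expected to list the helm-controller pod; the container name it reports depends on the deployed manifest.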

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Simplex

Timestamp/Logs
--------------
Collect: cgts4:/folk/cgts_logs/CGTS-57018

2024-01-15 10:28:10,589 p=3435 u=sysadmin n=ansible | TASK [common/fluxcd-controllers : Fail if the helm and source controllers are not ready by this time] ***
2024-01-15 10:28:10,589 p=3435 u=sysadmin n=ansible | Monday 15 January 2024 10:28:10 +0000 (0:00:33.681) 0:31:08.768 ********
2024-01-15 10:28:10,637 p=3435 u=sysadmin n=ansible | failed: [localhost] (item={'cmd': ['kubectl', '--kubeconfig=/etc/kubernetes/admin.conf', 'wait', '--namespace=flux-helm', '--for=condition=Available', 'deployment', 'helm-controller', '--timeout=30s'], 'stdout': '', 'stderr': 'error: timed out waiting for the condition on deployments/helm-controller', 'rc': 1, 'start': '2024-01-15 10:27:36.408888', 'end': '2024-01-15 10:28:06.604054', 'delta': '0:00:30.195166', 'changed': True, 'failed': False, 'msg': 'non-zero return code', 'invocation': {'module_args': {'_raw_params': 'kubectl --kubeconfig=/etc/kubernetes/admin.conf wait --namespace=flux-helm --for=condition=Available deployment helm-controller --timeout=30s', 'warn': True, '_uses_shell': False, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable': None, 'creates': None, 'removes': None, 'stdin': None}}, 'finished': 1, 'ansible_job_id': '165581820752.38925', 'stdout_lines': [], 'stderr_lines': ['error: timed out waiting for the condition on deployments/helm-controller'], 'attempts': 6, 'failed_when_result': False, 'item': {'started': 1, 'finished': 0, 'ansible_job_id': '165581820752.38925', 'results_file': '/root/.ansible_async/165581820752.38925', 'changed': True, 'failed': False, 'item':

2024-01-15 10:28:10,653 p=3435 u=sysadmin n=ansible | PLAY RECAP *********************************************************************
2024-01-15 10:28:10,653 p=3435 u=sysadmin n=ansible | localhost : ok=410 changed=212 unreachable=0 failed=1 skipped=405 rescued=0 ignored=0
2024-01-15 10:28:10,653 p=3435 u=sysadmin n=ansible | Monday 15 January 2024 10:28:10 +0000 (0:00:00.064) 0:31:08.833 ********
2024-01-15 10:28:10,653 p=3435 u=sysadmin n=ansible | ===============================================================================
2024-01-15 10:28:10,654 p=3435 u=sysadmin n=ansible | bootstrap/persist-config : Wait for service endpoints reconfiguration to complete - 476.57s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/apply-manifest : Applying puppet bootstrap manifest --------- 398.66s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | common/push-docker-images : Download images and push to local registry - 324.55s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/bringup-essential-services : Wait for 120 seconds to ensure kube-system pods are all started - 120.61s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/persist-config : Find old registry secrets in Barbican ------- 54.64s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/validate-config : Generate config ini file for python sysinv db population script -- 46.40s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | common/fluxcd-controllers : Get wait tasks results --------------------- 33.68s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/persist-config : Saving config in sysinv database ------------ 25.50s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/bringup-essential-services : Add loopback interface ---------- 23.22s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/bringup-essential-services : Check controller-0 is in online state -- 21.99s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | common/bringup-kubemaster : Initializing Kubernetes master ------------- 16.25s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/persist-config : Restart sysinv-agent and sysinv-api to pick up sysinv.conf update -- 14.76s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/persist-config : Add ssl_ca certificate ----------------------- 9.58s
2024-01-15 10:28:10,655 p=3435 u=sysadmin n=ansible | bootstrap/persist-config : Wait for sysinv inventory -------------------- 9.57s
2024-01-15 10:28:10,656 p=3435 u=sysadmin n=ansible | bootstrap/persist-config : Wait for certificate install ----------------- 8.54s
2024-01-15 10:28:10,656 p=3435 u=sysadmin n=ansible | common/bringup-kubemaster : Activate Calico Networking ------------------ 7.32s
2024-01-15 10:28:10,656 p=3435 u=sysadmin n=ansible | bootstrap/apply-manifest : Generating static config data ---------------- 6.34s
2024-01-15 10:28:10,656 p=3435 u=sysadmin n=ansible | common/create-etcd-certs : Generate private key for etcd server and client --- 4.63s
2024-01-15 10:28:10,656 p=3435 u=sysadmin n=ansible | bootstrap/bringup-essential-services : Get wait tasks results ----------- 4.54s
2024-01-15 10:28:10,656 p=3435 u=sysadmin n=ansible | bootstrap/validate-config : Check if the supplied address is a valid domain name or ip address --- 4.54s
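
The failing task in the log above runs `kubectl wait` against the helm-controller deployment with a 30s timeout and reports 6 attempts. The sketch below reproduces that readiness check by hand. It is an illustration, not the playbook's implementation: the function names are hypothetical, and re-running `kubectl wait` once per attempt is an assumption (the playbook polls an asynchronous job instead).

```python
import subprocess

KUBECONFIG = "/etc/kubernetes/admin.conf"  # path taken from the log


def wait_cmd(deployment: str, namespace: str = "flux-helm",
             timeout: str = "30s") -> list:
    """Build the same kubectl wait command that appears in the failed task."""
    return [
        "kubectl", f"--kubeconfig={KUBECONFIG}", "wait",
        f"--namespace={namespace}", "--for=condition=Available",
        "deployment", deployment, f"--timeout={timeout}",
    ]


def run_kubectl(cmd: list) -> int:
    """Run the command and return its exit code (0 means Available)."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def deployment_available(deployment: str, attempts: int = 6,
                         runner=run_kubectl) -> bool:
    """Retry the wait up to `attempts` times; True once it succeeds."""
    cmd = wait_cmd(deployment)
    for _ in range(attempts):
        if runner(cmd) == 0:
            return True
    return False
```

Calling `deployment_available("helm-controller")` on the controller would be expected to return False while the pod is crash-looping. The `runner` parameter exists only so the retry logic can be exercised without a cluster.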

Changed in starlingx:
assignee: nobody → Saba Touheed Mujawar (smujawar)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/907542
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/df65243c27bc1a350f84ee2ecce9dc4ba2079adb
Submitter: "Zuul (22348)"
Branch: master

commit df65243c27bc1a350f84ee2ecce9dc4ba2079adb
Author: Saba Touheed Mujawar <email address hidden>
Date: Fri Feb 2 06:59:38 2024 -0500

    Upversion flux helm and source controller for k8s 1.25.3

    During fresh install of a simplex with k8s 1.25.3,
    bootstrap playbook fails due to helm-controller preventing
    the nginx app from being applied.
    With the update of FluxCD release v2.0.1, there is a need to
    upgrade helm-controller and source-controller for k8s 1.25 which
    resolves the issue.

    Reference:
    https://review.opendev.org/c/starlingx/ansible-playbooks/+/890987

    Test Plan:
    PASS: Install ISO with k8s 1.25.
    PASS: Applications applied successfully.

    Closes-Bug: 2052300

    Change-Id: Ifad5f975580dab785ae102969e14c7ef1b00a827
    Signed-off-by: Saba Touheed Mujawar <email address hidden>
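
Since the fix upversions the helm and source controllers, one way to check which controller images are actually deployed is to read them from the deployment specs. The sketch below is a hedged illustration that assumes input from `kubectl get deployments --namespace flux-helm -o json`; the function name is hypothetical, and no specific image tags are implied by the report.

```python
def controller_images(deployment_list: dict) -> dict:
    """Map each deployment name to the container images in its pod template."""
    images = {}
    for dep in deployment_list.get("items", []):
        containers = dep["spec"]["template"]["spec"]["containers"]
        images[dep["metadata"]["name"]] = [c["image"] for c in containers]
    return images
```

Comparing the returned tags before and after applying the change shows whether the new controller versions were picked up.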

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.containers