B&R optimized restore: Calico CNI is not brought up

Bug #2000184 reported by Joshua Kraitberg
This bug affects 1 person
Affects      Status         Importance   Assigned to        Milestone
StarlingX    Fix Released   Medium       Joshua Kraitberg

Bug Description

Brief Description
-----------------
During optimized restore, the Calico CNI fails to come up when Kubernetes is brought up.

kubectl describe -n armada pod armada-api-75f46c4bcf-fh5zt
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m16s default-scheduler Successfully assigned armada/armada-api-75f46c4bcf-fh5zt to controller-0
Warning FailedCreatePodSandBox 6m15s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b382b76bc742405457075f0b31c6eef65270fe8d427cd8cba3579cf000fd3c6d": plugin type="multus" name="multus-cni-network" failed (add): Multus: [armada/armada-api-75f46c4bcf-fh5zt]: error getting pod: Get "https://[fd04::1]:443/api/v1/namespaces/armada/pods/armada-api-75f46c4bcf-fh5zt?timeout=1m0s": dial tcp [fd04::1]:443: connect: network is unreachable
...
Warning FailedCreatePodSandBox 52s (x17 over 4m19s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "9ecf5e809ed3bacfc1e3907d6b0cac2d92626cc6ac56ecf177f2df70c858c336": plugin type="multus" name="multus-cni-network" failed (add): Multus: [armada/armada-api-75f46c4bcf-fh5zt]: error getting pod: Get "https://[fd04::1]:443/api/v1/namespaces/armada/pods/armada-api-75f46c4bcf-fh5zt?timeout=1m0s": dial tcp [fd04::1]:443: connect: network is unreachable
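
To narrow down events like the ones above, it can help to pull out just the endpoint that kubelet reports as unreachable. The following is a minimal sketch; the helper name `unreachable_endpoints` is hypothetical and not part of StarlingX or kubectl, and it assumes the event text follows the `dial tcp <endpoint>: connect: network is unreachable` pattern shown in this report.

```shell
# Hypothetical helper (not part of StarlingX or kubectl): read event text
# on stdin and print each distinct endpoint reported as unreachable.
unreachable_endpoints() {
    grep -o 'dial tcp [^ ]*: connect: network is unreachable' \
        | sed -e 's/^dial tcp //' \
              -e 's/: connect: network is unreachable$//' \
        | sort -u
}

# Example usage (pod name taken from this report):
#   kubectl describe -n armada pod armada-api-75f46c4bcf-fh5zt | unreachable_endpoints
```

Once the endpoint is known, a command such as `ip -6 route get fd04::1` on the controller may show whether a route to the cluster service address exists at all.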

Severity
--------
Critical

Steps to Reproduce
------------------
Run optimized restore

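Expected Behavior
-----------------
Pass: optimized restore completes and the Calico CNI comes up.

Actual Behavior
---------------
Fail: the Calico CNI does not come up and pod sandboxes cannot be created.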

Reproducibility
---------------
100% on certain system configurations

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
master branch. BUILD_ID="2022-12-19_02-22-00"

Last Pass
---------
Yes, it still passes except on certain systems.

Timestamp/Logs
--------------
See above.

Test Activity
-------------
Developer Testing

Workaround
----------
Run:
sudo systemctl restart networking kubelet

Do this after networking is restored but before Kubernetes is brought up.

Changed in starlingx:
assignee: nobody → Joshua Kraitberg (jkraitbe-wr)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/868251
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/24c75495b32278daf16291327d828952ccd2029b
Submitter: "Zuul (22348)"
Branch: master

commit 24c75495b32278daf16291327d828952ccd2029b
Author: Joshua Kraitberg <email address hidden>
Date: Tue Dec 20 20:26:07 2022 -0500

    Fix calico bring-up failure in optimized restore

    On certain systems, the networking service can be restarted.
    On systems where it is possible, it should be restored that way.
    Otherwise, fall back to the original method of restoring
    networking by using ifup/ifdown manually.

    It was seen that if restarting networking service is possible
    but is not done, Calico cni will fail to be created when Kubernetes
    is brought up later in the playbook.

    This only impacts certain systems.

    TEST PLAN
    PASS: Run optimized restore with --registry-images (AIO-SX)
      * Run on affected and unaffected systems
    PASS: Run optimized restore without --registry-images (AIO-SX)
      * Run on affected and unaffected systems

    Closes-Bug: 2000184
    Signed-off-by: Joshua Kraitberg <email address hidden>
    Change-Id: I97b9fc2bbbe47ed93edbdae3f4b02b6243181677
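
The restart-or-fall-back behavior described in the commit message can be sketched in shell as follows. This is only an illustration, not the merged change, which is an Ansible playbook change. The function name `restore_networking` and the interface arguments are hypothetical, and the commit message does not say how the playbook decides that a restart is "possible", so this sketch simply assumes that a failed `systemctl restart networking` means it is not.

```shell
# Sketch only, to be run as root. Not the actual playbook logic.
# Assumption: if restarting the networking service fails, a restart is
# treated as "not possible" and the interfaces passed as arguments are
# cycled manually with ifdown/ifup instead.
restore_networking() {
    if systemctl restart networking; then
        echo "networking restored via service restart"
        return 0
    fi
    echo "service restart failed, falling back to ifdown/ifup" >&2
    for iface in "$@"; do
        ifdown "$iface" || true    # the interface may already be down
        ifup "$iface" || return 1  # stop if an interface cannot be raised
    done
    echo "networking restored via ifdown/ifup"
}

# Example usage (interface names are placeholders):
#   restore_networking lo eth0
```

Trying the service restart first mirrors the order given in the commit message: the restart is preferred, and the manual method is kept only as the fallback.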

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
description: updated
tags: added: stx.8.0 stx.update