upgrade-k8s-networking.yml reports 'kubelet_vol_plugin_dir' is undefined

Bug #1870038 reported by Lin Shuicheng
This bug affects 1 person
Affects     Status        Importance  Assigned to    Milestone
StarlingX   Fix Released  Medium      Lin Shuicheng

Bug Description

Brief Description
-----------------
After a swact, sysinv.log shows an Ansible playbook failure like the one below:

TASK [Create Calico config file] ***********************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'kubelet_vol_plugin_dir' is undefined"}

PLAY RECAP *********************************************************************
localhost : ok=53 changed=10 unreachable=0 failed=1

.
sysinv 2020-04-01 02:35:15.497 104869 ERROR sysinv.conductor.manager [-] Failed to upgrade/downgrade kubernetes networking images: ansible-playbook returned an error: 2: Exception: ansible-playbook returned an error: 2
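The failure comes from Ansible's strict handling of undefined variables. When a template task such as "Create Calico config file" references a variable that is not defined in the play's scope, rendering aborts instead of producing an empty string. The following is a minimal, stdlib-only Python sketch of that behaviour, not Ansible's implementation. The `render_strict` helper, the `UndefinedVariable` exception and the template line are hypothetical.

```python
import re


class UndefinedVariable(Exception):
    """Hypothetical error raised when a template references a missing name."""


# Matches simple Jinja-style placeholders such as {{ kubelet_vol_plugin_dir }}.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_strict(template: str, variables: dict) -> str:
    """Substitute {{ name }} placeholders, failing on any undefined name.

    Mirrors the strict behaviour behind AnsibleUndefinedVariable: a missing
    variable is an error, not an empty string.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise UndefinedVariable(f"'{name}' is undefined")
        return str(variables[name])

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Hypothetical line from a Calico config template.
    template = "volume_plugin_dir: {{ kubelet_vol_plugin_dir }}\n"
    try:
        render_strict(template, {})
    except UndefinedVariable as exc:
        print(f"FAILED: {exc}")  # FAILED: 'kubelet_vol_plugin_dir' is undefined
```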

Severity
--------
<Minor: System/Feature is usable with minor issue>

Steps to Reproduce
------------------
1. Deploy an AIO-DX system.
2. Perform a host swact.
3. Check sysinv.log on the active controller.

Expected Behavior
------------------
No Ansible playbook failure.

Actual Behavior
----------------
Error message is shown.

Reproducibility
---------------
100%

System Configuration
--------------------
AIO-DX

Branch/Pull Time/Commit
-----------------------
latest code

Last Pass
---------
N/A

Timestamp/Logs
--------------
Log included above.

Test Activity
-------------
Developer Testing

Workaround
----------
N/A

Bart Wensley (bartwensley) wrote :
Changed in starlingx:
assignee: nobody → Bob Church (rchurch)
Ghada Khalil (gkhalil) wrote :

@Bart Wensley, do you know why this playbook runs on a swact?
"upgrade/downgrade kubernetes networking images"

tags: added: stx.config stx.containers
Bart Wensley (bartwensley) wrote :

It runs whenever sysinv-conductor starts up. That is how new networking images can be delivered in a patch: the image versions are patched, and sysinv-conductor then runs the playbook, which upgrades them.
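A minimal sketch of that startup flow, assuming sysinv invokes `ansible-playbook` as a subprocess and treats any non-zero exit status as a failure (consistent with the "returned an error: 2" text in the log above). The function names, the exception class and the command list are hypothetical and not taken from sysinv.

```python
import subprocess
import sys


class PlaybookError(Exception):
    """Hypothetical error type for a failed ansible-playbook run."""


def run_playbook(command: list) -> None:
    """Run a playbook command and raise if it exits non-zero.

    In practice `command` would be something like
    ["ansible-playbook", "upgrade-k8s-networking.yml"]; it is a parameter
    here so the sketch can run without Ansible installed.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise PlaybookError(f"ansible-playbook returned an error: {result.returncode}")


def on_conductor_startup(command: list) -> bool:
    """Hypothetical startup hook: always attempt the networking upgrade.

    Per the discussion, the playbook itself checks whether the (possibly
    patched) image versions need upgrading; the caller only logs failures.
    Returns True on success and False on failure.
    """
    try:
        run_playbook(command)
        return True
    except PlaybookError as exc:
        print(f"Failed to upgrade/downgrade kubernetes networking images: {exc}")
        return False


if __name__ == "__main__":
    # Simulate a playbook run that exits with status 2, as in this bug.
    on_conductor_startup([sys.executable, "-c", "import sys; sys.exit(2)"])
```

Because the hook runs on every conductor start, a swact (which starts sysinv-conductor on the newly active controller) is enough to trigger the playbook, which is why the failure shows up after a swact.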

Lin Shuicheng (shuicheng) wrote :

The patch has already been submitted; I am not sure why it is not shown here.
https://review.opendev.org/716520

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/716520
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=5c542524e4cd9fb65da698c1d4cba4d50f56bdab
Submitter: Zuul
Branch: master

commit 5c542524e4cd9fb65da698c1d4cba4d50f56bdab
Author: Shuicheng Lin <email address hidden>
Date: Wed Apr 1 15:58:07 2020 +0800

    Add kubelet_vol_plugin_dir definition to fix ansible failure

    During host-swact, upgrade-k8s-networking.yml is called to check for a
    calico upgrade. kubelet_vol_plugin_dir is not defined in that playbook,
    which causes the Ansible failure. Add the definition from main.yml to fix it.

    Closes-Bug: 1870038
    Change-Id: I30287ebca7f0d4a1d3c5ee656136375a7b1c182f
    Signed-off-by: Shuicheng Lin <email address hidden>
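The fix makes the variable visible to the template by defining it in the playbook. A cheap way to catch this class of bug before it reaches a swact is a pre-flight check that lists every template variable not defined in the play's vars. The sketch below is a hypothetical stdlib-only lint, not part of ansible-playbooks; it only understands simple `{{ name }}` placeholders, and the path value is an illustrative placeholder, not the value in main.yml.

```python
import re

# Simple Jinja-style references such as {{ kubelet_vol_plugin_dir }}.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def undefined_names(template: str, defined: dict) -> set:
    """Return the placeholder names in `template` that are missing from `defined`."""
    return {name for name in _PLACEHOLDER.findall(template) if name not in defined}


if __name__ == "__main__":
    template = "volume_plugin_dir: {{ kubelet_vol_plugin_dir }}\n"
    # Vars visible to the template before the fix (hypothetical: none relevant).
    before = {}
    # After the fix: the definition from main.yml is added (placeholder value).
    after = dict(before, kubelet_vol_plugin_dir="/path/from/main.yml")
    print(undefined_names(template, before))  # {'kubelet_vol_plugin_dir'}
    print(undefined_names(template, after))   # set()
```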

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

Assigning to Shuicheng Lin since he provided the fix for this issue.

Marked as stx.4.0 gating since the issue was introduced by a recent commit that up-versioned Calico.

tags: added: stx.4.0
Changed in starlingx:
importance: Undecided → Medium
assignee: Bob Church (rchurch) → Lin Shuicheng (shuicheng)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/729809

OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/729809
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=73027425d4501a6b7785e91024c9e8ddbc03115d
Submitter: Zuul
Branch: f/centos8

commit 55c9afd075194f7669fa2a87e546f61034679b04
Author: Dan Voiculeasa <email address hidden>
Date: Wed May 13 14:19:52 2020 +0300

    Restore: disconnect etcd from ceph

    At the moment etcd is restored only if ceph data is kept.
    Etcd should be restored regardless of whether ceph data is kept or wiped.

    Story: 2006770
    Task: 39751
    Change-Id: I9dfb1be0a83c3fdc5f1b29cbb974c5e0e2236ad3
    Signed-off-by: Dan Voiculeasa <email address hidden>

commit 003ddff574c74adf11cf8e4758e93ba0eed45a6a
Author: Don Penney <email address hidden>
Date: Fri May 8 11:35:58 2020 -0400

    Add playbook for updating static images

    This commit introduces a new playbook, upgrade-static-images.yml, used
    for downloading updated images and pushing them to the local registry.

    Change-Id: I8884440261a5a4e27b40398e5a75c9d03b09d4ba
    Story: 2006781
    Task: 39706
    Signed-off-by: Don Penney <email address hidden>

commit 26fd273cf5175ba4bdd31d6b6b777814f1a6c860
Author: Matt Peters <email address hidden>
Date: Thu May 7 14:29:02 2020 -0500

    Add kube-apiserver port to calico failsafe rules

    An invalid GlobalNetworkPolicy or NetworkPolicy may prevent
    calico-node from communicating with the kube-apiserver.
    Once the communication is broken, calico-node is no longer
    able to update the policies since it cannot communicate to
    read the updated policies. It can also prevent the pod
    from starting since the policies will prevent it from
    reading the configuration.

    To ensure that this scenario does not happen, the kube-apiserver
    port is being added to the failsafe rules to ensure communication
    is always possible, regardless of the network policy configuration.

    Change-Id: I1b065a74e7ad0ba9b1fdba4b63136b97efbe98ce
    Closes-Bug: 1877166
    Related-Bug: 1877383
    Signed-off-by: Matt Peters <email address hidden>
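In Calico, traffic on failsafe ports is allowed on host endpoints regardless of the configured host endpoint policy, which is why adding the kube-apiserver port there keeps calico-node connected even under a bad policy. The toy model below only illustrates that precedence. The `HostPolicy` class is hypothetical, and treating TCP 6443 as the kube-apiserver port and TCP 2379 as a pre-existing failsafe entry are assumptions for illustration, not values taken from this commit.

```python
from dataclasses import dataclass, field


@dataclass
class HostPolicy:
    """Hypothetical model of host-endpoint filtering with failsafe ports."""

    # (protocol, port) pairs that are always allowed; values are assumptions.
    failsafe_outbound: set = field(default_factory=set)
    # (protocol, port) pairs allowed by user-defined network policy.
    policy_allowed: set = field(default_factory=set)

    def allows_outbound(self, protocol: str, port: int) -> bool:
        # Failsafe entries are checked first, so no policy can block them.
        if (protocol, port) in self.failsafe_outbound:
            return True
        return (protocol, port) in self.policy_allowed


if __name__ == "__main__":
    # A broken policy that allows nothing outbound.
    before = HostPolicy(failsafe_outbound={("tcp", 2379)})
    after = HostPolicy(failsafe_outbound={("tcp", 2379), ("tcp", 6443)})
    print(before.allows_outbound("tcp", 6443))  # False: calico-node is cut off
    print(after.allows_outbound("tcp", 6443))   # True: kube-apiserver stays reachable
```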

commit bd0f14a7dfb206ccaa3ce0f5e7d9034703b3403c
Author: Robert Church <email address hidden>
Date: Tue May 5 15:11:15 2020 -0400

    Provide an update strategy for Tiller deployment

    In a simplex controller configuration, the current patching strategy for
    the Tiller environment fails because the tiller ports are still in use
    when the patched deployment is applied. The resulting tiller pod is
    stuck in a Pending state.

    This will be observed if the node becomes ready after 'helm init'
    installs the initial deployment and before the deployment is patched for
    environment checks.

    The deployment strategy provided by 'helm init' is unspecified. This
    change will allow one additional pod (current + new) and one unavailable
    pod (current) during an update. The maxUnavailable setting allows the
    tiller pod to be deleted which will release its ports, thus allowing the
    patched deployment to bring a new pod up to a Running state.

    Change-Id: I83c43c52a77...
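The arithmetic behind the Tiller change follows the Kubernetes RollingUpdate rules: during an update the controller keeps at most `replicas + maxSurge` pods in total and at least `replicas - maxUnavailable` available pods. A small sketch, assuming a single-replica Tiller deployment and integer (not percentage) settings; the helper name is hypothetical.

```python
def rolling_update_bounds(replicas: int, max_surge: int, max_unavailable: int):
    """Return (max_total_pods, min_available_pods) for a RollingUpdate.

    Only integer maxSurge/maxUnavailable values are handled; Kubernetes also
    accepts percentages, which this sketch does not model.
    """
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination: the rollout could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return replicas + max_surge, max(replicas - max_unavailable, 0)


if __name__ == "__main__":
    # Settings described in the commit: one extra pod, one unavailable pod.
    print(rolling_update_bounds(1, 1, 1))  # (2, 0)
```

With `(2, 0)` the controller may delete the old pod before the new one is ready, which releases the host ports. By contrast, the Kubernetes defaults for an unspecified strategy (25% surge rounded up, 25% unavailable rounded down) give a one-replica deployment `rolling_update_bounds(1, 1, 0) == (2, 1)`: the old pod must stay available until the new one is Running, which cannot happen while the old pod holds the ports.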

tags: added: in-f-centos8