Sysinv kubernetes label audit re-adds removed node label

Bug #1869058 reported by Kevin Smith
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Kristine Bujold

Bug Description

Brief Description
-----------------
A kubernetes host label was removed via system host-label-remove, yet the label is still present on the kubernetes node.

Severity
--------
Minor: System/Feature is usable with minor issue

Steps to Reproduce
------------------
This is a race condition between system host-label-remove and the sysinv kubernetes label audit, so it may be hard to reproduce.

Expected Behavior
------------------
When a kubernetes node label is removed via system host-label-remove, it should stay removed.

Actual Behavior
----------------
When a kubernetes node label is removed via system host-label-remove, there is a small possibility of it being re-added by the audit.

Reproducibility
---------------
Rare

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
Master 2020-03-24.

Last Pass
---------
N/A

Timestamp/Logs
--------------
Sysinv.log:
sysinv 2020-03-25 15:59:15.780 98963 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost_patch_end. No changes from mtce/1.0.
sysinv 2020-03-25 15:59:37.938 93999 INFO sysinv.conductor.manager [-] Platform managed application oidc-auth-apps: Prerequisites not met.
sysinv 2020-03-25 15:59:38.222 93999 INFO sysinv.conductor.manager [-] update_kubernetes_label: label_dict={u'elastic-data': None}
sysinv 2020-03-25 15:59:38.295 98964 INFO sysinv.api.controllers.v1.rest_api [-] PATCH cmd:http://localhost:30001/nfvi-plugins/v1/hosts hdr:{'Content-type': 'application/json', 'User-Agent': 'sysinv/1.0'} payload:{"hostname": "compute-0", "uuid": "079e5990-9b89-4ed0-bf74-e2202d3f9696"}
sysinv 2020-03-25 15:59:38.297 98964 INFO sysinv.api.controllers.v1.rest_api [-] Response={u'status': u'success'}
sysinv 2020-03-25 15:59:38.386 93999 INFO sysinv.conductor.manager [-] Label audit: creating elastic-data=enabled on node compute-0
sysinv 2020-03-25 16:00:37.919 93999 INFO sysinv.conductor.manager [-] Platform managed application oidc-auth-apps: Prerequisites not met.
sysinv 2020-03-25 16:01:37.938 93999 INFO sysinv.conductor.manager [-] Platform managed application oidc-auth-apps: Prerequisites not met.
sysinv 2020-03-25 16:02:02.919 98963 INFO sysinv.api.controllers.v1.host [-] Provisioned storage node(s) []

bash.log:
2020-03-25T15:59:36.000 controller-0 -sh: info HISTORY: PID=1728414 UID=42425 system host-label-remove compute-0 elastic-data

Database:
sysinv=# select * from label;
         created_at         | updated_at | deleted_at | id |                 uuid                 | host_id |     label_key      | label_value
----------------------------+------------+------------+----+--------------------------------------+---------+--------------------+-------------
 2020-03-24 18:20:09.282295 |            |            |  1 | b095a5cc-6c84-4121-b119-b3c1c97ba490 |       4 | elastic-data       | enabled
 2020-03-24 18:20:09.308784 |            |            |  2 | fa933fee-e7e3-4096-a85f-6ab8b4c2d6cc |       4 | elastic-controller | enabled
 2020-03-24 18:20:09.316646 |            |            |  3 | 3d0ad683-0fa5-4716-8bba-b20881bf5133 |       4 | elastic-client     | enabled
 2020-03-24 18:20:09.324641 |            |            |  4 | 2ff0ac28-bcd4-406f-8dbc-74e847c7e4b3 |       4 | elastic-master     | enabled
 2020-03-24 18:20:15.868789 |            |            |  5 | 97cf8e49-1328-4f85-ad81-8d24499c871f |       1 | elastic-data       | enabled
 2020-03-24 18:20:15.879633 |            |            |  6 | f57a229c-2e73-491e-9937-9f99f5ab6eaa |       1 | elastic-controller | enabled
 2020-03-24 18:20:15.899537 |            |            |  7 | 2f9100f3-7064-4bde-b246-f354268135ea |       1 | elastic-client     | enabled
 2020-03-24 18:20:15.915653 |            |            |  8 | 63a55a60-4fcb-48db-8eda-99094c83b4b8 |       1 | elastic-master     | enabled
 2020-03-24 18:20:32.21488  |            |            |  9 | c764eab0-48c0-4331-9b22-e691e22cabbc |       2 | elastic-master     | enabled
 2020-03-25 14:55:30.542282 |            |            | 11 | e44c1d69-f98d-4e47-be08-8614ccbe5de9 |       3 | elastic-master     | enabled
(10 rows)

Node labels:

controller-0:/usr/lib64/python2.7/site-packages/sysinv/conductor$ kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
compute-0 Ready <none> 24h v1.16.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,elastic-data=enabled,elastic-master=enabled,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux
compute-1 Ready <none> 24h v1.16.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,elastic-master=enabled,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux
controller-0 Ready master 25h v1.16.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,elastic-client=enabled,elastic-controller=enabled,elastic-data=enabled,elastic-master=enabled,kubernetes.io/arch=amd64,kubernetes.io/hostname=controller-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=
controller-1 Ready master 24h v1.16.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,elastic-client=enabled,elastic-controller=enabled,elastic-data=enabled,elastic-master=enabled,kubernetes.io/arch=amd64,kubernetes.io/hostname=controller-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=

Test Activity
-------------
Developer Testing

Workaround
----------
The kubernetes node label must be manually removed via kubectl.
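
For reference, a minimal sketch of that manual removal, using the node and label from the logs above (compute-0, elastic-data). It relies on the kubectl convention that a label key followed by a trailing "-" removes the label. The helper names label_remove_cmd and remove_node_label are hypothetical and are not part of sysinv. By default the helper only returns the command string, so it can be checked before anything is run against the cluster.

```python
import subprocess


def label_remove_cmd(node, label_key):
    """Build the kubectl argv that removes label_key from node.

    kubectl treats "<key>-" (key with a trailing dash) as "remove this label".
    """
    return ["kubectl", "label", "node", node, label_key + "-"]


def remove_node_label(node, label_key, dry_run=True):
    """Return the command line; execute it only when dry_run is False."""
    cmd = label_remove_cmd(node, label_key)
    if not dry_run:
        # Assumes kubectl is on PATH and configured for the target cluster.
        subprocess.check_call(cmd)
    return " ".join(cmd)


# Dry run: prints the command that would remove the stale label.
print(remove_node_label("compute-0", "elastic-data"))
```

After running the real command, kubectl get nodes --show-labels should no longer list the label on that node.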

Ghada Khalil (gkhalil)
tags: added: stx.containers
Ghada Khalil (gkhalil)
Changed in starlingx:
status: New → Triaged
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - workaround exists, but should be investigated further

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.4.0 stx.config
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Kristine Bujold (kbujold)
Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/736800

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/736800
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=9dd9b1f83e44cb7b286f7c0c5d7162252b2591ef
Submitter: Zuul
Branch: master

commit 9dd9b1f83e44cb7b286f7c0c5d7162252b2591ef
Author: Kristine Bujold <email address hidden>
Date: Thu Jun 18 14:43:40 2020 -0400

    Fix deleted labels being re-added in k8s

    Changed the order the labels were being deleted from sysinv db and
    kubernetes. The sysinv kubernetes label audit was re-adding removed
    kubernetes node labels. This could happen when;

    - a label is removed via cli
    - the api deletes the label from kubernetes
    - the audit executes before the sysinv db label is deleted and re-adds
    it to kubernetes
    - the api deletes the label from sysinv db

    Closes-Bug: 1869058
    Change-Id: I0b3b2db7bd923b32627501b33ebf9fe947ad4707
    Signed-off-by: Kristine Bujold <email address hidden>
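
The sequence described in the commit message can be modelled with a small, deterministic sketch. This is not the sysinv code: the two dicts stand in for the sysinv db label table and the labels on the kubernetes node, and every function name is hypothetical. It also assumes that the fixed order is "sysinv db first, then kubernetes", which is inferred from the commit message rather than taken from the change itself.

```python
# Illustrative model only; all names here are hypothetical.

def label_audit(db_labels, k8s_labels):
    """Create on the node any label that the db still lists."""
    for key, value in db_labels.items():
        if k8s_labels.get(key) != value:
            k8s_labels[key] = value


def remove_label_old_order(db_labels, k8s_labels, key, audit_between=False):
    """Order before the fix: kubernetes first, then the sysinv db."""
    k8s_labels.pop(key, None)
    if audit_between:
        # The db still lists the label, so the audit re-adds it to the node.
        label_audit(db_labels, k8s_labels)
    db_labels.pop(key, None)


def remove_label_new_order(db_labels, k8s_labels, key, audit_between=False):
    """Assumed order after the fix: sysinv db first, then kubernetes."""
    db_labels.pop(key, None)
    if audit_between:
        # The db no longer lists the label, so the audit has nothing to add.
        label_audit(db_labels, k8s_labels)
    k8s_labels.pop(key, None)


def run(remove):
    """Remove elastic-data with an audit pass landing between the two deletes."""
    db = {"elastic-data": "enabled", "elastic-master": "enabled"}
    k8s = dict(db)
    remove(db, k8s, "elastic-data", audit_between=True)
    return db, k8s


old_db, old_k8s = run(remove_label_old_order)
new_db, new_k8s = run(remove_label_new_order)
print("old order, node labels:", sorted(old_k8s))
print("new order, node labels:", sorted(new_k8s))
```

With the old order the model ends in the state shown in the report for compute-0: elastic-data is gone from the db but still present on the node. With the assumed new order an audit pass landing between the two deletes leaves the label removed in both places.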

Changed in starlingx:
status: In Progress → Fix Released