affine-tasks.sh erroneously waiting if openstack not deployed

Bug #1843713 reported by Brent Rowsell
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Jim Gauld
Milestone: (none listed)

Bug Description

Brief Description
-----------------
Script affine-tasks.sh erroneously waits if openstack is not deployed:

    # Wait until K8s pods have recovered and nova-compute is running
    t0=${SECONDS}
    until is_k8s_platform_steady_state_ready; do
        dt=$(( ${SECONDS} - ${t0} ))
        if [ ${dt} -ge ${PRINT_INTERVAL_SECONDS} ]; then
            t0=${SECONDS}
            LOG "Recovery wait, elapsed ${SECONDS} seconds." \
                "Reason: ${NOT_READY_REASON}"
        fi
        sleep ${CHECK_INTERVAL_SECONDS}
    done

This delays re-affining of platform tasks on non-openstack deployments.
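
One way to avoid this delay, sketched here under stated assumptions, is to gate the openstack-specific check on whether openstack is deployed at all. The helper names is_openstack_deployed, is_k8s_core_ready and is_nova_compute_running, and the environment variables they read, are hypothetical placeholders for illustration; they are not taken from the actual script.

```shell
#!/bin/bash
# Hypothetical placeholders: a real script would query the platform here.
function is_openstack_deployed { [ "${OPENSTACK_DEPLOYED:-no}" = "yes" ]; }
function is_k8s_core_ready { [ "${K8S_CORE_READY:-no}" = "yes" ]; }
function is_nova_compute_running { [ "${NOVA_COMPUTE_RUNNING:-no}" = "yes" ]; }

# Ready when core k8s services are up; nova-compute is only required
# when openstack is deployed. Sets NOT_READY_REASON on failure.
function is_steady_state_ready {
    NOT_READY_REASON=""
    if ! is_k8s_core_ready; then
        NOT_READY_REASON="k8s core services not ready"
        return 1
    fi
    if is_openstack_deployed && ! is_nova_compute_running; then
        NOT_READY_REASON="nova-compute not running"
        return 1
    fi
    return 0
}
```

With a gate like this, a k8s-only system leaves the wait loop as soon as the core services report ready, while an openstack system still waits for nova-compute.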

Severity
--------
Major - affects

Steps to Reproduce
------------------
Install a system with k8s only (no openstack), then check the platform thread affinity.

Expected Behavior
------------------
The platform threads are affined to the platform cores as expected.

Actual Behavior
----------------
The platform threads are not affined for a while.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any system

Branch/Pull Time/Commit
-----------------------
master as of 2019-09-11

Last Pass
---------
Unknown

Timestamp/Logs
--------------
Not required. Issue is easily reproducible.

Test Activity
-------------
Other - regular lab usage

Ghada Khalil (gkhalil)
description: updated
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / medium priority - issue impacts system performance

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.3.0 stx.containers
Jim Gauld (jgauld)
Changed in starlingx:
status: Triaged → In Progress
Jim Gauld (jgauld) wrote :

The existing algorithm has several criteria to determine whether platform services are ready. The script was waiting forever because one of the kube-system pods would never become Ready/Complete.

The intention is to modify the script to do the following:
- Wait on openstack only if it is deployed
- Wait for core kube services, e.g.,
  -- Wait for kubelet
  -- Wait for pods kube-apiserver, kube-controller-manager-controller, kube-scheduler-controller, kube-proxy, coredns
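
The core-pod criterion above could be sketched roughly as follows. This is a minimal illustration, not the merged change: the function name core_pods_ready and the variable CORE_PODS are hypothetical, and the pod names are matched by prefix on the assumption that real pod names carry a node or hash suffix.

```shell
#!/bin/bash
# Hypothetical list of core kube-system pod name prefixes.
CORE_PODS="kube-apiserver kube-controller-manager kube-scheduler kube-proxy coredns"

# Reads pod listing lines on stdin, in the column layout printed by
# `kubectl get pods -n kube-system --no-headers` (NAME READY STATUS ...).
# Returns 0 if every prefix in CORE_PODS has at least one Running pod;
# otherwise returns 1 and sets NOT_READY_REASON.
function core_pods_ready {
    local listing prefix
    listing=$(cat)
    NOT_READY_REASON=""
    for prefix in ${CORE_PODS}; do
        if ! awk -v p="${prefix}" \
            'index($1, p) == 1 && $3 == "Running" { found = 1 }
             END { exit !found }' <<< "${listing}"; then
            NOT_READY_REASON="${prefix} pod not Running"
            return 1
        fi
    done
    return 0
}
```

On a live system the listing could be supplied with core_pods_ready < <(kubectl get pods -n kube-system --no-headers); process substitution keeps the function in the current shell, so NOT_READY_REASON remains visible to the caller. Other kube-system pods that are not Running do not affect the result.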

OpenStack Infra (hudson-openstack) wrote : Fix proposed to utilities (master)

Fix proposed to branch: master
Review: https://review.opendev.org/684837

OpenStack Infra (hudson-openstack) wrote : Fix merged to utilities (master)

Reviewed: https://review.opendev.org/684837
Committed: https://git.openstack.org/cgit/starlingx/utilities/commit/?id=3112907d4f142ef1be332d3ba0ea2c49087bacce
Submitter: Zuul
Branch: master

commit 3112907d4f142ef1be332d3ba0ea2c49087bacce
Author: Jim Gauld <email address hidden>
Date: Wed Sep 25 15:36:01 2019 -0400

    affine-tasks.sh script should only wait for core k8s pods

    The affine-tasks.sh script dynamically changes affinity of platform
    tasks on AIO nodes based on the readiness of platform services.

    The recovery criteria is modified to wait for a subset of core
    kube-system services (instead of all kube-system), i.e.,
    - wait for kubelet
    - wait for Running pods: kube-apiserver, kube-controller-manager,
      kube-scheduler, kube-proxy, coredns

    Change-Id: Id138031806abf9ef7c40a9fc2e339cd76403ccda
    Closes-bug: 1843713
    Signed-off-by: Jim Gauld <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released