First time application-apply stx-openstack failed due to timeouts during application apply on AIO configurations

Bug #1826592 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Al Bailey
Milestone:

Bug Description

Brief Description
-----------------
as title

Severity
--------
Critical

Steps to Reproduce
------------------
....
TC-name:

Expected Behavior
------------------

Actual Behavior
----------------

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Two node system

Lab-name: Wp_1-2

Branch/Pull Time/Commit
-----------------------
stx master as of 20190426T013000Z

Last Pass
---------
20190410T013000Z

Timestamp/Logs
--------------

2019-04-26 07:51:47.135 101497 INFO sysinv.conductor.kube_app [-] Application (stx-openstack) upload completed.

2019-04-26 07:51:48 [admin@admin]> RUNNING: system application-apply stx-openstack
+---------------+----------------------------------+
| Property | Value |
+---------------+----------------------------------+
| created_at | 2019-04-26T07:51:22.075458+00:00 |
| manifest_file | manifest.yaml |
| manifest_name | armada-manifest |
| name | stx-openstack |
| progress | None |
| status | applying |
| updated_at | 2019-04-26T07:51:47.132718+00:00 |
+---------------+----------------------------------+

2019-04-26 08:52:46.085 101497 INFO sysinv.conductor.kube_app [-] Exiting progress monitoring thread for app stx-openstack
2019-04-26 08:52:46.095 101497 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.

[wrsroot@controller-0 ~(keystone_admin)]$ system application-show stx-openstack
+---------------+------------------------------------------+
| Property | Value |
+---------------+------------------------------------------+
| created_at | 2019-04-26T07:51:22.075458+00:00 |
| manifest_file | manifest.yaml |
| manifest_name | armada-manifest |
| name | stx-openstack |
| progress | operation aborted, check logs for detail |
| status | apply-failed |
| updated_at | 2019-04-26T08:52:46.088998+00:00 |
+---------------+------------------------------------------+
[wrsroot@controller-0 ~(keystone_admin)]$

sudo docker exec armada_service cat stx-openstack-apply.log

The problem was that the compute-kit chart group did not come up within 15 minutes.

Here are the relevant logs:
2019-04-26 08:22:26.346 41 INFO armada.handlers.chart_deploy [-] [chart=openvswitch]: Processing Chart, release=osh-openstack-openvswitch
….
2019-04-26 08:37:26.293 41 ERROR armada.handlers.wait [-] [chart=openvswitch]: Timed out waiting for jobs (namespace=openstack, labels=()). These jobs were not ready=['neutron-db-sync', 'nova-cell-setup']

openvswitch is the only chart in the compute-kit group that has a 15-minute timeout.
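
The 15-minute figure can be checked against the two armada log lines quoted above. The following is a minimal sketch, assuming the timestamp occupies the first 23 characters of each log line; the helper name elapsed_seconds is hypothetical and is not part of armada or sysinv.

```python
from datetime import datetime

# Assumption: log lines start with 'YYYY-MM-DD HH:MM:SS.mmm' (23 characters).
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Return the seconds between the leading timestamps of two log lines."""
    def parse(line: str) -> datetime:
        return datetime.strptime(line[:23], LOG_TS_FORMAT)

    return (parse(end_line) - parse(start_line)).total_seconds()


# The two lines from stx-openstack-apply.log, shortened after the chart name.
start = "2019-04-26 08:22:26.346 41 INFO armada.handlers.chart_deploy [-] [chart=openvswitch]: Processing Chart"
end = "2019-04-26 08:37:26.293 41 ERROR armada.handlers.wait [-] [chart=openvswitch]: Timed out waiting for jobs"

delta = elapsed_seconds(start, end)
print(f"openvswitch chart waited {delta:.3f} s ({delta / 60:.1f} min) before timing out")
```

The gap comes out just under 900 seconds, which matches a 15-minute wait on the openvswitch chart.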

Test Activity
-------------
lab_setup

Tags: stx.2.0
Peng Peng (ppeng) wrote :
Frank Miller (sensfan22) wrote : Re: First time application-apply stx-openstack failed due to timeouts during application downloads on AIO configurations

Marking as release gating. For some AIO configurations, timeouts are seen when applying the stx-openstack application. This is likely a side effect of the number of platform cores and the amount of parallel activity going on when downloading docker images and starting up all the pods.

summary: First time application-apply stx-openstack failed due to timeouts during
- application downloads
+ application downloads on AIO configurations
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Al Bailey (albailey1974)
tags: added: stx.2.0 stx.retestneeded
Changed in starlingx:
importance: High → Critical
Al Bailey (albailey1974)
summary: First time application-apply stx-openstack failed due to timeouts during
- application downloads on AIO configurations
+ application apply on AIO configurations
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/656025

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/656025
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=e513baad44181f667085886007632d0ebf79eeb0
Submitter: Zuul
Branch: master

commit e513baad44181f667085886007632d0ebf79eeb0
Author: Al Bailey <email address hidden>
Date: Fri Apr 26 13:26:32 2019 -0500

    Change platform core pinning in AIO

    The reason for this change is that the docker and kubernetes
    processes are affined to the platform cores, and cpu
    starvation can occur when those processes are loaded
    with work, such as applying an armada manifest or downloading
    docker images.

    With this change, AIO will use the entire host cpu set rather than
    just the platform cpu set.

    Fixes-Bug: 1826592
    Change-Id: Ic357e46804cac27b007fa58c52052970b2932780
    Signed-off-by: Al Bailey <email address hidden>
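
To see whether a process is restricted to a subset of the host CPUs, as the commit message describes for docker and kubernetes, its affinity mask can be compared with the host CPU count. This is only an inspection sketch and not the fix in the commit above; it assumes Linux (os.sched_getaffinity is not available elsewhere), and the helper names cpu_affinity and is_restricted are hypothetical.

```python
import os


def cpu_affinity(pid: int = 0) -> set:
    """Return the set of CPU ids the process may run on (pid 0 = this process)."""
    return os.sched_getaffinity(pid)


def is_restricted(pid: int = 0) -> bool:
    """True if the process is allowed on fewer CPUs than the host reports."""
    total = os.cpu_count() or 0
    return len(cpu_affinity(pid)) < total


if __name__ == "__main__":
    allowed = cpu_affinity()
    print(f"allowed CPUs: {sorted(allowed)} of {os.cpu_count()} total")
    print(f"restricted to a subset: {is_restricted()}")
```

Passing the pid of a docker or kubelet process instead of 0 would show whether that process is limited to the platform cores.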

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

This issue has not been seen again recently.

tags: removed: stx.retestneeded