First time application-apply stx-openstack failed due to waiting on non-existent ovs container

Bug #1824829 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: cheng li

Bug Description

Brief Description
-----------------
The first application-apply of stx-openstack failed during lab setup, but a re-apply succeeded.

Severity
--------
Critical

Steps to Reproduce
------------------
- Bring up the system following the containers installation guide
- Run the initial application apply for stx-openstack
        system application-apply stx-openstack

Expected Behavior
------------------
The first application-apply runs without error

Actual Behavior
----------------
The first application-apply fails and has to be re-executed

Reproducibility
---------------
Reproducible
10/10

System Configuration
--------------------
One node system
Two node system
vswitch type is ovs-dpdk

Lab-name: WP_1-2 & SM-2

Branch/Pull Time/Commit
-----------------------
stx master as of "20190412T013000Z"

Last Pass
---------
20190410T013000Z

Timestamp/Logs
--------------
[wrsroot@controller-0 ~(keystone_admin)]$ system application-show stx-openstack
+---------------+------------------------------------------+
| Property      | Value                                    |
+---------------+------------------------------------------+
| created_at    | 2019-04-15T07:53:11.458960+00:00         |
| manifest_file | manifest.yaml                            |
| manifest_name | armada-manifest                          |
| name          | stx-openstack                            |
| progress      | operation aborted, check logs for detail |
| status        | apply-failed                             |
| updated_at    | 2019-04-15T08:58:20.839780+00:00         |
+---------------+------------------------------------------+

Test Activity
-------------
installation

Peng Peng (ppeng) wrote :
Maria Yousaf (myousaf) wrote :

This is also reproducible on a storage system, using build 20190412T013000Z.

Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → High
Al Bailey (albailey1974) wrote :

2019-04-15 08:58:20.842 104327 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.

From the hiera data
var/extra/platform/puppet/19.01/hieradata/system.yaml:platform::params::vswitch_type: !!python/unicode 'ovs-dpdk'

Austin sent an email indicating that vswitch type: ovs-dpdk was not working for him due to this change
https://review.openstack.org/#/c/651380/2/kubernetes/applications/stx-openstack/stx-openstack-helm/stx-openstack-helm/manifests/manifest.yaml

I will re-comment out those wait lines. It will cause a warning in the armada logs, but that setup should work again.

Al Bailey (albailey1974) wrote :

Uploaded this review to get sanity passing
https://review.openstack.org/#/c/652745/

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/652745
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=9f78bc667d9ed8911a55a11614146642e3e33ae9
Submitter: Zuul
Branch: master

commit 9f78bc667d9ed8911a55a11614146642e3e33ae9
Author: Al Bailey <email address hidden>
Date: Mon Apr 15 12:43:32 2019 -0500

    Remove a manifest wait when setting up ovs-dpdk pods

    This commit added support for wait in the pods
    https://review.openstack.org/#/c/651380

    However, when ovs-dpdk vswitch type is enabled like this:
      system modify --vswitch_type ovs-dpdk
    the wait causes armada to timeout.

    This fix is to re-comment out the wait.

    Note: this causes the armada logs to show:

    WARNING armada.handlers.wait [-] [chart=openvswitch]:
     "label_selector" not specified,
     waiting with no labels may cause unintended consequences.

    This submission will get sanity to pass. A later submission
    by someone with ovs expertise can update the openvswitch.py
    helm code to add a meta_override to eliminate the warning logs.

    Partial-Bug: 1824829
    Change-Id: I1e08b2dd98d859d0b37612aba3de70d969653cda
    Signed-off-by: Al Bailey <email address hidden>

Cindy Xie (xxie1)
Changed in starlingx:
assignee: nobody → Al Bailey (albailey1974)
status: New → Fix Committed
Frank Miller (sensfan22)
tags: added: stx.2.0 stx.networking stx.retestneeded
tags: added: stx.containers
summary: - First time application-apply stx-openstack failed
+ First time application-apply stx-openstack failed due to waiting on non-
+ existent ovs container
Ghada Khalil (gkhalil) wrote :

Re-opening this bug because only a temporary fix was submitted. As per Bob Church's review comments in https://review.openstack.org/#/c/652745/, the proper fix is to introduce a meta_override for the wait field based on the vswitch type value. An example is available in helm/garbd.py, which provides a _meta_overrides() function to tweak the armada manifest. Something similar should be done in helm/openvswitch.py.

Assigning to Forrest's team to implement the recommended solution.
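
The recommendation above could look roughly like the following Python sketch. Everything in it is hypothetical: the function name openvswitch_meta_overrides, the release_group label value, and the layout of the returned dict are assumptions made for illustration and are not taken from helm/garbd.py, helm/openvswitch.py, or the Armada schema. The only idea carried over from this report is that the wait setting depends on the vswitch type, and that OVS runs in a container only when the vswitch type is 'none'.

```python
# Hypothetical sketch only: names and dict layout are assumptions,
# not the actual sysinv helm plugin or Armada manifest schema.

VSWITCH_TYPE_NONE = "none"  # per this report, OVS is containerized only in this case


def openvswitch_meta_overrides(vswitch_type):
    """Return illustrative manifest overrides for the openvswitch chart."""
    if vswitch_type == VSWITCH_TYPE_NONE:
        # OVS pods exist, so waiting on a labelled release is meaningful.
        return {"wait": {"labels": {"release_group": "osh-openvswitch"}}}
    # OVS runs outside containers (e.g. ovs-dpdk): there are no pods to
    # wait on, so this sketch emits no wait override at all.
    return {}


if __name__ == "__main__":
    for vswitch in ("none", "ovs-dpdk"):
        print(vswitch, openvswitch_meta_overrides(vswitch))
```

Keeping the decision in one small function means the manifest itself would not need to be edited per vswitch type; whether the real plugin should return an empty override or an explicit "do not wait" value is left open here.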

Changed in starlingx:
status: Fix Committed → Confirmed
assignee: Al Bailey (albailey1974) → Forrest Zhao (forrest.zhao)
description: updated
description: updated
cheng li (chengli3) wrote :

The fix is in review: https://review.openstack.org/#/c/653932

Changed in starlingx:
assignee: Forrest Zhao (forrest.zhao) → cheng li (chengli3)
Changed in starlingx:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/653932
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=f6fb3cd53e5b9fb9c68797fe1f8563e6c461514c
Submitter: Zuul
Branch: master

commit f6fb3cd53e5b9fb9c68797fe1f8563e6c461514c
Author: chengli3 <email address hidden>
Date: Fri Apr 19 17:02:40 2019 +0800

    Change the way of disabling ovs container

    For the case vswitch_type!='none', ovs doesn't run in container. So ovs
    pod/container should not run. We controlled ovs container by label, but
    a patch[1] broke it.
    This patch is to change the method in which we control ovs container.
    With this patch, we remove openvswitch chart from compute-kit chart
    group so that no ovs container created. If we need to run ovs in
    container, we add the openvswitch chart.

    [1]
    https://review.opendev.org/#/c/651380/2/kubernetes/applications/stx-openstack/stx-openstack-helm/stx-openstack-helm/manifests/manifest.yaml

    Change-Id: I3ba8a3ab45a6e6c1a67b78d335656ed5c0d654a7
    Closes-bug: #1824829
    Signed-off-by: chengli3 <email address hidden>
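
The approach described in this commit message, dropping the openvswitch chart from the compute-kit chart group unless OVS runs in a container, can be sketched in Python as below. This is an illustration under stated assumptions and not the merged code: the function name compute_kit_charts and the chart names other than openvswitch are placeholders invented for the example.

```python
# Hypothetical sketch only: function and most chart names are placeholders,
# not the code merged in the commit above.

OPENVSWITCH_CHART = "openvswitch"


def compute_kit_charts(all_charts, vswitch_type):
    """Return the chart list for the chart group, given the vswitch type."""
    if vswitch_type == "none":
        # OVS runs in a container, so the openvswitch chart stays in the group.
        return list(all_charts)
    # For any other vswitch type OVS does not run in a container, so the
    # chart is left out and no OVS pod is created (or waited on).
    return [chart for chart in all_charts if chart != OPENVSWITCH_CHART]


if __name__ == "__main__":
    charts = ["libvirt", "openvswitch", "nova", "neutron"]  # placeholder names
    print(compute_kit_charts(charts, "ovs-dpdk"))
    print(compute_kit_charts(charts, "none"))
```

Removing the chart from the group, rather than relying on labels or wait settings, means the deployment tool never sees a chart whose pods will not exist.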

Changed in starlingx:
status: In Progress → Fix Released
Peng Peng (ppeng) wrote :

Issue was not seen recently

tags: removed: stx.retestneeded