application-apply stx-openstack stuck at processing chart: osh-openstack-nginx-ports-control

Bug #1834070 reported by Peng Peng
Affects   | Status       | Importance | Assigned to | Milestone
StarlingX | Fix Released | High       | yong hu     |

Bug Description

Brief Description
-----------------
After host-lock/unlock of the standby controller, the standby controller reboots. Once the standby controller is back online, the stx-openstack application apply gets stuck at "processing chart: osh-openstack-nginx-ports-control" and never completes.

Severity
--------
Major

Steps to Reproduce
------------------
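As described above: lock and then unlock the standby controller (controller-1), wait for it to come back online, then check the output of application-list.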

TC-name: mtc/test_lock_unlock_host.py::test_lock_unlock_host[controller]

Expected Behavior
------------------

Actual Behavior
----------------

Reproducibility
---------------
Intermittent

System Configuration
--------------------
Two node system

Lab-name: WP_1-2

Branch/Pull Time/Commit
-----------------------
stx master as of 20190623T233000Z

Last Pass
---------
Lab: WP_1_2
Load: 20190623T013000Z

Timestamp/Logs
--------------
[2019-06-24 10:12:49,685] 268 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-06-24 10:12:52,048] 387 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------+-------------------------------+--------------------+---------+-----------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+--------------------------------+-------------------------------+--------------------+---------+-----------+
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-openstack | 1.0-16-centos-stable-versioned | armada-manifest | stx-openstack.yaml | applied | completed |
+---------------------+--------------------------------+-------------------------------+--------------------+---------+-----------+

[2019-06-24 10:12:52,636] 268 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-1'

[2019-06-24 10:27:25,187] 268 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne hypervisor list'
[2019-06-24 10:27:27,172] 387 DEBUG MainThread ssh.expect :: Output:
+----+---------------------+-----------------+---------------+-------+
| ID | Hypervisor Hostname | Hypervisor Type | Host IP | State |
+----+---------------------+-----------------+---------------+-------+
| 4 | controller-0 | QEMU | 192.168.206.3 | up |
| 6 | controller-1 | QEMU | 192.168.206.4 | up |
+----+---------------------+-----------------+---------------+-------+

[2019-06-24 10:30:52,119] 268 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-06-24 10:30:54,949] 387 DEBUG MainThread ssh.expect :: Output:
+---------------------+--------------------------------+-------------------------------+--------------------+----------+--------------------------------------------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+--------------------------------+-------------------------------+--------------------+----------+--------------------------------------------------------------------------------+
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-openstack | 1.0-16-centos-stable-versioned | armada-manifest | stx-openstack.yaml | applying | processing chart: osh-openstack-nginx-ports-control, overall completion: 12.0% |
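To spot this state from a script rather than by eye, the status column can be pulled out of the application-list table. The following is only a sketch: the function name app_status is hypothetical, and the column order (status in the fifth column) is assumed from the log output above.

```shell
# Hypothetical helper: print the status column for one application.
# Reads `system application-list` table output on stdin.
# Assumes the column order shown in the log above (status is column 5).
app_status() {
    awk -F'|' -v app="$1" '
        # Splitting on "|" leaves an empty first field, so the
        # application name is field 2 and the status is field 6.
        { name = $2; gsub(/^ +| +$/, "", name) }
        name == app { status = $6; gsub(/^ +| +$/, "", status); print status }
    '
}
```

For example, `system application-list | app_status stx-openstack` would print the current status; a value that stays at "applying" across repeated polls matches the behavior reported here.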

Test Activity
-------------
Sanity

Peng Peng (ppeng) wrote :
Ghada Khalil (gkhalil)
tags: added: stx.containers
Ghada Khalil (gkhalil) wrote :

Assigning to Yong Hu to triage as there was a recent commit related to nginx which was merged on 2019-06-20
https://review.opendev.org/#/c/662748/

Changed in starlingx:
assignee: nobody → yong hu (yhu6)
yong hu (yhu6) wrote :

Root cause: locking and unlocking a standby controller triggers "application-apply" again. In this case an existing GlobalNetworkPolicy, "gnp-for-nginx-ports", left over from the previously applied stx-openstack, blocked the progress and eventually caused "application-apply" to abort.

Workaround until a fix is available: BEFORE locking/unlocking the standby controller, run the following command to delete "gnp-for-nginx-ports" manually:

# kubectl delete globalnetworkpolicies.crd.projectcalico.org gnp-for-nginx-ports
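If the workaround is scripted, the delete can be guarded so that it does not fail when the policy is already gone. This is a minimal sketch, not part of the actual fix: the function name delete_stale_gnp is hypothetical, and the resource and policy names are copied from the command above.

```shell
# Hypothetical helper: delete the leftover GlobalNetworkPolicy only if it exists.
# Resource and policy names are taken from the workaround command above.
GNP_RESOURCE="globalnetworkpolicies.crd.projectcalico.org"
GNP_NAME="gnp-for-nginx-ports"

delete_stale_gnp() {
    # `kubectl get` exits non-zero when the object is not found.
    if kubectl get "$GNP_RESOURCE" "$GNP_NAME" >/dev/null 2>&1; then
        kubectl delete "$GNP_RESOURCE" "$GNP_NAME"
    else
        echo "$GNP_NAME not present; nothing to delete"
    fi
}
```

Call delete_stale_gnp on the active controller before issuing host-lock on the standby controller.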

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/667678

Changed in starlingx:
status: New → In Progress
yong hu (yhu6) wrote :

My patch should resolve the block caused by the "nginx-ports-control" chart.
Going forward, the stx-openstack re-apply might still be blocked by another chart group, "mariadb", in which the openvswitch pods do not come up correctly; details are in https://bugs.launchpad.net/starlingx/+bug/1833718

yong hu (yhu6)
Changed in starlingx:
importance: Undecided → High
Ghada Khalil (gkhalil) wrote :

Marking as stx.2.0 gating -- application re-apply fails.

Note: The other application re-apply bugs are also marked as release gating. All the scenarios will need to be fixed.

tags: added: stx.2.0
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/667678
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=b69e39d95aa7f6336eb748fa29737ea09b9aaa4c
Submitter: Zuul
Branch: master

commit b69e39d95aa7f6336eb748fa29737ea09b9aaa4c
Author: yhu6 <email address hidden>
Date: Wed Jun 26 16:28:54 2019 +0000

    delete nginx-ports-control chart before stx-openstack re-apply

    Several cases might trigger stx-openstack re-apply, for example,
    lock and unlock standby controller. During the re-apply process,
    nginx-ports-control helm chart has to be removed first, otherwise
    re-apply process will be blocked because a previously applied GNP
    (GlobalNetworkPolicy) in nginx-ports-control chart has existed.

    Closes-Bug: 1834070

    Change-Id: I10805f052914a5157edc9b53699a94a2c7fd7953
    Signed-off-by: yhu6 <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released