platform-integ-apps apply-failed after host-unlock

Bug #1881150 reported by Peng Peng
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Dan Voiculeasa
Milestone: (none)

Bug Description

Brief Description
-----------------
On an SX (one-node) system, platform-integ-apps was in "applied" status after host-unlock. After labels were assigned to the host with system host-label-assign, the status of platform-integ-apps changed back to "applying" and eventually to "apply-failed".

Severity
--------
Major

Steps to Reproduce
------------------
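Unlock the host (system host-unlock controller-0)
Make sure platform-integ-apps is in applied status (system application-list)
Assign labels to the host (system host-label-assign controller-0 ...)
Check the status of platform-integ-apps again (system application-list)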

Expected Behavior
------------------
platform-integ-apps remains in applied status after the labels are assigned

Actual Behavior
----------------
platform-integ-apps goes back to applying status and then ends in apply-failed status

Reproducibility
---------------
Unknown. This is the first time the issue has been seen in sanity; will monitor.

System Configuration
--------------------
One node system

Lab-name: wcp_122

Branch/Pull Time/Commit
-----------------------
2020-05-27_20-00-00

Last Pass
---------
2020-05-25_20-00-00

Timestamp/Logs
--------------
[2020-05-28 09:51:41,556] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-05-28 09:51:42,835] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| application | version | manifest name | manifest file | status | progress |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| cert-manager | 1.0-1 | cert-manager-manifest | certmanager-manifest.yaml | applied | completed |
| nginx-ingress-controller | 1.0-0 | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied | completed |
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-openstack | 1.0-1-centos-stable- | armada-manifest | stx-openstack.yaml | applied | completed |
| | versioned | | | | |
| | | | | | |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+

[2020-05-28 09:51:43,315] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock controller-0'

[2020-05-28 10:03:19,975] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-05-28 10:03:21,098] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| application | version | manifest name | manifest file | status | progress |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| cert-manager | 1.0-1 | cert-manager-manifest | certmanager-manifest.yaml | applied | completed |
| nginx-ingress-controller | 1.0-0 | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied | completed |
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-openstack | 1.0-1-centos-stable- | armada-manifest | stx-openstack.yaml | applied | completed |
| | versioned | | | | |
| | | | | | |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+

[2020-05-28 10:03:30,244] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-label-assign controller-0 elastic-controller=enabled elastic-data=enabled elastic-master=enabled elastic-client=enabled'

[2020-05-28 10:03:32,939] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-05-28 10:03:34,078] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| application | version | manifest name | manifest file | status | progress |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
| cert-manager | 1.0-1 | cert-manager-manifest | certmanager-manifest.yaml | applied | completed |
| nginx-ingress-controller | 1.0-0 | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied | completed |
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | applying | completed |
| stx-openstack | 1.0-1-centos-stable- | armada-manifest | stx-openstack.yaml | applied | completed |
| | versioned | | | | |
| | | | | | |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+----------+-----------+
controller-0:~$

[2020-05-28 10:05:35,987] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2020-05-28 10:05:37,383] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+--------------+--------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+--------------+--------------------------------------+
| cert-manager | 1.0-1 | cert-manager-manifest | certmanager-manifest.yaml | applied | completed |
| nginx-ingress-controller | 1.0-0 | nginx-ingress-controller-manifest | nginx_ingress_controller_manifest.yaml | applied | completed |
| oidc-auth-apps | 1.0-0 | oidc-auth-manifest | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | apply-failed | operation aborted, check logs for |
| | | | | | detail |
| | | | | | |
| stx-openstack | 1.0-1-centos-stable- | armada-manifest | stx-openstack.yaml | applied | completed |
| | versioned | | | | |
| | | | | | |
+--------------------------+-----------------------------+-----------------------------------+----------------------------------------+--------------+--------------------------------------+
controller-0:~$

sysinv 2020-05-28 10:05:35.139 92036 INFO sysinv.conductor.kube_app [-] platform-integ-apps app failed applying. Retrying.
sysinv 2020-05-28 10:05:35.139 92036 ERROR sysinv.conductor.kube_app [-] Failed to apply platform-integ-apps application.: PlatformApplicationApplyFailure: Failed to apply platform-integ-apps application.

sysinv 2020-05-28 10:05:35.286 92036 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.: PlatformApplicationApplyFailure: Failed to apply platform-integ-apps application.
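
The transition visible in the tables above (applied, then applying, then apply-failed) can be checked from a script rather than by eye. The following is a minimal sketch, assuming the output of system application-list has already been captured as text; the function name parse_application_list and the shortened SAMPLE table are illustrative only and are not part of sysinv or the test framework.

```python
def parse_application_list(table_text):
    """Return {application: status} from captured `system application-list` output."""
    header = None
    statuses = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ borders, shell prompts, and log lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" row holds the column names
            continue
        row = dict(zip(header, cells))
        name = row.get("application", "")
        if name:  # wrapped continuation rows have an empty application cell
            statuses[name] = row.get("status", "")
    return statuses


# Shortened copy of the 10:05:37 table from this report (borders abbreviated).
SAMPLE = """
+-----+
| application | version | manifest name | manifest file | status | progress |
+-----+
| cert-manager | 1.0-1 | cert-manager-manifest | certmanager-manifest.yaml | applied | completed |
| platform-integ-apps | 1.0-8 | platform-integration-manifest | manifest.yaml | apply-failed | operation aborted, check logs for |
| | | | | | detail |
+-----+
"""

if __name__ == "__main__":
    for app, status in parse_application_list(SAMPLE).items():
        print(app, status)
```

Matching on the header names instead of fixed column positions keeps the sketch working when the status and progress columns widen, as they do in the last table.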

Test Activity
-------------
Sanity

Peng Peng (ppeng) wrote :
tags: added: stx.retestneeded
Ghada Khalil (gkhalil)
tags: added: stx.apps
Frank Miller (sensfan22) wrote :

This issue is another occurrence of: https://bugs.launchpad.net/starlingx/+bug/1877582

The system was unlocked, and the platform-managed applications then tried to upload. The armada container attempted to start but failed. It is in an exited state, yet it appears to be running after the reboot (likely due to a port conflict: "bind: address already in use: unknown").

The key logs confirming that this is a duplicate are in daemon.log.4:
2020-05-28T10:04:04.935 controller-0 dockerd[83556]: info time="2020-05-28T10:04:04.935429681Z" level=error msg="e650bd9ec8d3e23c07a2b51b425b36b6ad5e27b685d5507abee5236577dbb5e8 cleanup: failed to delete container from containerd: no such container"
2020-05-28T10:04:04.957 controller-0 dockerd[83556]: info time="2020-05-28T10:04:04.957852906Z" level=error msg="Handler for POST /v1.35/containers/e650bd9ec8d3e23c07a2b51b425b36b6ad5e27b685d5507abee5236577dbb5e8/restart returned error: Cannot restart container e650bd9ec8d3e23c07a2b51b425b36b6ad5e27b685d5507abee5236577dbb5e8: task e650bd9ec8d3e23c07a2b51b425b36b6ad5e27b685d5507abee5236577dbb5e8 already exists: unknown"
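
To check another system for the same signature, the daemon log can be scanned for the "Cannot restart container ... already exists" error shown above. This is a rough sketch only: the regular expression is derived from the single log line quoted in this report, and the function name find_stuck_containers is hypothetical.

```python
import re

# Pattern taken from the dockerd error quoted in this report; other Docker
# versions may word the message differently (assumption).
RESTART_ERROR = re.compile(
    r"Cannot restart container (?P<cid>[0-9a-f]{12,64}): "
    r"task (?P=cid) already exists"
)


def find_stuck_containers(log_lines):
    """Return the unique container IDs that failed to restart, in log order."""
    ids = []
    for line in log_lines:
        match = RESTART_ERROR.search(line)
        if match and match.group("cid") not in ids:
            ids.append(match.group("cid"))
    return ids


# Tail of the second daemon.log.4 entry quoted above.
SAMPLE_LINE = (
    'returned error: Cannot restart container '
    'e650bd9ec8d3e23c07a2b51b425b36b6ad5e27b685d5507abee5236577dbb5e8: '
    'task e650bd9ec8d3e23c07a2b51b425b36b6ad5e27b685d5507abee5236577dbb5e8 '
    'already exists: unknown"'
)

if __name__ == "__main__":
    print(find_stuck_containers([SAMPLE_LINE]))
```

In practice the lines would come from reading the log file, for example by passing an open file object for daemon.log.4 to the function.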

The workaround is to remove the stale armada container; the apps will then auto-upload and auto-apply:
- docker rm <container> -- use the ID of the armada container
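
The workaround above could be scripted roughly as follows. This is a sketch under stated assumptions, not a supported tool: it assumes the container listing was produced with docker ps -a --format '{{.ID}} {{.Names}} {{.Status}}', and that the armada container can be recognized by the substring "armada" in its name. The function name armada_rm_commands and the sample listing are made up for illustration. The function only builds the docker rm commands so they can be reviewed before anything is removed.

```python
def armada_rm_commands(ps_output, name_hint="armada"):
    """Build `docker rm <id>` argument lists for containers matching name_hint.

    ps_output is text in "<id> <name> <status...>" form, one container per line.
    name_hint is an assumption about how the armada container is named.
    """
    commands = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # id, name, remainder of the status text
        if len(parts) < 2:
            continue  # blank or malformed line
        container_id, name = parts[0], parts[1]
        if name_hint in name:
            commands.append(["docker", "rm", container_id])
    return commands


# Hypothetical listing; the names and statuses are not taken from the report.
SAMPLE_PS = """\
e650bd9ec8d3 armada_service Exited (1) 2 minutes ago
1a2b3c4d5e6f example_other_container Up 5 minutes
"""

if __name__ == "__main__":
    for command in armada_rm_commands(SAMPLE_PS):
        print(" ".join(command))
```

Each returned list can be passed to subprocess.run once the selected ID has been confirmed to belong to the armada container.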

Ghada Khalil (gkhalil)
tags: added: stx.4.0 stx.containers
Ghada Khalil (gkhalil) wrote :

Duplicate LP (1877582) has been addressed.
https://review.opendev.org/735374
Merged in stx master on 2020-06-15

Changed in starlingx:
importance: Undecided → High
status: New → Fix Released
assignee: nobody → Dan Voiculeasa (dvoicule)
Peng Peng (ppeng)
tags: removed: stx.retestneeded