hello-kitty application apply failed after controller swact

Bug #1837774 reported by Peng Peng
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        Angie Wang

Bug Description

Brief Description
-----------------
About 5 minutes after running application-apply for the hello-kitty application, its status ended up as "apply-failed".

Severity
--------
Major

Steps to Reproduce
------------------
Upload hello-kitty helm charts
Apply hello-kitty

TC-name: z_containers/test_custom_containers.py::test_launch_app_via_sysinv

Expected Behavior
------------------

Actual Behavior
----------------

Reproducibility
---------------
Seen once

System Configuration
--------------------
Two node system

Lab-name:IP_5-6

Branch/Pull Time/Commit
-----------------------
stx master as of 20190724T013000Z

Last Pass
---------
Lab: IP_5_6
Load: 20190721T233000Z

Timestamp/Logs
--------------
[2019-07-24 15:09:38,191] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-list'
[2019-07-24 15:09:40,708] 423 DEBUG MainThread ssh.expect :: Output:
+---------------------+------------------------------+-------------------------------+--------------------+--------------+-----------------------------------------------------------------------------------------------------+
| application         | version                      | manifest name                 | manifest file      | status       | progress                                                                                            |
+---------------------+------------------------------+-------------------------------+--------------------+--------------+-----------------------------------------------------------------------------------------------------+
| hello-kitty         | 1.0                          | hello-kitty                   | manifest.yaml      | uploaded     | completed                                                                                           |
| platform-integ-apps | 1.0-7                        | platform-integration-manifest | manifest.yaml      | applied      | completed                                                                                           |
| stx-openstack       | 1.0-17-centos-stable-        | armada-manifest               | stx-openstack.yaml | apply-failed | Unexpected process termination while application-apply was in progress. The application status has  |
|                     | versioned                    |                               |                    |              | changed from 'applying' to 'apply-failed'.                                                          |
|                     |                              |                               |                    |              |                                                                                                     |
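
For anyone checking application status from a captured log like the one above, here is a minimal Python sketch that turns the application-list table into a dictionary keyed by application name. The function name parse_application_list and the rule for rejoining wrapped cells (no space after a trailing "-", a single space otherwise) are assumptions made for illustration; they are not part of the sysinv client or the test framework.

```python
SAMPLE = """\
+---------------------+-----------------------+-------------------------------+--------------------+--------------+--------------------------------------------+
| application | version | manifest name | manifest file | status | progress |
+---------------------+-----------------------+-------------------------------+--------------------+--------------+--------------------------------------------+
| hello-kitty | 1.0 | hello-kitty | manifest.yaml | uploaded | completed |
| platform-integ-apps | 1.0-7 | platform-integration-manifest | manifest.yaml | applied | completed |
| stx-openstack | 1.0-17-centos-stable- | armada-manifest | stx-openstack.yaml | apply-failed | Unexpected process termination while application-apply was in progress. The application status has |
| | versioned | | | | changed from 'applying' to 'apply-failed'. |
| | | | | | |
"""


def parse_application_list(output):
    """Parse the ASCII table into {application name: {column: value}}.

    Rows with an empty first column are treated as wrapped continuations
    of the previous row (an assumption based on the output above).
    """
    header = None
    apps = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        # Table rows start and end with "|"; border lines start with "+".
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # first row is the column header
            continue
        if len(cells) != len(header):
            continue  # not a well-formed row, skip it
        if cells[0]:
            current = dict(zip(header, cells))
            apps[cells[0]] = current
        elif current is not None:
            # Continuation row: append non-empty cells to the previous row.
            for key, value in zip(header, cells):
                if value:
                    glue = "" if current[key].endswith("-") else " "
                    current[key] = (current[key] + glue + value).strip()
    return apps


if __name__ == "__main__":
    for name, row in parse_application_list(SAMPLE).items():
        print(name, row["status"])
```

The parser only splits on "|" and strips whitespace, so it does not depend on the column padding being preserved in the captured log.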

[2019-07-24 15:08:57,889] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-upload -n hello-kitty -v 1.0 /home/sysadmin//custom_apps/hello-kitty.tgz'

[2019-07-24 15:09:40,813] 301 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne application-apply hello-kitty'


Test Activity
-------------
Sanity

Angie Wang (angiewang) wrote :

The root cause is that the tiller pod was not ready after host-swact.

The hello-kitty app apply was triggered after the swact to controller-1, but it looks like it took more than 20 minutes to bring up the tiller pod after the swact.

2019-07-24T14:58:46.000 system host swact from controller-0 to controller-1
2019-07-24 15:00:33.057 447459 sysinv-conductor starts on controller-1
2019-07-24 15:09:43.091 447308 Armada apply command = /bin/bash -c 'set -o pipefail; armada apply --debug /manifests/hello-kitty/1.0/hello-kitty-manifest.yaml --values /overrides/hello-kitty/1.0/default-kitty-1.yaml --tiller-host tiller-deploy.kube-system.svc.cluster.local | tee /logs/hello-kitty-apply.log'
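
To illustrate the kind of guard a test could put in front of application-apply, here is a rough Python sketch that waits until a TCP endpoint accepts connections. This is not the merged fix and not sysinv code: wait_for_tcp is a hypothetical helper, the timeout and interval values are arbitrary, and a successful TCP connect is only a weak signal that tiller is actually ready to serve requests. The host name is taken from the --tiller-host argument above, and 44134 is Tiller's default gRPC port.

```python
import socket
import time


def wait_for_tcp(host, port, timeout=300.0, interval=5.0):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable,
            # DNS failure, timeout) if the endpoint is not reachable yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))


if __name__ == "__main__":
    # Assumed endpoint: service name from the armada command, default port.
    ready = wait_for_tcp("tiller-deploy.kube-system.svc.cluster.local", 44134,
                         timeout=30.0, interval=5.0)
    print("tiller reachable:", ready)
```

A caller would only proceed to application-apply when the helper returns True, and report a clear "tiller not reachable" error otherwise instead of waiting for the apply itself to fail.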

From the following tiller log, we know that tiller was ready at 15:22:32:

{"log":"[main] 2019/07/24 15:22:32 Starting Tiller v2.13.1 (tls=false)\n","stream":"stderr","time":"2019-07-24T15:22:32.08652286Z"}
{"log":"[main] 2019/07/24 15:22:32 GRPC listening on :44134\n","stream":"stderr","time":"2019-07-24T15:22:32.086573828Z"}
{"log":"[main] 2019/07/24 15:22:32 Probes listening on :44135\n","stream":"stderr","time":"2019-07-24T15:22:32.086585783Z"}
{"log":"[main] 2019/07/24 15:22:32 Storage driver is ConfigMap\n","stream":"stderr","time":"2019-07-24T15:22:32.086593709Z"}
{"log":"[main] 2019/07/24 15:22:32 Max history per release is 0\n","stream":"stderr","time":"2019-07-24T15:22:32.086601303Z"}
{"log":"[storage] 2019/07/24 15:22:36 listing all releases with filter\n","stream":"stderr","time":"2019-07-24T15:22:36.126542029Z"}
...
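
The gap can be worked out directly from the timestamps quoted in this comment. The Python sketch below pulls the "Starting Tiller" time out of the container log lines and compares it with the swact and armada apply times. It assumes all three timestamps are in the same time zone and that the excerpt shows the first tiller start after the swact; the function name tiller_start_time is made up for this example.

```python
import json
import re
from datetime import datetime

# Two lines copied from the tiller container log above (raw string so the
# JSON "\n" escapes are kept as-is).
TILLER_LOG = r'''
{"log":"[main] 2019/07/24 15:22:32 Starting Tiller v2.13.1 (tls=false)\n","stream":"stderr","time":"2019-07-24T15:22:32.08652286Z"}
{"log":"[main] 2019/07/24 15:22:32 GRPC listening on :44134\n","stream":"stderr","time":"2019-07-24T15:22:32.086573828Z"}
'''

START_RE = re.compile(r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) Starting Tiller")


def tiller_start_time(log_text):
    """Return the datetime of the first 'Starting Tiller' entry, or None."""
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)["log"]
        match = START_RE.search(message)
        if match:
            return datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
    return None


# Timestamps copied from the timeline in this comment.
swact = datetime.strptime("2019-07-24T14:58:46.000", "%Y-%m-%dT%H:%M:%S.%f")
apply_cmd = datetime.strptime("2019-07-24 15:09:43.091", "%Y-%m-%d %H:%M:%S.%f")
tiller_up = tiller_start_time(TILLER_LOG)

print("swact -> tiller start:", tiller_up - swact)
print("armada apply -> tiller start:", tiller_up - apply_cmd)
```

With these inputs the swact-to-tiller gap is a little under 24 minutes, and the armada apply command was issued almost 13 minutes before tiller started.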

We had a similar tiller issue, which was fixed a long time ago: https://review.opendev.org/#/c/657087/

Angie Wang (angiewang) wrote :

Looks like this LP is similar to https://bugs.launchpad.net/starlingx/+bug/1837055

Ghada Khalil (gkhalil)
tags: added: stx.containers
summary: - hello-kitty application apply failed
+ hello-kitty application apply failed after controller swact
Ghada Khalil (gkhalil) wrote :

Marking as a duplicate of https://bugs.launchpad.net/starlingx/+bug/1837055

Fixed by:
https://review.opendev.org/672741
https://review.opendev.org/672742

Merged on 2019-07-25

Note: 1837055 is left open for now for further investigation of the root cause, but the symptoms should be addressed as of 2019-07-25.

Changed in starlingx:
assignee: nobody → Angie Wang (angiewang)
importance: Undecided → High
status: New → Fix Released