platform-integ-apps apply failed

Bug #1830290 reported by Juan Carlos Alonso
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Bob Church

Bug Description

Brief Description
-----------------
Using the Ansible bootstrap playbook configuration, the 'platform-integ-apps' application failed during apply.
It tries to re-apply, but the operation is aborted with status 'apply-failed'.

Severity
--------
Critical: System/Feature is not usable due to the defect

Steps to Reproduce
------------------
Installation of AIO Simplex system

Expected Behavior
------------------
Status of 'platform-integ-apps': applied
Provisioning succeeds

Actual Behavior
----------------
Status of 'platform-integ-apps': apply-failed
Provisioning failed

Reproducibility
---------------
Reproducible: 100%

System Configuration
--------------------
AIO Simplex
BUILD_ID="20190523T013000Z"

Last Pass
---------
ISO: 20190522T013000Z

Timestamp/Logs
--------------
/var/log/sysinv.log attached
List of changes in this ISO attached

Test Activity
-------------
Test deployment with Ansible

Revision history for this message
Ghada Khalil (gkhalil) wrote :

The following commit will make the application apply more deterministic on All-in-one systems:
https://review.opendev.org/#/c/660918/

This was merged on May 23. Please re-test with the CENGN May 24 build.

Changed in starlingx:
status: New → Incomplete
Revision history for this message
Juan Carlos Alonso (juancarlosa) wrote :

Tested with the May 24 build. The issue is still present.

[wrsroot@controller-0 ~(keystone_admin)]$ cat /etc/build.info
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190524T013000Z"

JOB="STX_build_master_master"
BUILD_BY="<email address hidden>"
BUILD_NUMBER="114"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-05-24 01:30:00 +0000"

[wrsroot@controller-0 ~(keystone_admin)]$ system application-list
+---------------------+---------+-------------------------------+---------------+---------------+------------------------------------------+
| application         | version | manifest name                 | manifest file | status        | progress                                 |
+---------------------+---------+-------------------------------+---------------+---------------+------------------------------------------+
| platform-integ-apps | 1.0-5 | platform-integration-manifest | manifest.yaml | upload-failed | operation aborted, check logs for detail |
+---------------------+---------+-------------------------------+---------------+---------------+------------------------------------------+

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Are the two occurrences the same? The first failure you mention is an apply-failed. The second failure is upload-failed. Is the upload-failure consistent/reproducible with the new load? Do you have logs from the second failure?

summary: - platform-integ-apps apply failed
+ Simplex: platform-integ-apps apply failed
Revision history for this message
Al Bailey (albailey1974) wrote : Re: Simplex: platform-integ-apps apply failed

Please provide full collect logs for the second failure.
The sysinv logs from the 23rd indicate that the service running on port 9001 (local docker registry?) may not have been running or configured properly.

2019-05-23 22:02:14.120 99896 INFO sysinv.conductor.kube_app [-] Application overrides generated.
2019-05-23 22:02:14.165 99896 INFO sysinv.conductor.kube_app [-] Armada manifest file has no img tags for chart helm-toolkit
2019-05-23 22:02:14.183 99896 INFO sysinv.conductor.kube_app [-] Image 192.168.204.2:9001/quay.io/external_storage/rbd-provisioner:v2.1.1-k8s1.11 download started from local registry
2019-05-23 22:02:14.212 99896 INFO sysinv.conductor.kube_app [-] Image 192.168.204.2:9001/docker.io/port/ceph-config-helper:v1.10.3 download started from local registry
2019-05-23 22:02:14.384 92020 INFO sysinv.agent.manager [req-9ba7d1b3-c924-4da3-9f92-535efb16040a admin None] Runtime manifest apply completed for classes [u'openstack::keystone::endpoint::runtime', u'platform::firewall::runtime', u'platform::sysinv::runtime'].
2019-05-23 22:02:14.385 92020 INFO sysinv.agent.manager [req-9ba7d1b3-c924-4da3-9f92-535efb16040a admin None] Agent config applied 79e3d68c-85bc-4f58-854e-aa0dc51bf3fa
2019-05-23 22:02:14.409 99896 INFO sysinv.conductor.manager [req-9ba7d1b3-c924-4da3-9f92-535efb16040a admin None] SYS_I Clear system config alarm: controller-0 target config 79e3d68c-85bc-4f58-854e-aa0dc51bf3fa
2019-05-23 22:02:24.469 99896 ERROR sysinv.conductor.kube_app [-] Image 192.168.204.2:9001/docker.io/port/ceph-config-helper:v1.10.3 download failed from local registry: 500 Server Error: Internal Server Error ("Get https://192.168.204.2:9001/v2/: net/http: TLS handshake timeout")
2019-05-23 22:02:24.480 99896 ERROR sysinv.conductor.kube_app [-] Image 192.168.204.2:9001/quay.io/external_storage/rbd-provisioner:v2.1.1-k8s1.11 download failed from local registry: 500 Server Error: Internal Server Error ("Get https://192.168.204.2:9001/v2/: net/http: TLS handshake timeout")
2019-05-23 22:02:24.480 99896 ERROR sysinv.conductor.kube_app [-] Deployment of application platform-integ-apps (1.0-5) failed: failed to download one or more image(s).
2019-05-23 22:02:24.480 99896 TRACE sysinv.conductor.kube_app Traceback (most recent call last):
2019-05-23 22:02:24.480 99896 TRACE sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1197, in perform_app_apply
2019-05-23 22:02:24.480 99896 TRACE sysinv.conductor.kube_app self._download_images(app)
2019-05-23 22:02:24.480 99896 TRACE sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 534, in _download_images
2019-05-23 22:02:24.480 99896 TRACE sysinv.conductor.kube_app reason="failed to download one or more image(s).")
2019-05-23 22:02:24.480 99896 TRACE sysinv.conductor.kube_app KubeAppApplyFailure: Deployment of application platform-integ-apps (1.0-5) failed: failed to download one or more image(s).
2019-05-23 22:02:24.480 99896 TRACE sysinv.conductor.kube_app
2019-05-23 22:02:24.491 99896 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.
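
One quick way to check that hypothesis is to probe the registry's /v2/ endpoint directly, both honouring and ignoring the proxy environment. A minimal sketch, assuming the requests library is available on the controller and using the registry address from the log above (verify=False only sidesteps certificate trust for a connectivity check):

# Illustrative check only, not StarlingX code: see whether the TLS handshake to
# the local registry times out, and whether the proxy settings make a difference.
import requests

REGISTRY_URL = "https://192.168.204.2:9001/v2/"  # address taken from the log above

def probe(label, session):
    try:
        resp = session.get(REGISTRY_URL, verify=False, timeout=10)
        print("%s: HTTP %d" % (label, resp.status_code))
    except requests.exceptions.RequestException as exc:
        print("%s: %s" % (label, exc))

with_proxy = requests.Session()   # honours http(s)_proxy from the environment
direct = requests.Session()
direct.trust_env = False          # ignore proxy environment variables entirely

probe("via environment proxies", with_proxy)
probe("proxy environment ignored", direct)

If the direct probe answers promptly while the proxied one times out, the proxy path rather than the registry itself is the likely culprit.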

Revision history for this message
Juan Carlos Alonso (juancarlosa) wrote :

First failure: 'apply-failed', using proxy.
Second failure: 'upload-failed', using local registry.

After discussing with the team, using a proxy is the best option in virtual environments.

Re-tested with the May 24 ISO; the first failure was encountered again.

[wrsroot@controller-0 ~(keystone_admin)]$ cat /etc/build.info
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190524T013000Z"

JOB="STX_build_master_master"
BUILD_BY="<email address hidden>"
BUILD_NUMBER="114"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-05-24 01:30:00 +0000"

Revision history for this message
Al Bailey (albailey1974) wrote :

The collect logs show the same TLS Handshake timeout.

I see bugs reported against many other projects related to Go's 'net/http: TLS handshake timeout' error.

One suggested an MTU mismatch (I am uncertain how to check this)

Another bug raised against Kubernetes indicates that https_proxy is causing the problem:
https://github.com/kubernetes/kubernetes/issues/13382
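
For the MTU suggestion, a rough sketch of how the interface MTUs on the controller could be compared side by side; this only reads standard Linux sysfs and assumes nothing StarlingX-specific:

# Hedged sketch: list the MTU of every network interface as the kernel sees it,
# so a mismatch (e.g. management interface vs. the docker0 bridge) can be
# spotted at a glance.
import os

for iface in sorted(os.listdir("/sys/class/net")):
    try:
        with open("/sys/class/net/%s/mtu" % iface) as f:
            print("%-15s mtu=%s" % (iface, f.read().strip()))
    except IOError:
        pass  # interface disappeared or is not readable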

Revision history for this message
Jerry Sun (jerry-sun-u) wrote :

Do you have both an http and an https proxy configured for the registry? We believe that could be causing issues (see https://bugs.launchpad.net/starlingx/+bug/1830436).

Revision history for this message
Juan Carlos Alonso (juancarlosa) wrote :

Yes, I am using:

docker_http_proxy: http://proxy-chain.intel.com:911
docker_https_proxy: http://proxy-chain.intel.com:912
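
If those proxies also intercept traffic to the cluster-internal registry address (192.168.204.2), exempting that address is the usual remedy. A small illustration of how the standard no_proxy environment variable controls the bypass, using only the Python 3 standard library; whether the Ansible bootstrap overrides expose an equivalent setting is not assumed here:

# Illustration only: show how no_proxy decides whether the local registry
# address from the logs (192.168.204.2) bypasses the configured proxy.
import os
import urllib.request

os.environ["https_proxy"] = "http://proxy-chain.intel.com:912"

for no_proxy in ("", "192.168.204.2"):
    os.environ["no_proxy"] = no_proxy
    bypassed = bool(urllib.request.proxy_bypass("192.168.204.2"))
    print("no_proxy=%r -> bypass proxy for the registry: %s" % (no_proxy, bypassed))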

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Hi Juan, Is there a reason you need two proxies? Please retest with only one (either http or https proxy) and let us know the results.

Revision history for this message
Cristopher Lemus (cjlemusc) wrote :

Working on bare metal, Standard (2+2), using 20190527T233000Z and Ansible (with the corresponding workarounds), platform-integ-apps remains in 'uploaded' status.

If we try to apply it manually (system application-apply platform-integ-apps), it fails:

+---------------------+---------+-------------------------------+---------------+--------------+------------------------------------------+
| application         | version | manifest name                 | manifest file | status       | progress                                 |
+---------------------+---------+-------------------------------+---------------+--------------+------------------------------------------+
| platform-integ-apps | 1.0-5 | platform-integration-manifest | manifest.yaml | apply-failed | operation aborted, check logs for detail |
+---------------------+---------+-------------------------------+---------------+--------------+------------------------------------------+

The first error encountered in /var/log/sysinv.log is as follows:

2019-05-28 12:43:28.431 106547 ERROR sysinv.conductor.kube_app [-] Received a false positive response from Docker/Armada. Failed to apply application manifest /manifests/platform-integ-apps/1.0-5/platform-integ-apps-manifest.yaml: 2019-05-28 12:37:19.800 42 DEBUG armada.handlers.document [-] Resolving reference /manifests/platform-integ-apps/1.0-5/platform-integ-apps-manifest.yaml. resolve_reference /usr/local/lib/python3.6/dist-packages/armada/handlers/document.py:49

A full collect is attached.

Revision history for this message
Bart Wensley (bartwensley) wrote :

This is also failing for me in a 2+2 VirtualBox configuration. This is a non-OpenStack install; I am not applying the OpenStack-related labels to any of the hosts.

The platform-integ-apps application repeatedly fails to apply. The signature is slightly different from the one in the collect from Cristopher. I will attach a collect.

Here are the armada logs for the failed apply:

2019-05-28 19:50:27.869 41 DEBUG armada.handlers.document [-] Resolving reference /manifests/platform-integ-apps/1.0-5/platform-integ-apps-manifest.yaml. resolve_reference /usr/local/lib/python3.6/dist-packages/armada/handlers/document.py:49
2019-05-28 19:50:27.904 41 DEBUG armada.handlers.tiller [-] Using Tiller namespace: kube-system _get_tiller_namespace /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:174
2019-05-28 19:50:27.963 41 DEBUG armada.handlers.tiller [-] Found at least one Running Tiller pod. _get_tiller_pod /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:150
2019-05-28 19:50:27.963 41 DEBUG armada.handlers.tiller [-] Using Tiller pod IP: 192.168.204.3 _get_tiller_ip /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:165
2019-05-28 19:50:27.963 41 DEBUG armada.handlers.tiller [-] Using Tiller host port: 44134 _get_tiller_port /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:170
2019-05-28 19:50:27.964 41 DEBUG armada.handlers.tiller [-] Tiller getting gRPC insecure channel at 192.168.204.3:44134 with options: [grpc.max_send_message_length=429496729, grpc.max_receive_message_length=429496729] get_channel /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:124
2019-05-28 19:50:27.999 41 DEBUG armada.handlers.tiller [-] Armada is using Tiller at: None:44134, namespace=kube-system, timeout=300 __init__ /usr/local/lib/python3.6/dist-packages/armada/handlers/tiller.py:104
2019-05-28 19:50:28.010 41 INFO armada.handlers.lock [-] Acquiring lock
2019-05-28 19:50:28.024 41 INFO armada.handlers.lock [-] Lock Custom Resource Definition not found, creating now
2019-05-28 19:50:28.047 41 DEBUG armada.handlers.lock [-] Encountered known issue while creating CRD, continuing create_definition /usr/local/lib/python3.6/dist-packages/armada/handlers/lock.py:297
2019-05-28 19:50:28.050 41 INFO armada.handlers.lock [-] Lock Custom Resource Definition not found, creating now
2019-05-28 19:50:28.109 41 DEBUG armada.utils.validate [-] Validating document [armada/Chart/v1] helm-toolkit validate_armada_document /usr/local/lib/python3.6/dist-packages/armada/utils/validate.py:152
2019-05-28 19:50:28.111 41 DEBUG armada.utils.validate [-] Validating document [armada/Chart/v1] kube-system-rbd-provisioner validate_armada_document /usr/local/lib/python3.6/dist-packages/armada/utils/validate.py:152
2019-05-28 19:50:28.111 41 DEBUG armada.utils.validate [-] Validating document [armada/Chart/v1] kube-system-ceph-pools-audit validate_armada_document /usr/local/lib/python3.6/dist-packages/armada/utils/validate.py:152
2019-05-28 19:50:28.112 41 DEBUG armada.utils.validate [-] Validating document [armada/ChartGroup/v1] starlingx-ceph-charts validate_armada_document /usr/local/lib/python3.6/dist-packages/armada/utils/valida...

summary: - Simplex: platform-integ-apps apply failed
+ platform-integ-apps apply failed
Revision history for this message
Bart Wensley (bartwensley) wrote :

A couple more things. This is happening in a designer load built on May 28:
SW_VERSION="19.01"
BUILD_TARGET="Unknown"
BUILD_TYPE="Informal"
BUILD_ID="n/a"
JOB="n/a"
BUILD_BY="bwensley"
BUILD_NUMBER="n/a"
BUILD_HOST="yow-bwensley-lx-vm2"
BUILD_DATE="2019-05-28 06:45:22 -0500"
BUILD_DIR="/"
WRS_SRC_DIR="/localdisk/designer/bwensley/starlingx-1/cgcs-root"
WRS_GIT_BRANCH="HEAD"
CGCS_SRC_DIR="/localdisk/designer/bwensley/starlingx-1/cgcs-root/stx"
CGCS_GIT_BRANCH="HEAD"

Also, this happened twice today. I have not been able to do a successful installation.

Revision history for this message
Cristopher Lemus (cjlemusc) wrote :

Just a note: the collect and logs that I uploaded correspond to a bare-metal server using the mirror (local) registry. This bug was initially reported for the proxy case. As suggested, I created a new Launchpad bug to track the registry issue separately: https://bugs.launchpad.net/starlingx/+bug/1830826

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Bob Church (rchurch)
importance: Undecided → High
status: Incomplete → Confirmed
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; impacts container deployment.

tags: added: stx.2.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/662075

Changed in starlingx:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/662075
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=12ff7c16f8850355fc9e0afa7a406083b4d42deb
Submitter: Zuul
Branch: master

commit 12ff7c16f8850355fc9e0afa7a406083b4d42deb
Author: Robert Church <email address hidden>
Date: Wed May 29 02:04:35 2019 -0400

    Update rbd-provisioner replicas based on installed controllers

    Currently the number of rbd-provisioner replicas is driven by the
    stx-openstack application's 'openstack-control-plane' labels.

    On systems where this label has not been applied to the controllers,
    this will result in zero provisioners being installed.

    Break the dependency on the stx-openstack app and set the number of
    replicas based on the number of installed controllers as the
    rbd-provisioner node selector will install in k8s masters (i.e.
    controllers).

    Also update the provisioner's storage-init pod to align with the same
    node selection criteria as the rbd-provisioner pod.

    Change-Id: Ida180fd12a4923c8cdd5bccf25a1a1e2af4f8a90
    Closes-Bug: #1830290
    Signed-off-by: Robert Church <email address hidden>
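
A hedged sketch of the idea in this commit, not the actual sysinv code; the host fields and override key path below are illustrative. The point is to count the installed controllers and drive the rbd-provisioner replica count from that, instead of from hosts carrying the stx-openstack 'openstack-control-plane' label:

# Illustrative only: derive rbd-provisioner replicas from installed controllers.
def rbd_provisioner_overrides(hosts):
    # 'hosts' is a list of dicts with a 'personality' field (illustrative shape).
    controllers = [h for h in hosts if h.get("personality") == "controller"]
    replicas = max(1, len(controllers))  # at least one replica, even mid-install
    return {"pods": {"replicas": {"rbd_provisioner": replicas}}}

# AIO Simplex -> 1 replica; Standard 2+2 -> 2 replicas.
print(rbd_provisioner_overrides([{"personality": "controller"}]))
print(rbd_provisioner_overrides([{"personality": "controller"},
                                 {"personality": "controller"},
                                 {"personality": "worker"},
                                 {"personality": "worker"}]))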

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Juan Carlos Alonso (juancarlosa) wrote :

Just for your information: after the switch to Ansible, I was able to reproduce this issue in all configurations.
All hosts can be unlocked, enabled, and available, yet provisioning fails at the same point.

Revision history for this message
Cristopher Lemus (cjlemusc) wrote : Kubernetes cheat sheet
Revision history for this message
Cristopher Lemus (cjlemusc) wrote : Sanity logs

I'll send you the external one on virtual in a bit.

Revision history for this message
Cristopher Lemus (cjlemusc) wrote :

Somehow, a Slack comment managed to update this bug; please disregard.
