Comment 9 for bug 1828896

Bob Church (rchurch) wrote :

Issues observed here:

1) The stx-openstack app is being applied before the platform-integ-apps apply has completed
   - Per: http://lists.starlingx.io/pipermail/starlingx-discuss/2019-May/004447.html

     Prior to running any additional user applications (including the stx-openstack
     application), you will want to make sure that the platform application has been
     applied [4] to ensure that persistent volume claims will be serviced. Other than
     this check, no other additional changes are required from an automation
     perspective to launch the stx-openstack application.

   - https://opendev.org/starlingx/config/commit/4758cdfbd864826d46e6e06571d40693dd040b14 will prevent this apply if attempted too soon

2) The stx-openstack apply aborts because the secret created by the platform-integ-apps apply does not exist yet
   - The update for #1 will avoid this

3) The helm-toolkit overrides for platform-integ-apps are overwritten when the stx-openstack upload/apply occurs. This seems to cause the platform-integ-apps apply to abort
   - We need to land https://review.opendev.org/#/c/660498/ to isolate the app overrides. The toolkit is present in both helm repos because both apps require it. The difference is shown here:

     [wrsroot@controller-0 19.05(keystone_admin)]$ diff helm-toolkit-helm-toolkit.yaml ~/openstack-save/helm-toolkit-helm-toolkit.yaml
     3c3
     < location: http://controller:8080/helm_charts/stx-platform/helm-toolkit-0.1.0.tgz
     ---
     > location: http://controller:8080/helm_charts/starlingx/helm-toolkit-0.1.0.tgz
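The collision in #3 can be illustrated with a small sketch. This is not the code from the review above; the function names and the per-app directory layout are hypothetical, and the "<namespace>-<chart>.yaml" naming is only inferred from the file names that appear in the Armada apply command in the timeline.

```python
import posixpath


def shared_override_path(overrides_dir, app_name, namespace, chart):
    """Flat layout: the path ignores app_name, so two apps that both ship
    the same chart end up writing to the same file."""
    return posixpath.join(overrides_dir, f"{namespace}-{chart}.yaml")


def isolated_override_path(overrides_dir, app_name, namespace, chart):
    """Hypothetical isolated layout: one sub-directory per application."""
    return posixpath.join(overrides_dir, app_name, f"{namespace}-{chart}.yaml")


apps = ["platform-integ-apps", "stx-openstack"]

# Both apps map to /overrides/helm-toolkit-helm-toolkit.yaml in the flat layout.
flat = {a: shared_override_path("/overrides", a, "helm-toolkit", "helm-toolkit")
        for a in apps}

# With a per-app directory, each app gets its own copy of the file.
isolated = {a: isolated_override_path("/overrides", a, "helm-toolkit", "helm-toolkit")
            for a in apps}
```

With the flat layout, whichever app generates its overrides last wins, which matches the changed "location:" line in the diff.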

In summary, this is a sequencing issue. It can no longer occur now that the inter_app dependency code I added in https://opendev.org/starlingx/config/commit/4758cdfbd864826d46e6e06571d40693dd040b14 is in place.
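
The kind of gate that rules out this sequencing can be sketched roughly as follows. This is only an illustration and not the code from the commit above: the status strings, the dependency table, and every name below are assumptions made for the example.

```python
# Hypothetical status value; the real sysinv constants may differ.
APP_APPLIED = "applied"

# Hypothetical dependency table: app -> apps that must be applied first.
APP_DEPENDENCIES = {
    "stx-openstack": ["platform-integ-apps"],
}


class AppNotReadyError(Exception):
    """Raised when an apply is attempted before its dependencies are applied."""


def check_can_apply(app_name, app_status, dependencies=APP_DEPENDENCIES):
    """Reject the apply unless every dependency has reached APP_APPLIED.

    app_status maps an application name to its current status string.
    """
    for dep in dependencies.get(app_name, []):
        status = app_status.get(dep)
        if status != APP_APPLIED:
            raise AppNotReadyError(
                f"{app_name} requires {dep} to be {APP_APPLIED}, "
                f"but its status is {status!r}"
            )
```

In the timeline below, such a check at 07:16:14 would have rejected the stx-openstack apply, because platform-integ-apps was still applying.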

Timeline:
---------------------------------------------
# Ceph client is accessible

2019-05-21 07:13:06.530 98217 INFO ceph_client [-] Request params: url=https://controller-0:5001/request?wait=1, json={'prefix': 'fsid', 'format': 'text'}
2019-05-21 07:13:06.546 98217 INFO ceph_client [-] Result: {u'waiting': [], u'has_failed': False, u'state': u'success', u'is_waiting': False, u'running': [], u'failed': [], u'finished': [{u'outb': u'326ed215-c644-4855-b5f9-eaeb0328ff73\n', u'outs': u'', u'command': u'fsid format=text'}], u'is_finished': True, u'id': u'140310473308432'}

# Audit task triggers creation/upload of platform-integ-apps

2019-05-21 07:13:20.082 99992 INFO sysinv.conductor.manager [-] Platform managed application platform-integ-apps: Creating...
2019-05-21 07:13:21.428 99992 INFO sysinv.conductor.manager [-] Platform managed application platform-integ-apps: Uploading...
2019-05-21 07:13:21.430 99992 INFO sysinv.conductor.kube_app [-] Application (platform-integ-apps) upload started.
2019-05-21 07:13:23.633 99992 INFO sysinv.conductor.kube_app [-] Manifest file /manifests/platform-integ-apps-manifest.yaml was successfully validated.
2019-05-21 07:13:24.178 99992 INFO sysinv.conductor.kube_app [-] Application platform-integ-apps will load charts to chart repo stx-platform
2019-05-21 07:13:27.362 99992 INFO sysinv.conductor.kube_app [-] Generating application overrides...
2019-05-21 07:13:28.054 99992 INFO sysinv.conductor.kube_app [-] Application (platform-integ-apps) upload completed.

# Audit task triggers application apply of platform-integ-apps

2019-05-21 07:14:21.632 99992 INFO sysinv.conductor.manager [-] Platform managed application platform-integ-apps: Applying...
2019-05-21 07:14:21.634 99992 INFO sysinv.conductor.kube_app [-] Application (platform-integ-apps) apply started.
2019-05-21 07:14:21.861 99992 INFO sysinv.conductor.kube_app [-] Generating application overrides...
2019-05-21 07:14:22.017 99992 INFO sysinv.conductor.kube_app [-] Application overrides generated.

# stx-openstack is uploaded

2019-05-21 07:15:46.083 99992 INFO sysinv.conductor.kube_app [-] Application (stx-openstack) upload started.
2019-05-21 07:16:01.119 99992 INFO sysinv.conductor.kube_app [-] Generating application overrides...
2019-05-21 07:16:10.900 99992 INFO sysinv.conductor.kube_app [-] Application (stx-openstack) upload completed.

# platform-integ-apps docker images are downloaded and the Armada apply starts

2019-05-21 07:16:13.471 99992 INFO sysinv.conductor.kube_app [-] All docker images for application platform-integ-apps were successfully downloaded in 111 seconds
2019-05-21 07:16:13.480 99992 INFO sysinv.conductor.kube_app [-] Armada apply command = /bin/bash -c 'armada apply --debug /manifests/platform-integ-apps-manifest.yaml --values /overrides/helm-toolkit-helm-toolkit.yaml --values /overrides/kube-system-rbd-provisioner.yaml --values /overrides/kube-system-ceph-pools-audit.yaml | tee platform-integ-apps-apply.log'

# stx-openstack apply starts prior to the completion of platform-integ-apps

2019-05-21 07:16:14.305 99992 INFO sysinv.conductor.kube_app [-] Application (stx-openstack) apply started.
2019-05-21 07:16:14.839 99992 INFO sysinv.conductor.kube_app [-] Starting progress monitoring thread for app platform-integ-apps

# The secret, created by a successful apply of platform-integ-apps, is not available yet, so the copy fails and the stx-openstack apply is aborted

2019-05-21 07:16:15.288 99992 ERROR sysinv.common.kubernetes [req-e227bf77-d019-49ee-a8cf-184efa10e7d5 admin admin] Failed to copy Secret ceph-pool-kube-rbd from Namespace kube-system to Namespace openstack: (404)
2019-05-21 07:16:15.310 99992 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.

# The platform-integ-apps apply aborts, complaining about an invalid helm-toolkit-helm-toolkit.yaml. This is possibly related to the stx-openstack apply changing the contents of this file, which was created by the platform-integ-apps apply

2019-05-21 07:16:15.374 52 ERROR armada.cli [-] Caught internal exception: armada.exceptions.override_exceptions.InvalidOverrideFileException: /overrides/helm-toolkit-helm-toolkit.yaml is not a valid override file.
2019-05-21 07:16:15.374 52 ERROR armada.cli PermissionError: [Errno 13] Permission denied: '/overrides/helm-toolkit-helm-toolkit.yaml'
2019-05-21 07:16:15.374 52 ERROR armada.cli armada.exceptions.override_exceptions.InvalidOverrideFileException: /overrides/helm-toolkit-helm-toolkit.yaml is not a valid override file.
2019-05-21 07:16:15.488 99992 ERROR sysinv.conductor.kube_app [-] Received a false positive response from Docker/Armada. Failed to apply application manifest /manifests/platform-integ-apps-manifest.yaml: 2019-05-21 07:16:14.278 52 DEBUG armada.handlers.document [-] Resolving reference /manifests/platform-integ-apps-manifest.yaml. resolve_reference /usr/local/lib/python3.6/dist-packages/armada/handlers/document.py:49
2019-05-21 07:16:15.489 99992 INFO sysinv.conductor.kube_app [-] Exiting progress monitoring thread for app platform-integ-apps
2019-05-21 07:16:15.494 99992 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.
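
For anyone who wants to spot this kind of overlap in a sysinv log quickly, a rough sketch follows. The regular expression is based only on the "upload started/completed" and "apply started" lines quoted above. The "apply completed." message format is an assumption by analogy with "upload completed.", and the function names are made up for the example.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2019-05-21 07:16:14.305 99992 INFO ... Application (stx-openstack) apply started.
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Application \((?P<app>[\w-]+)\) "
    r"(?P<op>upload|apply) (?P<state>started|completed)\."
)


def parse_events(lines):
    """Return (timestamp, app, operation, state) tuples in time order."""
    events = []
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append(
                (ts, match.group("app"), match.group("op"), match.group("state"))
            )
    return sorted(events)


def overlapping_applies(events):
    """Return (running_app, starting_app) pairs where an apply started while
    another app's apply had started and not yet logged a completion."""
    in_progress = []
    overlaps = []
    for _ts, app, op, state in events:
        if op != "apply":
            continue
        if state == "started":
            overlaps.extend((running, app) for running in in_progress)
            in_progress.append(app)
        elif app in in_progress:
            in_progress.remove(app)
    return overlaps
```

Note that the "Application apply aborted!." lines do not name the application, so this sketch treats an aborted apply as still in progress. That is good enough to flag the overlap seen here but would need refining for longer logs.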