Application upload failed for oidc-auth-apps and platform-integ-apps

Bug #1877582 reported by Nimalini Rasa
This bug affects 2 people
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  Medium      Dan Voiculeasa

Bug Description

Brief Description
-----------------
Application upload for oidc-auth-apps and platform-integ-apps failed after a fresh install.

Severity
--------
Major

Steps to Reproduce
------------------
After a fresh install, upload oidc-auth-apps and platform-integ-apps.

Expected Behavior
------------------
Both applications upload successfully.

Actual Behavior
----------------
Application upload failed.

Reproducibility
---------------
Seen once

System Configuration
--------------------
One-node system

Branch/Pull Time/Commit
-----------------------
2020-05-07

Last Pass
---------
2020-05-06

Timestamp/Logs
--------------
2020-05-08 11:59:29.061
sysinv 2020-05-08 11:59:29.061 99806 ERROR sysinv.conductor.kube_app [-] Upload of application platform-integ-apps (1.0-8) failed: Failed to validate application manifest.: KubeAppUploadFailure: Upload of application platform-integ-apps (1.0-8) failed: Failed to validate application manifest.
2020-05-08 11:59:29.061 99806 ERROR sysinv.conductor.kube_app Traceback (most recent call last):
2020-05-08 11:59:29.061 99806 ERROR sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1928, in perform_app_upload
2020-05-08 11:59:29.061 99806 ERROR sysinv.conductor.kube_app reason="Failed to validate application manifest.")
2020-05-08 11:59:29.061 99806 ERROR sysinv.conductor.kube_app KubeAppUploadFailure: Upload of application platform-integ-apps (1.0-8) failed: Failed to validate application manifest.
2020-05-08 11:59:29.061 99806 ERROR sysinv.conductor.kube_app
sysinv 2020-05-08 11:59:29.223 99806 INFO sysinv.api.controllers.v1.rest_api [-] Response={u'pd': {}}
sysinv 2020-05-08 11:59:29.224 99806 INFO sysinv.conductor.manager [-] Platform managed application oidc-auth-apps: Creating...
sysinv 2020-05-08 11:59:29.226 99806 ERROR sysinv.conductor.kube_app [-] Application upload aborted!.: KubeAppUploadFailure: Upload of application platform-integ-apps (1.0-8) failed: Failed to validate application manifest.
sysinv 2020-05-08 11:59:29.277 99806 INFO sysinv.common.utils [-] Checksum file is included and validated.
sysinv 2020-05-08 11:59:29.279 99806 INFO sysinv.api.controllers.v1.kube_app [-] No patch required for application oidc-auth-apps (1.0-0).
sysinv 2020-05-08 11:59:29.349 99806 INFO sysinv.conductor.manager [-] Platform managed application oidc-auth-apps: Uploading...
sysinv 2020-05-08 11:59:29.720 99806 INFO sysinv.conductor.kube_app [-] Application oidc-auth-apps (1.0-0) upload started.
sysinv 2020-05-08 11:59:29.776 99806 INFO sysinv.conductor.manager [-] Setting config target of host 'controller-0' to '0972a620-0ea7-48ff-8287-f40d2db54298'.
sysinv 2020-05-08 11:59:29.803 99806 WARNING sysinv.conductor.manager [-] controller-0: iconfig out of date: target 0972a620-0ea7-48ff-8287-f40d2db54298, applied c3b3baa2-ba7c-4fb4-8088-456d2f3ba5fc
sysinv 2020-05-08 11:59:29.804 99806 WARNING sysinv.conductor.manager [-] SYS_I Raise system config alarm: host controller-0 config applied: c3b3baa2-ba7c-4fb4-8088-456d2f3ba5fc vs. target: 0972a620-0ea7-48ff-8287-f40d2db54298.
sysinv 2020-05-08 11:59:29.812 99806 INFO sysinv.conductor.kube_app [-] Restarting Armada service...
sysinv 2020-05-08 11:59:29.826 99806 INFO sysinv.conductor.manager [-] _config_update_hosts config_uuid=0972a620-0ea7-48ff-8287-f40d2db54298
sysinv 2020-05-08 11:59:29.827 99806 INFO sysinv.conductor.manager [-] applying runtime manifest config_uuid=0972a620-0ea7-48ff-8287-f40d2db54298, classes: ['openstack::keystone::endpoint::runtime', 'platform::firewall::runtime']
sysinv 2020-05-08 11:59:29.842 99806 INFO sysinv.puppet.puppet [-] Updating hiera for host: controller-0 with config_uuid: 0972a620-0ea7-48ff-8287-f40d2db54298
sysinv 2020-05-08 11:59:29.954 99806 INFO sysinv.conductor.kube_app [-] Starting Armada service...
sysinv 2020-05-08 11:59:29.954 99806 INFO sysinv.conductor.kube_app [-] kube_config=/opt/platform/armada/20.04/admin.conf, manifests_dir=/opt/platform/armada/20.04, overrides_dir=/opt/platform/helm/20.04, logs_dir=/var/log/armada.
sysinv 2020-05-08 11:59:29.975 99806 ERROR sysinv.conductor.kube_app [-] Upload of application oidc-auth-apps (1.0-0) failed: Failed to validate application manifest.: KubeAppUploadFailure: Upload of application oidc-auth-apps (1.0-0) failed: Failed to validate application manifest.
2020-05-08 11:59:29.975 99806 ERROR sysinv.conductor.kube_app Traceback (most recent call last):
2020-05-08 11:59:29.975 99806 ERROR sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1928, in perform_app_upload
2020-05-08 11:59:29.975 99806 ERROR sysinv.conductor.kube_app reason="Failed to validate application manifest.")
2020-05-08 11:59:29.975 99806 ERROR sysinv.conductor.kube_app KubeAppUploadFailure: Upload of application oidc-auth-apps (1.0-0) failed: Failed to validate application manifest.
2020-05-08 11:59:29.975 99806 ERROR sysinv.conductor.kube_app
sysinv 2020-05-08 11:59:30.153 99806 ERROR sysinv.conductor.kube_app [-] Application upload aborted!.: KubeAppUploadFailure: Upload of application oidc-auth-apps (1.0-0) failed: Failed to validate application manifest.

Test Activity
-------------
Regression Testing

Nimalini Rasa (nrasa) wrote :
Bob Church (rchurch) wrote :

The system was unlocked and the platform-managed applications tried to upload. The armada container attempts to start but fails: it is in an Exited state, yet it appears to still be running after the reboot, so we hit a port conflict: "bind: address already in use: unknown".
The workaround is to run the following; the apps will then auto-upload and auto-apply:
- docker rm 4435db3ceb9f
- system application-delete platform-integ-apps; system application-delete oidc-auth-apps

$ sudo docker ps -a
Password:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4435db3ceb9f registry.local:9001/quay.io/airshipit/armada:8a1638098f88d92bf799ef4934abe569789b885e-ubuntu_bionic "./entrypoint.sh ser…" 3 hours ago Exited (128) 3 hours ago armada_service
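A minimal Python 3 sketch of the detection step behind this workaround, assuming the container keeps the name armada_service shown in the output above and that `docker` is on the PATH (the transcript runs it through sudo). It asks `docker ps -a` for tab-separated ID/status pairs, picks containers whose status starts with "Exited", and returns the workaround commands from this comment rather than running them. All helper names are hypothetical and are not part of sysinv.

```python
import subprocess
from typing import List

# Assumption: the stale container is named as in the `docker ps -a` output above.
ARMADA_CONTAINER = "armada_service"

# `--filter name=` and the `{{.ID}}` / `{{.Status}}` Go-template fields are
# standard `docker ps` options; a tab separates the two columns.
PS_ARGS = ["docker", "ps", "-a",
           "--filter", "name=" + ARMADA_CONTAINER,
           "--format", "{{.ID}}\t{{.Status}}"]


def exited_container_ids(ps_output: str) -> List[str]:
    """Return the IDs of containers whose status begins with 'Exited'."""
    ids = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        container_id, _, status = line.partition("\t")
        if status.startswith("Exited"):
            ids.append(container_id.strip())
    return ids


def workaround_commands(ids: List[str]) -> List[List[str]]:
    """Build (but do not run) the workaround commands for the given IDs."""
    cmds = [["docker", "rm", cid] for cid in ids]
    if ids:
        # Deleting the apps lets them auto-upload/auto-apply again.
        for app in ("platform-integ-apps", "oidc-auth-apps"):
            cmds.append(["system", "application-delete", app])
    return cmds


def plan_workaround() -> List[List[str]]:
    """Query Docker and return the commands to review before running."""
    result = subprocess.run(PS_ARGS, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return workaround_commands(exited_container_ids(result.stdout))
```

Returning the commands instead of executing them keeps a human in the loop, since deleting applications is not something to automate blindly.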

root 88294 1 0 11:55 ? 00:00:00 containerd-shim -namespace moby -workdir /var/lib/docker/io.containerd.runtime.v1.linux/moby/4435db3ceb9f5e72676dfb60ca6285751a69f5306dad457a64c6c23a412eac64 -address /var/run/containerd/containerd.sock -containerd-binary /usr/bin/containerd -runtime-root /var/run/docker/runtime-runc

2020-05-08T11:49:20.231 + Host Info +--------------------------------------+
2020-05-08T11:49:20.231 | action : unlock
2020-05-08T11:49:20.231 | personality: controller
2020-05-08T11:49:20.231 | hostname : controller-0
2020-05-08T11:49:20.231 | task : none
2020-05-08T11:49:20.231 | info : none
2020-05-08T11:49:20.231 | ip : fd01:14::3
2020-05-08T11:49:20.231 | mac : 48:df:37:22:c5:f0
2020-05-08T11:49:20.231 | uuid : 25fed003-05ff-44c6-ac1e-88d67a8cc808
2020-05-08T11:49:20.232 | adminState: locked
2020-05-08T11:49:20.232 | operState: disabled
2020-05-08T11:49:20.232 | availStatus: online
2020-05-08T11:49:20.232 | bm ip : none
2020-05-08T11:49:20.232 | bm un : none
2020-05-08T11:49:20.232 | bm type : none
2020-05-08T11:49:20.232 | subFunction: controller,worker
2020-05-08T11:49:20.232 | operState: disabled
2020-05-08T11:49:20.232 | availStatus: online
2020-05-08T11:49:20.232 +------------+--------------------------------------+

2020-05-08T11:49:54.404 subcloud7 containerd[116275]: info time="2020-05-08T11:49:54.404297821Z" level=info msg="shim reaped" id=4435db3ceb9f5e72676dfb60ca6285751a69f5306dad457a64c6c23a412eac64
2020-05-08T11:49:54.433 subcloud7 systemd[1]: info Unmounted /var/lib/docker/containers/4435db3ceb9f5e72676dfb60ca6285751a69f5306dad457a64c6c23a412eac64/mounts/shm.

reboot system boot 3.10.0-1127.el7. Fri May 8 11:52 - 14:44 (02:51)

2020-05-08T11:55:55.709 controller-0 containerd[1962]: info time="2020-05-08T11:55:55.709515678Z" level=info msg="shim reaped" id=4435db3ceb9f5e72676dfb60ca6285751a69f5306dad457a64c6c23a412eac64
2020-05-08T11:55:56.452 controller-0 dockerd[88042]: info time="2020-05-08T11:55:56.451978430Z" level=error msg="4435db3ceb9f5e72676dfb60ca6285751a69f5306dad457a64c6c23a412eac64 cleanup: failed to delete container from containerd: no such container"
2020-05-08T11:55:56.452 controller-0...


Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium - intermittent issue. Needs further investigation.

tags: added: stx.4.0 stx.containers
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Bob Church (rchurch)
Peng Peng (ppeng) wrote :

Issue was reproduced on
Lab: WCP_122
Load: 2020-06-03_20-00-00

log added at
https://files.starlingx.kube.cengn.ca/launchpad/1877582

Ghada Khalil (gkhalil) wrote :

There is a new occurrence of this reported in: https://bugs.launchpad.net/starlingx/+bug/1882546

Frank Miller (sensfan22) wrote :

The expectation is that this issue will go away once the Helm v3 commits merge.

Frank Miller (sensfan22) wrote :

Looking at this issue, a sysinv fix would still be a good idea even if Helm v3 is delivered, in case the issue recurs. Assigning to Dan to implement and deliver a fix.

Changed in starlingx:
assignee: Bob Church (rchurch) → Dan Voiculeasa (dvoicule)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/735374

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/735374
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=d2e8a8f9091e03e93261fc654c7a95195cc30367
Submitter: Zuul
Branch: master

commit d2e8a8f9091e03e93261fc654c7a95195cc30367
Author: Dan Voiculeasa <email address hidden>
Date: Fri Jun 12 18:51:57 2020 +0300

    Remove armada container before sysinv start

    Recover from scenarios where armada container after a reboot is in a
    state that upon being started it will exit abnormally.

    Delete armada container before sysinv-conductor start/restart.
    Armada container will be created when needed.

    Closes-Bug: 1877582
    Change-Id: Ic410fa0fecd0bbc7365f2dde1ddcb08d6251cab9
    Co-authored-by: Robert Church <email address hidden>
    Co-authored-by: Dan Voiculeasa <email address hidden>
    Signed-off-by: Dan Voiculeasa <email address hidden>
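The commit message states the approach (delete the armada container before sysinv-conductor starts or restarts, and let it be recreated when needed) but not its mechanics. The following is a hypothetical Python 3 sketch of such a pre-start cleanup step; it is not the code merged in review 735374. It assumes the container name armada_service from the earlier `docker ps -a` output and relies on `docker rm -f`, treating Docker's "No such container" error as a no-op. The command runner is injectable so the logic can be exercised without a Docker daemon.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical: the container name comes from the `docker ps -a` output
# earlier in this report, not from the merged change.
ARMADA_CONTAINER = "armada_service"

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run_docker(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command and capture its output as text, without raising."""
    return subprocess.run(list(cmd), stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE, universal_newlines=True)


def remove_armada_container(run: Runner = _run_docker) -> bool:
    """Remove any existing armada container before sysinv-conductor starts.

    Returns True if a container was removed and False if none existed.
    Any other docker failure raises RuntimeError.
    """
    result = run(["docker", "rm", "-f", ARMADA_CONTAINER])
    stderr = result.stderr or ""
    if result.returncode == 0:
        return True
    if "No such container" in stderr:
        # Nothing stale to clean up; Armada is recreated when needed.
        return False
    raise RuntimeError("docker rm failed: " + stderr.strip())
```

Treating a missing container as success keeps the step idempotent, so it is safe to run on every conductor start or restart.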

Changed in starlingx:
status: In Progress → Fix Released
Nimalini Rasa (nrasa) wrote :

Verified with 2020-06-26_01-17-51

Difu Hu (difuhu) wrote :

A similar issue was reproduced on load 2020-08-27_00-00-00.

sysinv 2020-08-27 07:40:02.047 98719 INFO sysinv.conductor.manager [-] Platform managed application oidc-auth-apps: Uploading...
sysinv 2020-08-27 07:40:02.049 98719 INFO sysinv.conductor.kube_app [-] Application oidc-auth-apps (1.0-27) upload started.
sysinv 2020-08-27 07:40:02.649 98719 INFO sysinv.conductor.kube_app [-] PluginHelper: Creating oidc-auth-apps plugin directory /opt/platform/helm/20.06/oidc-auth-apps/1.0-27/plugins.
sysinv 2020-08-27 07:40:02.650 98719 INFO sysinv.conductor.kube_app [-] PluginHelper: Installing oidc-auth-apps plugin /scratch/apps/oidc-auth-apps/1.0-27/plugins/k8sapp_oidc-1.0-py2.py3-none-any.whl to /opt/platform/helm/20.06/oidc-auth-apps/1.0-27/plugins.
sysinv 2020-08-27 07:40:03.258 98719 ERROR sysinv.conductor.kube_app [-] Armada request validate for manifest /manifests/oidc-auth-apps/1.0-27/oidc-auth-apps-manifest.yaml failed: could not access armada pod : RuntimeError: could not access armada pod
sysinv 2020-08-27 07:40:03.258 98719 ERROR sysinv.conductor.kube_app [-] Upload of application oidc-auth-apps (1.0-27) failed: Failed to validate application manifest.: KubeAppUploadFailure: Upload of application oidc-auth-apps (1.0-27) failed: Failed to validate application manifest.
2020-08-27 07:40:03.258 98719 ERROR sysinv.conductor.kube_app KubeAppUploadFailure: Upload of application oidc-auth-apps (1.0-27) failed: Failed to validate application manifest.
sysinv 2020-08-27 07:40:03.466 98719 ERROR sysinv.conductor.kube_app [-] Application upload aborted!.: KubeAppUploadFailure: Upload of application oidc-auth-apps (1.0-27) failed: Failed to validate application manifest.
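This recurrence logs a different underlying error ("could not access armada pod") behind the same generic "Failed to validate application manifest." message seen in May. A minimal triage sketch, assuming only the sysinv message formats quoted in this report: it lists each distinct failed upload once and, where logged, attaches the Armada validate error for the same app and version. Pattern and function names are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Patterns are derived solely from the sysinv log excerpts in this report.
UPLOAD_FAILED = re.compile(
    r"Upload of application (?P<app>\S+) \((?P<version>[^)]+)\) failed")
VALIDATE_FAILED = re.compile(
    r"Armada request validate for manifest "
    r"/manifests/(?P<app>[^/]+)/(?P<version>[^/]+)/\S+ failed: (?P<cause>.+?) :")

NO_CAUSE = "no Armada validate error logged"


def summarize_upload_failures(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (app, version, cause) for each distinct failed upload, in order."""
    causes: Dict[Tuple[str, str], str] = {}
    for m in VALIDATE_FAILED.finditer(log_text):
        causes[(m.group("app"), m.group("version"))] = m.group("cause")

    results = []
    seen: Set[Tuple[str, str]] = set()
    for m in UPLOAD_FAILED.finditer(log_text):
        key = (m.group("app"), m.group("version"))
        if key in seen:
            continue  # sysinv logs the same failure several times per upload
        seen.add(key)
        results.append(key + (causes.get(key, NO_CAUSE),))
    return results
```

Against the May excerpt this reports both apps with no Armada-side cause (the port conflict only shows up in the Docker/containerd logs), while against the excerpt above it reports oidc-auth-apps 1.0-27 with "could not access armada pod".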

Difu Hu (difuhu) wrote :

log for 2020-08-27_00-00-00
