Failed to upload metrics-server app with kubernetes version 1.20.9

Bug #1948327 reported by Chris Friesen
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Chris Friesen
Milestone: -

Bug Description

Brief Description

After a fresh install of WRCP with Kubernetes version 1.20.9, the metrics-server app 1.0-4 fails to upload.

Severity

Critical

Steps to Reproduce

    Install WRCP_Dev_Build in any configuration with k8s version 1.20.9.
    Log in to the system and run: source /etc/platform/openrc
    Change to the work directory: cd /usr/local/share/applications/helm
    Upload the metrics-server application: system application-upload metrics-server-1.0-4.tgz

Expected Behavior

The metrics-server application should upload successfully.

Actual Behavior

The metrics-server application fails to upload with the error:

"Upload of server_manifest.yaml application metrics-server (1.0-4) failed:
Failed to validate application manifest."

Reproducibility

Reproducible

System Configuration

Any

Branch/Pull Time/Commit

-

Last Pass

Works well with k8s 1.19.

Timestamp/Logs

sysinv 2021-10-19 01:42:44.386 117875 INFO sysinv.api.controllers.v1.kube_app [-] Tar file of application metrics-server verified.
sysinv 2021-10-19 01:42:44.401 116262 INFO sysinv.conductor.kube_app [-] Application metrics-server (1.0-4) upload started.
sysinv 2021-10-19 01:42:44.466 116262 INFO sysinv.conductor.kube_app [-] PluginHelper: metrics-server does not contains any platform plugins.
sysinv 2021-10-19 01:42:45.645 116262 INFO sysinv.conductor.kube_app [-] Copy /opt/platform/armada/21.12/metrics-server to armada-api-556879b56f-2pfsc:/tmp/manifests .
sysinv 2021-10-19 01:42:45.869 116262 INFO sysinv.conductor.kube_app [-] Copy /opt/platform/helm/21.12/metrics-server to armada-api-556879b56f-2pfsc:/tmp/overrides .
sysinv 2021-10-19 01:42:45.941 116262 ERROR sysinv.conductor.kube_app [-] Failed to copy /opt/platform/helm/21.12/metrics-server to armada-api-556879b56f-2pfsc:/tmp/overrides, error: Unexpected error while running command.
Command: kubectl --kubeconfig /etc/kubernetes/admin.conf cp -n armada /opt/platform/helm/21.12/metrics-server armada-api-556879b56f-2pfsc:/tmp/overrides --container armada-api
Exit code: 1
Stdout: ''
Stderr: "error: /opt/platform/helm/21.12/metrics-server doesn't exist in local filesystem\n"
sysinv 2021-10-19 01:42:45.947 116262 WARNING sysinv.common.kubernetes [-] Failed to delete custom object, Namespace kube-system: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"locks.armada.process \"locks.armada.process.lock\" not found","reason":"NotFound","details":{"name":"locks.armada.process.lock","group":"armada.process","kind":"locks"},"code":404} : ApiException: (404)
sysinv 2021-10-19 01:42:45.948 116262 ERROR sysinv.conductor.kube_app [-] Armada request validate for manifest /manifests/metrics-server/1.0-4/metrics-server-metrics-server_manifest.yaml failed: could not access armada pod : RuntimeError: could not access armada pod
sysinv 2021-10-19 01:42:45.949 116262 ERROR sysinv.conductor.kube_app [-] Upload of application metrics-server (1.0-4) failed: Failed to validate application manifest.: KubeAppUploadFailure: Upload of application metrics-server (1.0-4) failed: Failed to validate application manifest.
2021-10-19 01:42:45.949 116262 ERROR sysinv.conductor.kube_app Traceback (most recent call last):
2021-10-19 01:42:45.949 116262 ERROR sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1812, in perform_app_upload
2021-10-19 01:42:45.949 116262 ERROR sysinv.conductor.kube_app reason="Failed to validate application manifest.")
2021-10-19 01:42:45.949 116262 ERROR sysinv.conductor.kube_app KubeAppUploadFailure: Upload of application metrics-server (1.0-4) failed: Failed to validate application manifest.
2021-10-19 01:42:45.949 116262 ERROR sysinv.conductor.kube_app
sysinv 2021-10-19 01:42:45.999 116262 ERROR sysinv.conductor.kube_app [-] Application upload aborted!.: KubeAppUploadFailure: Upload of application metrics-server (1.0-4) failed: Failed to validate application manifest.
sysinv 2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp [-] Exception during message handling: KubeAppUploadFailure: Upload of application metrics-server (1.0-4) failed: Failed to validate application manifest.
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp Traceback (most recent call last):
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/amqp.py", line 437, in _process_data
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp **args)
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 13100, in perform_app_upload
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp self._app.perform_app_upload(rpc_app, tarfile, lifecycle_hook_info_app_upload, images)
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1812, in perform_app_upload
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp reason="Failed to validate application manifest.")
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp KubeAppUploadFailure: Upload of application metrics-server (1.0-4) failed: Failed to validate application manifest.
2021-10-19 01:42:46.000 116262 ERROR sysinv.openstack.common.rpc.amqp

Alarms

-

Test Activity

Developer Testing

Workaround

Describe workaround if available

Chris Friesen (cbf123) wrote :

I suspect the problem dates back to http://bitbucket.wrs.com/projects/CGCS/repos/opendev.org.starlingx.config/commits/f53c96f7dfdc3787fa176f90e94dc48dca7f1db5 (Add support for Helm v3 and containerized armada)

As a workaround I tried adding the following code in copy_manifests_and_overrides_to_armada() and it allowed me to upload and apply the application:

if not os.path.exists(src_dir):
    LOG.info("%s doesn't exist, skipping" % src_dir)
    continue

I'm not sure if this is the correct fix, or if we should instead ensure that the "/opt/platform/helm/21.12/<application>" directory gets created earlier.

According to https://airshipit.readthedocs.io/projects/armada/en/latest/commands/validate.html the "validate" command only takes the manifest file, so we don't need the overrides to be available yet when validating it.
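
To illustrate the idea behind the workaround, here is a minimal standalone sketch. It is not the actual sysinv code: the function name build_copy_commands, its parameters, and the list-of-pairs input are hypothetical, chosen only for illustration. The kubectl arguments mirror the command shown in the log above. The sketch builds the "kubectl cp" command lines and skips any source directory that does not exist locally, so a missing overrides directory is not treated as an error.

```python
import logging
import os

LOG = logging.getLogger(__name__)


def build_copy_commands(src_dest_pairs, pod_name,
                        kubeconfig="/etc/kubernetes/admin.conf",
                        namespace="armada", container="armada-api"):
    """Build 'kubectl cp' argument lists, skipping missing source paths.

    Hypothetical helper for illustration only; it does not run kubectl.
    src_dest_pairs is a list of (local_source_dir, destination_in_pod).
    """
    commands = []
    for src_dir, dest_dir in src_dest_pairs:
        # An application without plugins may have no overrides directory.
        # Treat that as non-fatal instead of letting 'kubectl cp' fail.
        if not os.path.exists(src_dir):
            LOG.info("%s doesn't exist, skipping", src_dir)
            continue
        commands.append([
            "kubectl", "--kubeconfig", kubeconfig,
            "cp", "-n", namespace,
            src_dir, "%s:%s" % (pod_name, dest_dir),
            "--container", container,
        ])
    return commands
```

With one existing directory and one missing directory as input, only the existing one yields a command; the missing one is logged and skipped. Whether skipping is preferable to creating the directory earlier is the open question raised above.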

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/815053

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/815053
Committed: https://opendev.org/starlingx/config/commit/98e15076753dafacf4aecf846b9943655a742158
Submitter: "Zuul (22348)"
Branch: master

commit 98e15076753dafacf4aecf846b9943655a742158
Author: Chris Friesen <email address hidden>
Date: Thu Oct 21 13:16:17 2021 -0600

    allow application upload for applications with no plugins

    Starting with Kubernetes 1.20, the "kubectl cp" command will actually
    error out if the item being copied doesn't exist. Prior to this it
    failed silently.

    It turns out that our application upload code was relying on the silent
    failure.

    In order to make it work for applications without plugins we need to
    make it explicit in the code that it's not a fatal error if there are
    no overrides.

    Change-Id: Ifb70907c84b26bf6c2e19a72a60110a20bcb399b
    Closes-Bug: 1948327
    Signed-off-by: Chris Friesen <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.containers
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → Chris Friesen (cbf123)
tags: added: stx.6.0