Application-apply failed due to error copying secret ceph-pool-kube-rbd

Bug #1828896 reported by Maria Guadalupe Perez Ibara on 2019-05-13
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Bob Church

Bug Description

Brief Description
-----------------
Application-apply failed due to error copying secret ceph-pool-kube-rbd

Severity
--------
Critical

Steps to Reproduce
------------------
1. Have a Standard 2+2 or 2+2+2 deployment ready
2. Execute application apply
  $ system application-apply stx-openstack

Expected Behavior
------------------
Application apply should be completed successfully

Actual Behavior
----------------
Application-apply failed

Reproducibility
---------------
100% reproducible.

System Configuration
--------------------
Multi-node system, Dedicated storage BM

Branch/Pull Time/Commit
-----------------------
OS="centos"
SW_VERSION="19.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190512T233000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="99"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-05-12 23:30:00 +0000"

Timestamp/Logs
--------------
The following error is logged in /var/log/sysinv.log:

2019-05-13 11:59:08.623 96691 ERROR sysinv.common.kubernetes [req-c06222cc-5e13-4269-86d0-97ecaba9b21d admin admin] Failed to copy Secret ceph-pool-kube-rbd from Namespace kube-system to Namespace openstack: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Date': 'Mon, 13 May 2019 11:59:08 GMT', 'Content-Length': '210', 'Content-Type': 'application/json'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"secrets \"ceph-pool-kube-rbd\" not found","reason":"NotFound","details":{"name":"ceph-pool-kube-rbd","kind":"secrets"},"code":404}
2019-05-13 11:59:08.623 96691 ERROR sysinv.conductor.kube_app [-] (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Date': 'Mon, 13 May 2019 11:59:08 GMT', 'Content-Length': '210', 'Content-Type': 'application/json'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"secrets \"ceph-pool-kube-rbd\" not found","reason":"NotFound","details":{"name":"ceph-pool-kube-rbd","kind":"secrets"},"code":404}
2019-05-13 11:59:08.623 96691 ERROR sysinv.conductor.kube_app [-] (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Date': 'Mon, 13 May 2019 11:59:08 GMT', 'Content-Length': '210', 'Content-Type': 'application/json'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"secrets \"ceph-pool-kube-rbd\" not found","reason":"NotFound","details":{"name":"ceph-pool-kube-rbd","kind":"secrets"},"code":404}

2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app Traceback (most recent call last):
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 1150, in perform_app_apply
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app self._create_storage_provisioner_secrets(app.name)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/conductor/kube_app.py", line 697, in _create_storage_provisioner_secrets
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app pool_secret, common.HELM_NS_STORAGE_PROVISIONER, ns)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib64/python2.7/site-packages/sysinv/common/kubernetes.py", line 133, in kube_copy_secret
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app body = c.read_namespaced_secret(name, src_namespace, export=True)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 19486, in read_namespaced_secret
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app (data) = self.read_namespaced_secret_with_http_info(name, namespace, **kwargs)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib/python2.7/site-packages/kubernetes/client/apis/core_v1_api.py", line 19577, in read_namespaced_secret_with_http_info
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app collection_formats=collection_formats)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 321, in call_api
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app _return_http_data_only, collection_formats, _preload_content, _request_timeout)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app _request_timeout=_request_timeout)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib/python2.7/site-packages/kubernetes/client/api_client.py", line 342, in request
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app headers=headers)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib/python2.7/site-packages/kubernetes/client/rest.py", line 231, in GET
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app query_params=query_params)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app File "/usr/lib/python2.7/site-packages/kubernetes/client/rest.py", line 222, in request
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app raise ApiException(http_resp=r)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app ApiException: (404)
2019-05-13 11:59:08.623 96691 TRACE sysinv.conductor.kube_app Reason: Not Found
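
The HTTP response body in the log above is a Kubernetes Status object in JSON. As a rough triage aid, the sketch below pulls the missing resource kind and name out of such a body using only the standard library. The helper name `missing_resource` is hypothetical and is not part of sysinv or the Kubernetes client.

```python
import json


def missing_resource(status_body):
    """Return (kind, name) for a Kubernetes NotFound Status body, else None.

    Hypothetical helper for log triage; not part of sysinv.
    """
    status = json.loads(status_body)
    if status.get("kind") != "Status" or status.get("reason") != "NotFound":
        return None
    details = status.get("details", {})
    return details.get("kind"), details.get("name")


# Response body copied from the sysinv.log excerpt above.
body = r'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"secrets \"ceph-pool-kube-rbd\" not found","reason":"NotFound","details":{"name":"ceph-pool-kube-rbd","kind":"secrets"},"code":404}'

print(missing_resource(body))
```

For the body logged here, the helper reports that the `ceph-pool-kube-rbd` secret is the object the API server could not find in the source namespace.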

Test Activity
-------------
Sanity

Erich Cordoba (ericho) on 2019-05-13
summary: - Application-apply failed.
+ Application-apply failed due to error copying secret ceph-pool-kube-rbd
description: updated
Ghada Khalil (gkhalil) on 2019-05-14
tags: added: stx.storage
Changed in starlingx:
importance: Undecided → Critical
Ghada Khalil (gkhalil) wrote :

Suspect this is related to recent rbd-provisioner de-coupling; assigning to Bob to triage

Changed in starlingx:
assignee: nobody → Bob Church (rchurch)
Bob Church (rchurch) wrote :

Looks like the storage init job for the rbd-provisioner failed. Because of the failure, the secret is not created.

For the 2+2 and the 2+2+2 configurations, we provision the OSDs (which loads the crushmap) much later, as we first need to establish a quorum (2 of 3 monitors). The platform-integ-apps application applies successfully early during provisioning, but the provisioner storage-init job fails because the quorum has not been established and/or the crushmap has not been loaded in the cluster.

https://review.opendev.org/#/c/658942/ will ensure that platform-integ-apps will not be applied until the required Ceph cluster dependencies are available.
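
The dependency described above can be sketched as a simple gate. This is only an illustration of the condition (monitor quorum established and crushmap loaded), not the code in the linked review; the function names and the way the inputs are obtained are assumptions.

```python
def has_quorum(monitors_in_quorum, total_monitors):
    """A quorum needs a strict majority of monitors, e.g. 2 of 3."""
    return monitors_in_quorum > total_monitors // 2


def can_apply_platform_integ_apps(monitors_in_quorum, total_monitors,
                                  crushmap_loaded):
    """Hypothetical prerequisite check: apply only when Ceph is ready.

    How the monitor counts and crushmap state are queried is left out;
    the caller is assumed to supply them.
    """
    return has_quorum(monitors_in_quorum, total_monitors) and crushmap_loaded


# Early in provisioning of a 2+2 system: one monitor up, no crushmap yet.
print(can_apply_platform_integ_apps(1, 3, False))
# After the OSDs are provisioned and 2 of 3 monitors are in quorum.
print(can_apply_platform_integ_apps(2, 3, True))
```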

Bob Church (rchurch) on 2019-05-14
Changed in starlingx:
status: New → In Progress
Bob Church (rchurch) wrote :

The workaround for this issue is to run the following after the system is fully provisioned and the Ceph cluster is operational.

$ system application-remove platform-integ-apps
$ system application-apply platform-integ-apps

After this you can upload and apply the stx-openstack application.
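
If the workaround needs to be scripted, a minimal sketch might look like the following. It only issues the two commands given above, in order, and stops at the first failure. It does not wait for the remove to finish before the apply; whether such a wait is needed is not stated in this report, so treat that as an open assumption. The `run` parameter is a hypothetical hook added to make the function testable.

```python
import subprocess

# The two workaround commands from the comment above, in order.
WORKAROUND_COMMANDS = [
    ["system", "application-remove", "platform-integ-apps"],
    ["system", "application-apply", "platform-integ-apps"],
]


def run_workaround(run=subprocess.check_call):
    """Run each workaround command in order.

    With the default runner, a non-zero exit status raises
    subprocess.CalledProcessError and the remaining commands are skipped.
    """
    for cmd in WORKAROUND_COMMANDS:
        run(cmd)
```

Call `run_workaround()` only after the system is fully provisioned and the Ceph cluster is operational.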

Ghada Khalil (gkhalil) wrote :

Marking as release gating; critical priority as this is causing a red sanity.

tags: added: stx.2.0 stx.sanity

Reviewed: https://review.opendev.org/658942
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=a5a0619ebccea603fc85df39fbc6e190ddae0f93
Submitter: Zuul
Branch: master

commit a5a0619ebccea603fc85df39fbc6e190ddae0f93
Author: Robert Church <email address hidden>
Date: Mon May 13 04:28:02 2019 -0400

    Add application apply prerequisites for platform managed apps

    Add an application-apply dependency for the platform integration
    application which launches the Ceph related charts. This dependency will
    require that a quorum has been established and the crushmap has been
    loaded prior to launching the application.

    This will ensure that the charts have the Ceph connectivity required for
    a successful chart release.

    Change-Id: I56528200d16c68d129bc092e3dcc9af135cff16a
    Story: 2005424
    Task: 30977
    Related-Bug: #1828896
    Signed-off-by: Robert Church <email address hidden>

Ghada Khalil (gkhalil) on 2019-05-15
Changed in starlingx:
status: In Progress → Fix Released