Platform-integ-apps cannot be re-applied if ceph-monitor is changed

Bug #1843569 reported by Stefan Dinescu
This bug affects 1 person
Affects    Status        Importance  Assigned to   Milestone
StarlingX  Fix Released  Medium      Daniel Badea

Bug Description

Title
-----
Platform-integ-apps cannot be re-applied if ceph monitor is changed

Brief Description
-----------------
If a user deletes the compute node on which the 3rd ceph monitor is configured, the monitor configuration for that node is also deleted.
The user can then assign the ceph-mon function to another compute node. Once this is done, any attempt to re-apply platform-integ-apps fails.

Severity
--------
Major

Steps to Reproduce
------------------
1. Install a standard 2+2 lab with controller storage
2. The 3rd ceph-monitor is configured on compute-0 (during the installation process)
     system ceph-mon-add compute-0
3. Delete the host with the 3rd ceph monitor (this also deletes the configured ceph-mon)
     system host-delete compute-0
4. Lock compute-1 and configure a new ceph-mon on it
     system host-lock compute-1
     system ceph-mon-add compute-1
5. Unlock compute-1
6. Try to manually re-apply platform-integ-apps
     system application-apply platform-integ-apps

Expected Behavior
------------------
The apply should be successful

Actual Behavior
----------------
The application is in an apply-failed state:
[sysadmin@controller-0 ~(keystone_admin)]$ system application-list
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
| application         | version                        | manifest name                 | manifest file      | status       | progress                                 |
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+
| platform-integ-apps | 1.0-7                          | platform-integration-manifest | manifest.yaml      | apply-failed | operation aborted, check logs for detail |
| stx-openstack       | 1.0-18-centos-stable-versioned | armada-manifest               | stx-openstack.yaml | applied      | completed                                |
+---------------------+--------------------------------+-------------------------------+--------------------+--------------+------------------------------------------+

Reproducibility
---------------
Reproducible

System Configuration
--------------------
2 + 2 system (kubernetes)

Branch/Pull Time/Commit
-----------------------
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="19.09"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20190911T013000Z"

JOB="STX_build_master_master"
<email address hidden>"
BUILD_NUMBER="245"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2019-09-11 01:30:00 +0000"

Timestamp/Logs
--------------
Sysinv logs show failure at the following time-stamp:
2019-09-11 11:14:38.555 99570 INFO sysinv.conductor.kube_app [-] Armada apply command = /bin/bash -c 'set -o pipefail; armada apply --enable-chart-cleanup --debug /manifests/platform-integ-apps/1.0-7/platform-integ-apps-manifest.yaml --values /overrides/platform-integ-apps/1.0-7/kube-system-rbd-provisioner.yaml --values /overrides/platform-integ-apps/1.0-7/kube-system-ceph-pools-audit.yaml --values /overrides/platform-integ-apps/1.0-7/helm-toolkit-helm-toolkit.yaml --tiller-host tiller-deploy.kube-system.svc.cluster.local | tee /logs/platform-integ-apps-apply.log'
2019-09-11 11:14:39.552 99570 INFO sysinv.conductor.kube_app [-] Starting progress monitoring thread for app platform-integ-apps
2019-09-11 11:14:40.946 99570 INFO sysinv.conductor.kube_app [-] processing chart: stx-rbd-provisioner, overall completion: 50.0%
2019-09-11 11:14:46.854 100977 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost_patch_start_2019-09-11-11-14-46 patch
2019-09-11 11:14:46.855 100977 INFO sysinv.api.controllers.v1.host [-] controller-0 ihost_patch_end. No changes from mtce/1.0.
2019-09-11 11:14:56.091 100978 INFO sysinv.api.controllers.v1.host [-] Provisioned storage node(s) []
2019-09-11 11:14:56.229 100980 INFO sysinv.api.controllers.v1.host [-] Provisioned storage node(s) []
2019-09-11 11:15:02.777 99570 ERROR sysinv.conductor.kube_app [-] Failed to apply application manifest /manifests/platform-integ-apps/1.0-7/platform-integ-apps-manifest.yaml. See /var/log/armada/platform-integ-apps-apply.log for details.
2019-09-11 11:15:02.778 99570 INFO sysinv.conductor.kube_app [-] Exiting progress monitoring thread for app platform-integ-apps
2019-09-11 11:15:02.968 99570 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.
2019-09-11 11:15:02.968 99570 INFO sysinv.conductor.kube_app [-] Deregister the abort status of app platform-integ-apps

Revision history for this message
Stefan Dinescu (stefandinescu) wrote :
  • logs (64.9 MiB, application/x-tar)
Ghada Khalil (gkhalil)
tags: removed: stx.2.0
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / medium priority -- this wouldn't be a common use-case

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Daniel Badea (daniel.badea)
tags: added: stx.3.0
Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/683149

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/683149
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=b5322a892e963b483eb6ac320ec451dc9f37ad2f
Submitter: Zuul
Branch: master

commit b5322a892e963b483eb6ac320ec451dc9f37ad2f
Author: Daniel Badea <email address hidden>
Date: Thu Sep 19 12:52:28 2019 +0000

    rbd-provisioner storage class exclude 3rd monitor

    rbd-provisioner's storage class is referencing all configured
    Ceph monitors. When a compute node is deleted and another one is
    configured to run the 3rd ceph-mon then the storage class
    definition is updated as expected in the overrides but then
    platform-integ-apps fails to re-apply because storage class is
    immutable (you would need to remove the app first then apply it)

    To avoid this issue exclude 3rd monitor from rbd-provisioner's
    storage class when generating the overrides.

    Change-Id: I546dfc255c5ec362169d23f1804e70b805b2a316
    Closes-bug: 1843569
    Signed-off-by: Daniel Badea <email address hidden>
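
The idea in the commit message can be illustrated with a minimal sketch. This is not the merged sysinv code: the function name, the monitor records, the IP addresses and the rule "keep only monitors hosted on controllers" are all assumptions made for illustration. It only shows why leaving the 3rd monitor out keeps the generated storage class input stable when that monitor moves from compute-0 to compute-1.

```python
from typing import Dict, List

# Assumption: monitors are addressed on the default Ceph monitor port.
CEPH_MON_PORT = 6789


def storage_class_monitors(monitors: List[Dict[str, str]]) -> List[str]:
    """Return 'ip:port' entries for monitors hosted on controllers only.

    Hypothetical helper: the 3rd monitor (hosted on a compute node in a
    2+2 lab) is left out, so moving it does not change the result.
    """
    return [
        f"{mon['ip']}:{CEPH_MON_PORT}"
        for mon in monitors
        if mon["hostname"].startswith("controller-")
    ]


# Made-up monitor layout before and after moving the 3rd ceph-mon.
before = [
    {"hostname": "controller-0", "ip": "192.0.2.3"},
    {"hostname": "controller-1", "ip": "192.0.2.4"},
    {"hostname": "compute-0", "ip": "192.0.2.10"},
]
after = before[:2] + [{"hostname": "compute-1", "ip": "192.0.2.11"}]

# The monitor list fed to the storage class is identical in both layouts,
# so the immutable storage class would not need to be modified on re-apply.
unchanged = storage_class_monitors(before) == storage_class_monitors(after)
```

With all three monitors included, the two layouts would produce different lists, which is the situation that made the re-apply fail against the immutable storage class.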

Changed in starlingx:
status: In Progress → Fix Released