timeout deploying openstack-cinder chart if performed system storage-tier-modify operation

Bug #1836239 reported by Wendy Mitchell
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Daniel Badea

Bug Description

  Brief Description
  -----------------
  Timeout deploying the openstack-cinder chart (stx-openstack apply fails and aborts) when a tier name change is performed (system storage-tier-modify).

  Observed during ceph tier override feature testing.

  Severity
  --------
  Major

  Steps to Reproduce
  ------------------

  1. Clean system install (platform only).
  2. Manually apply stx-openstack; the apply succeeds and configuration completes.

  3. Add a storage tier:

     system application-list
     system storage-tier-add ceph_cluster goldy
     system storage-tier-list ceph_cluster

     Run the following to confirm the tree structure includes goldy:
     $ ceph osd tree

$ system storage-tier-list ceph_cluster
+--------------------------------------+---------+---------+--------------------------------------+
| uuid                                 | name    | status  | backend_using                        |
+--------------------------------------+---------+---------+--------------------------------------+
| 08b8dadc-716c-4f6a-a4b4-131472522240 | goldy   | defined | None                                 |
| 2f8b52e0-e044-4e37-af1b-2dbf154687a6 | storage | in-use  | 9f775c29-1f1f-4385-8bff-a2c08f3ad409 |
+--------------------------------------+---------+---------+--------------------------------------+

$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-7 0 root goldy-tier
-8 0 chassis group-0-goldy
-9 0 host controller-0-goldy
-1 0.43500 root storage-tier
-2 0.43500 chassis group-0
-3 0.43500 host controller-0
 0 ssd 0.43500 osd.0 up 1.00000 1.00000

  4. Try to rename the tier:

     e.g. system storage-tier-modify ceph_cluster goldy --name gold

bash.log shows the last tier-modify operation at ~16:13:
2019-07-11T16:07:44.000 system storage-tier-add ceph_cluster goldy
...
2019-07-11T16:13:40.000 system storage-tier-modify -n gold ceph_cluster goldy

     Check for the new name:
       $ ceph osd tree
       $ system storage-tier-list ceph_cluster

$ system storage-tier-list ceph_cluster
+--------------------------------------+---------+---------+--------------------------------------+
| uuid                                 | name    | status  | backend_using                        |
+--------------------------------------+---------+---------+--------------------------------------+
| 08b8dadc-716c-4f6a-a4b4-131472522240 | gold    | defined | None                                 |
| 2f8b52e0-e044-4e37-af1b-2dbf154687a6 | storage | in-use  | 9f775c29-1f1f-4385-8bff-a2c08f3ad409 |
+--------------------------------------+---------+---------+--------------------------------------+

$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-7 0 root goldy-tier
-8 0 chassis group-0-goldy
-9 0 host controller-0-goldy
-1 0.43500 root storage-tier
-2 0.43500 chassis group-0
-3 0.43500 host controller-0
 0 ssd 0.43500 osd.0 up 1.00000 1.00000

Note: the ceph osd tree still shows the original name 'goldy'; the crush map was not updated (see the reference commands below).
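
For reference only (not a verified workaround for this bug): Ceph provides a crush bucket rename command, so the stale buckets could in principle be renamed by hand to match the new tier name. Bucket names are taken from the ceph osd tree output above.

# Reference sketch only: rename the crush buckets left behind with the old tier name
$ ceph osd crush rename-bucket goldy-tier gold-tier
$ ceph osd crush rename-bucket group-0-goldy group-0-gold
$ ceph osd crush rename-bucket controller-0-goldy controller-0-gold
# Confirm the buckets now carry the 'gold' name
$ ceph osd tree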

5. Try to complete the ceph tier configuration

Host lock
2019-07-11T16:47:37.000 controller-0 ...system host-lock controller-0

Backend add
2019-07-11T16:48:35.000 controller-0 ... system host-disk-list controller-0
2019-07-11T16:49:12.000 controller-0 ...system storage-backend-add -n gold-store -t 2b7b9540-4198-42a4-a531-62312bbb1ea0 ceph

host-stor-add
2019-07-11T16:50:39.000 controller-0 ... system host-stor-add --tier-uuid 08b8dadc-716c-4f6a-a4b4-131472522240 controller-0 2b7b9540-4198-42a4-a531-62312bbb1ea0
2019-07-11T16:51:03.000 controller-0 ... system host-stor-list controller-0

6. Unlock controller
2019-07-11T16:51:31.000 controller-0... system host-unlock controller-0

$ system application-list
...
| stx-openstack | 1.0-17-centos-stable-versioned | armada-manifest | stx-openstack.yaml | apply-failed | operation aborted, check logs for detail

(Note: stx-openstack apply succeeded in previous install attempts where the tier name was not modified.)

  Expected Behavior
  -----------------
  Either update the ceph osd tree (where the tier is in 'defined' status) if renaming is supported, or block the operation and provide feedback if it is not supported
  (i.e. no impact to the stx-openstack apply operation). A rough sketch of the "block" option follows.
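
For illustration of the "block" option only (hypothetical, not part of sysinv): a pre-check could refuse the rename when crush buckets already exist for the old tier name, since otherwise only the inventory record changes. The '<tier>-tier' bucket naming convention is assumed from the ceph osd tree output above.

  # Hypothetical pre-check sketch (not sysinv code): refuse the rename if
  # crush buckets for the old tier name already exist in the crush map.
  OLD_TIER="goldy"
  if ceph osd crush dump | grep -q "\"${OLD_TIER}-tier\""; then
      echo "Tier '${OLD_TIER}' already has crush buckets; rename not supported" >&2
      exit 1
  fi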

  Actual Behavior
  ---------------
  In step 4, system storage-tier-list shows the name changed to 'gold', but the ceph osd tree still shows the original name 'goldy'.
  The application apply times out and aborts:

2019-07-11 17:22:09.804 112674 ERROR sysinv.conductor.kube_app [-] Failed to apply application manifest /manifests/stx-openstack/1.0-17-centos-stable-versioned/stx-openstack-stx-openstack.yaml. See /var/log/armada/stx-openstack-apply.log for details.
2019-07-11 17:22:09.805 112674 INFO sysinv.conductor.kube_app [-] Exiting progress monitoring thread for app stx-openstack
2019-07-11 17:22:09.822 112674 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.
2019-07-11 16:52:09.022 112674 INFO sysinv.conductor.kube_app [-] processing chart: osh-openstack-cinder, overall completion: 38.0%

In step 6, after unlocking the controller, the armada log reports an exception deploying the openstack-cinder chart:

2019-07-11 16:52:08.838 7505 DEBUG armada.handlers.chart_deploy [-] [chart=openstack-cinder]: {'dictionary_item_added': {<root['values']['conf']['backends']['gold-store'] t1:Not Present, t2:{'rbd_ceph_c...}>, <root['values']['conf']['cinder']['DEFAULT']['default_volume_type'] t1:Not Present, t2:'ceph-store'>, <root['values']['conf']['ceph']['pools']['cinder-volumes-gold'] t1:Not Present, t2:{'app_name':...}>}, 'values_changed': {<root['values']['conf']['cinder']['DEFAULT']['enabled_backends'] t1:'ceph-store', t2:'ceph-store,...'>}} execute /usr/local/lib/python3.6/dist-packages/armada/handlers/chart_deploy.py:136
....

2019-07-11 17:22:08.882 7505 ERROR armada.handlers.armada 
2019-07-11 17:22:08.887 7505 ERROR armada.handlers.armada [-] Chart deploy(s) failed: ['openstack-cinder']
2019-07-11 17:22:09.624 7505 INFO armada.handlers.lock [-] Releasing lock
2019-07-11 17:22:09.634 7505 ERROR armada.cli [-] Caught internal exception: armada.exceptions.armada_exceptions.ChartDeployException: Exception deploying charts: ['openstack-cinder']
2019-07-11 17:22:09.634 7505 ERROR armada.cli Traceback (most recent call last):
2019-07-11 17:22:09.634 7505 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/cli/__init__.py", line 39, in safe_invoke
2019-07-11 17:22:09.634 7505 ERROR armada.cli self.invoke()
2019-07-11 17:22:09.634 7505 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/cli/apply.py", line 209, in invoke
2019-07-11 17:22:09.634 7505 ERROR armada.cli resp = self.handle(documents, tiller)
2019-07-11 17:22:09.634 7505 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/lock.py", line 79, in func_wrapper
2019-07-11 17:22:09.634 7505 ERROR armada.cli return future.result()
2019-07-11 17:22:09.634 7505 ERROR armada.cli File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
2019-07-11 17:22:09.634 7505 ERROR armada.cli return self.__get_result()
2019-07-11 17:22:09.634 7505 ERROR armada.cli File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2019-07-11 17:22:09.634 7505 ERROR armada.cli raise self._exception
2019-07-11 17:22:09.634 7505 ERROR armada.cli File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
2019-07-11 17:22:09.634 7505 ERROR armada.cli result = self.fn(*self.args, **self.kwargs)
2019-07-11 17:22:09.634 7505 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/cli/apply.py", line 252, in handle
2019-07-11 17:22:09.634 7505 ERROR armada.cli return armada.sync()
2019-07-11 17:22:09.634 7505 ERROR armada.cli File "/usr/local/lib/python3.6/dist-packages/armada/handlers/armada.py", line 250, in sync
2019-07-11 17:22:09.634 7505 ERROR armada.cli raise armada_exceptions.ChartDeployException(failures)
2019-07-11 17:22:09.634 7505 ERROR armada.cli armada.exceptions.armada_exceptions.ChartDeployException: Exception deploying charts: ['openstack-cinder']
2019-07-11 17:22:09.634 7505 ERROR armada.cli

  Reproducibility
  ---------------
  yes

  System Configuration
  --------------------
  tried on simplex

  Branch/Pull Time/Commit
  -----------------------
20190703T013000Z

  Timestamp/Logs
  --------------
see inline

  Last Pass
  ---------
  new feature

  Test Activity
  -------------
  [Feature Testing]

Numan Waheed (nwaheed)
tags: added: stx.retestneeded
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as stx.2.0 - it appears that overrides to storage tiers result in application apply failure

summary: timeout deploying openstack-cinder chart if performed system storage-
- tier-modify operation (ceph tier override feature test)
+ tier-modify operation
tags: added: stx.2.0 stx.containers
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Daniel Badea (daniel.badea)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/672489

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/672489
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=7588f6d950bbd16d5bb2593dce07841589fb8ba5
Submitter: Zuul
Branch: master

commit 7588f6d950bbd16d5bb2593dce07841589fb8ba5
Author: Daniel Badea <email address hidden>
Date: Wed Jul 24 09:33:42 2019 +0000

    Ceph: update crush map when storage tier is renamed

    Renaming storage tiers is supported as long as they are not
    in use (no OSDs attached). Tier name is updated correctly
    in the inventory database but the crush map is still using
    the old name. This causes issues when trying to use the renamed
    storage tier.

    Add function to rename crush map buckets referenced by the storage
    tier being renamed. Note that changes to the crush map are
    incremental. Any exception raised while renaming buckets is
    in progress causes the crush map to be rolled back.

    Change-Id: Ie8f5162e61d291eed29f2f663ec596cd4402e5d9
    Closes-Bug: 1836239
    Signed-off-by: Daniel Badea <email address hidden>
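
For illustration, the rename-with-rollback behaviour the commit describes could be sketched at the ceph CLI level roughly as follows. This is a sketch only, not the actual sysinv change; the bucket names assume the '<name>-tier' / 'group-0-<name>' / 'controller-0-<name>' pattern seen in the ceph osd tree output earlier in this report.

#!/bin/bash
# Sketch: rename every crush bucket that references the old tier name and,
# if any rename fails, roll the already-renamed buckets back so the crush
# map is left unchanged. Hypothetical helper, not the sysinv implementation.
OLD="goldy"
NEW="gold"
BUCKETS=("${OLD}-tier" "group-0-${OLD}" "controller-0-${OLD}")
RENAMED=()

rollback() {
    # Undo the renames in reverse order.
    for ((i=${#RENAMED[@]}-1; i>=0; i--)); do
        ceph osd crush rename-bucket "${RENAMED[i]/$OLD/$NEW}" "${RENAMED[i]}"
    done
}

for bucket in "${BUCKETS[@]}"; do
    if ceph osd crush rename-bucket "$bucket" "${bucket/$OLD/$NEW}"; then
        RENAMED+=("$bucket")
    else
        echo "Failed to rename '$bucket'; rolling back" >&2
        rollback
        exit 1
    fi
done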

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

2019-08-07_20-59-00 verified

tags: removed: stx.retestneeded