Unable to provision replication factor 3 on storage system

Bug #1827529 reported by Maria Yousaf
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Tingjie Chen

Bug Description

Brief Description
-----------------
It is not possible to provision replication factor 3 on a storage system. Command is rejected due to storage backend task state.

Severity
--------
Major

Steps to Reproduce
------------------
1. Install both controllers, provision and unlock. Both controllers are available.
2. Run the following:
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-modify ceph-store replication=3 min_replication=2
Can not modify ceph replication factor when storage backend state is 'configured' and task is 'reconfig-controller.' Operation supported for state 'configuring' and task 'provision-storage.'

Current state of the system is as follows:
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-list
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| bedc9003-b3b1-4f57-bf29-397eb48eecf5 | ceph-store | ceph | configured | reconfig-controller | None | min_replication: 1 |
| | | | | | | replication: 2 |
| eba9eaed-a9b3-4fe1-a918-13c57e680f6c | shared_services | external | configured | None | glance | |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+

I believe the root cause is that the backend state is incorrect; its task should be 'provision-storage'.

Expected Behavior
------------------
The user should be able to set the replication factor.

Actual Behavior
----------------
Replication factor provisioning is rejected due to the storage backend task state.

Reproducibility
---------------
Reproducible.

System Configuration
--------------------
Dedicated storage

Branch/Pull Time/Commit
-----------------------
Designer load as of April 25th.

Last Pass
---------
First time testing in StarlingX

Timestamp/Logs
--------------
N/A. Easy to reproduce.

Test Activity
-------------
Regression testing

Numan Waheed (nwaheed)
tags: added: stx.retestneeded
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; replication 3 is a supported stx storage configuration.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.2.0 stx.config
Changed in starlingx:
assignee: nobody → Frank Miller (sensfan22)
tags: added: stx.storage
Changed in starlingx:
assignee: Frank Miller (sensfan22) → Tingjie Chen (silverhandy)
status: Triaged → In Progress
Revision history for this message
Maria Yousaf (myousaf) wrote :

This is still a problem in 20190510T013000Z. Output is as follows:

[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-list
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| a6dfa404-6968-4529-a1d7-b992cd3258f6 | shared_services | external | configured | None | glance | |
| c37722de-26aa-4317-9adb-fcd62f4200c6 | ceph-store | ceph | configured | reconfig-controller | None | min_replication: 1 |
| | | | | | | replication: 2 |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-modify ceph-store replication=3 min_replication=2
Can not modify ceph replication factor when storage backend state is 'configured' and task is 'reconfig-controller.' Operation supported for state 'configuring' and task 'provision-storage.'

Cindy Xie (xxie1)
tags: added: stx.distro.other
Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :

Found the same behavior on a BM 2+2+2 dedicated storage setup:

[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-list
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| 48ab60d1-9733-4ce3-9f64-943249f83226 | shared_services | external | configured | None | glance | |
| 496dbdc3-f906-45ff-9c15-7d8ca8689572 | ceph-store | ceph | configured | reconfig-controller | None | min_replication: 1 |
| | | | | | | replication: 2 |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-modify ceph-store replication=3 min_replication=2
Can not modify ceph replication factor when storage backend state is 'configured' and task is 'reconfig-controller.' Operation supported for state 'configuring' and task 'provision-storage.'

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

SB_TASK_RECONFIG_CONTROLLER is no longer needed as a state and should be removed. It was needed at some point in time, but no longer.

At some point in time, modifying ceph parameters led to the application of runtime puppet manifests. For that to succeed, both controllers had to be available and the configuration fully applied (i.e. not config-out-of-date). This was equivalent to the storage backend being in 'provision-storage'... this is no longer the case.

Let's take a step back. Replication changes are allowed in two configurations:
1. AIO-SX: can go from 1 to 2 to 3 and back w/o any restrictions.
2. On a storage setup (i.e. one that has storage nodes), users can go from 2 to 3 if fewer than 2 storage nodes are deployed. There is no way to go back once enabled (to be rechecked! In theory we should be able to go back if fewer than two storage nodes are installed).

There is no way to increase the replication number on DX or standard (a.k.a. 2+2 w/o storage nodes).

Now going back to our issue; the following semantic checks make more sense (a rough sketch follows this list):
1. Storage backend is in the 'configured' state.
2. No more than two storage nodes are provisioned; once the 3rd storage node is provisioned, users should no longer be allowed to increase the replication number, nor to go back (the reason is that with replication 2 the 3rd node is part of a different replication group, and it is impossible to go back w/o losing data or complex operations).
3. Storage model is not CEPH_CONTROLLER_MODEL. This will protect against users trying to modify the replication number if 2+2 is configured [check this function: ceph.get_ceph_storage_model()].
4. Do not allow adding the 3rd ceph-monitor if replication is set to 3; this will protect against users who set the replication number before selecting the storage model.
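
A rough Python sketch of checks 1-3 above (all names here are hypothetical, not the actual sysinv identifiers; check 4 would be enforced on the ceph-monitor add path rather than here):

def check_replication_modify(backend_state, storage_node_count,
                             storage_model, new_replication):
    # 1. The backend must already be in the 'configured' state.
    if backend_state != 'configured':
        raise ValueError("backend must be in the 'configured' state")
    # 2. Once the 3rd storage node is provisioned, the replication factor
    #    can no longer be changed in either direction.
    if storage_node_count >= 3:
        raise ValueError("cannot change replication after the 3rd storage node")
    # 3. The controller model (2+2, OSDs on controllers) does not support
    #    changing the replication factor.
    if storage_model == 'CEPH_CONTROLLER_MODEL':
        raise ValueError("replication changes not supported in the controller model")
    return new_replication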

Now, regarding storage models: we support 3 models, selected automatically based on the system configuration (i.e. there is no specific setting; the model is implied by multiple settings, see get_ceph_storage_model() for the full logic if you are interested). A simplified sketch follows this list:
1. AIO-SX model: we have a single node; replication is done across OSDs, not across nodes.
2. Controller model: OSDs can be installed on controllers. A 3rd monitor has to be installed on a worker node to make up the 3-monitor quorum. Once an OSD or a monitor is installed on a worker, the user won't be able to install storage nodes.
3. Storage model: OSDs can be installed on storage nodes. Once storage-0 is added, users can no longer add a monitor to a compute nor add OSDs to controllers.
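
A simplified, hypothetical sketch of how the model is implied by the configuration (the real logic lives in ceph.get_ceph_storage_model(); the parameter names and return values here are illustrative only):

def infer_ceph_storage_model(is_aio_simplex, has_storage_nodes,
                             has_worker_monitor_or_controller_osds):
    if is_aio_simplex:
        # AIO-SX: single node, replication across OSDs rather than nodes.
        return 'aio-sx'
    if has_storage_nodes:
        # Storage model: OSDs live on dedicated storage nodes.
        return 'storage-nodes'
    if has_worker_monitor_or_controller_osds:
        # Controller model: OSDs on controllers, 3rd monitor on a worker.
        return 'controller-nodes'
    # Nothing committed yet, so the model is still undetermined.
    return 'undetermined'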

Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :

@Ovidiu, checking your comments, a couple of questions:
- <Per Ovidiu> 2. On a storage setup (i.e. one that has storage nodes) users can go from 2 to 3 if there are less than 2 storage nodes deployed.
<FH> Regarding your comment about “having less than 2 storage nodes”, I was thinking of the following scenario:
Scenario 1) having 2 storage nodes in a dedicated storage setup with at least 3 osd.[x]s, meaning we can store on osd.0 and get replicated to osd.1 and osd.2:
[wrsroot@controller-0 ~(keystone_admin)]$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
 -1 2.17743 root storage-tier
 -3 1.30646 chassis group-0
 -4 0.43549 host storage-0
  0 ssd 0.43549 osd.0 up 1.00000 1.00000
  1 ssd 0.43549 osd.1 up 1.00000 1.00000
-5 0.87097 host storage-1
  2 ssd 0.43549 osd.2 up 1.00000 1.00000

***Could you please share if there is any other scenario?

- <Per Ovidiu> 2. No more than two storage nodes provisioned; once the 3rd storage node is provisioned users should no longer be allowed to increase the replication number, nor to go back (the reason is that with replication 2 the 3rd node is part of a different replication group and it is impossible to go back w/o losing data or complex operations).
<FH> Based on your comments, should we be thinking of the following prerequisites? Could you please confirm?
   Prerequisites to enable factor 3:
o Have 2 storage nodes with at least 3 OSDs up.
o Remove the SB_TASK_RECONFIG_CONTROLLER state since it is no longer required. DOES THIS MEAN there should be a fix/commit for this change?
o All storage nodes should be in OK status.
o And after that, run the “system storage-backend-modify ceph-store replication=3 min_replication=2” command?

@Ovidiu, could you please confirm whether adding the test cases below makes sense to you?
- Add a negative test case where replication factor 3 is not allowed on DX and standard (a.k.a. 2+2 w/o storage nodes).
Regarding storage modes, we should be adding the following negative test cases:
- For the AIO-SX model, confirm replication is done across OSDs.
- For the controller model, confirm two monitors are on controllers and the 3rd one on a worker, and make sure that after that the user won't be able to install storage nodes.
- For the storage model, once storage-0 is added the user can no longer add monitors to a compute nor OSDs to controllers. Meaning we can do it if we first create OSDs on controllers and then add the storage-0 node.

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :
Download full text (4.5 KiB)

Replied via email, see inline [Ovi] tag:

@Ovidiu, checking your comments from LP1827529, a couple of questions:

- <Per Ovidiu> 2. On a storage setup (i.e. one that has storage nodes) users can go from 2 to 3 if there are less than 2 storage nodes deployed.

<FH> Regarding your comment about “having less than 2 storage nodes”, I was thinking of the following scenario:

Scenario 1) having 2 storage nodes in a dedicated storage setup with at least 3 osd.[x]s, meaning we can store on osd.0 and get replicated to osd.1 and osd.2:

[wrsroot@controller-0 ~(keystone_admin)]$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
 -1 2.17743 root storage-tier
 -3 1.30646 chassis group-0
 -4 0.43549 host storage-0
  0 ssd 0.43549 osd.0 up 1.00000 1.00000
  1 ssd 0.43549 osd.1 up 1.00000 1.00000
-5 0.87097 host storage-1
  2 ssd 0.43549 osd.2 up 1.00000 1.00000

***Could you please share if there is any other scenario?

[Ovi] Note that data on osd.0 does not replicate to osd.1. Replication is done per node, not per OSD. So data from osd.0 and osd.1 will get replicated to osd.2 (data is divided into small chunks called placement groups - PGs - and PGs get replicated, not OSDs => there is no corresponding replicated OSD, so you can't say that osd.x is replicating to osd.y, but you can say that PG1 on osd.0 gets replicated to osd.2, PG2 on osd.0 gets replicated to osd.2, PG3 on osd.1 gets replicated to osd.2, and so on...).

In this case you have replication 2. If you had replication 3 you would get something like the output below, and data from osd.0 & osd.1 would be replicated to osd.2 and to osd.3:

[wrsroot@controller-0 ~(keystone_admin)]$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
 -1 2.17743 root storage-tier
 -3 1.30646 chassis group-0
 -4 0.43549 host storage-0
  0 ssd 0.43549 osd.0 up 1.00000 1.00000
  1 ssd 0.43549 osd.1 up 1.00000 1.00000
-5 0.87097 host storage-1
  2 ssd 0.43549 osd.2 up 1.00000 1.00000
-5 0.87097 host storage-2
  2 ssd 0.43549 osd.3 up 1.00000 1.00000
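
To illustrate the per-node (not per-OSD) replication point, here is a toy Python sketch: it is not CRUSH, just the same constraint that each PG's copies land on OSDs of distinct hosts, so no OSD has a fixed 'mirror' OSD. The topology dict mirrors the replication-2 osd tree above; everything here is illustrative only.

import random

# Toy topology matching the replication-2 'ceph osd tree' above.
hosts = {
    "storage-0": ["osd.0", "osd.1"],
    "storage-1": ["osd.2"],
}

def place_pg(replication=2):
    # Pick 'replication' distinct hosts, then one OSD on each host.
    # Real Ceph does this deterministically through the CRUSH map.
    chosen_hosts = random.sample(list(hosts), k=replication)
    return [random.choice(hosts[h]) for h in chosen_hosts]

for pg in range(4):
    print("PG %d -> %s" % (pg, place_pg()))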

- <Per Ovidiu> 2. No more than two storage nodes provisioned; once the 3rd storage node is provisioned users should no longer be allowed to increase the replication number, nor to go back (the reason is that with replication 2 the 3rd node is part of a different replication group and it is impossible to go back w/o losing data or complex operations).

<FH> Based on your comments, should we be thinking of the following prerequisites? Could you please confirm?

   Prerequisites to enable factor 3:

o Have 2 storage nodes with at least 3 OSDs up [Ovi] at least one OSD per storage node. There is no need for mor...

Read more...

Revision history for this message
Tingjie Chen (silverhandy) wrote :

Good discussion about the scenarios :)
So besides the needed change for SB_TASK_RECONFIG_CONTROLLER, I think we could add some test cases to validate whether there are issues, and then reconsider whether to modify the related source code for the replication control strategy.
What do you think, Ovidiu and Fernando?

Changed in starlingx:
status: In Progress → New
status: New → In Progress
Cindy Xie (xxie1)
tags: removed: stx.distro.other
Cindy Xie (xxie1)
Changed in starlingx:
status: In Progress → Triaged
Revision history for this message
Tingjie Chen (silverhandy) wrote :

I have made a patch: https://review.opendev.org/#/c/662704/ which is expected to resolve the issue and rework the state machine. Below are the comments exchanged with Ovidiu in the patch review, with my answers:

[Ovidiu] Changes are ok as they deal with #1 in the bug comment:
Now going back to our issue; the following semantic checks make more sense:
1. Storage backend is in the 'configured' state.
2. No more than two storage nodes are provisioned; once the 3rd storage node is provisioned, users should no longer be allowed to increase the replication number, nor to go back (the reason is that with replication 2 the 3rd node is part of a different replication group, and it is impossible to go back w/o losing data or complex operations).
3. Storage model is not CEPH_CONTROLLER_MODEL; if it is, then replication number changes should be denied. This will protect against users trying to modify the replication number if 2+2 is configured [check this function: ceph.get_ceph_storage_model()].
4. Do not allow adding the 3rd ceph-monitor if replication is set to 3; this will protect against users who set the replication number before selecting the storage model.

What's your proposal for dealing with 2, 3 and 4? #2 is a short test, but #3 and #4 are most likely missing from the code. One option is for Fernando to test these cases and raise issues, or fix them as part of this commit?

[Tingjie] Yes, #1 in the bug comments is fixed by removing SB_TASK_RECONFIG_CONTROLLER.
For the semantic check list, the current replication check mechanism is as follows (a rough sketch follows this list):
a. Allow replication modification on AIO-SX when the backend is in the SB_STATE_CONFIGURED state.
b. Do NOT allow replication modification on AIO-DX and controller-node setups (2 controllers + 2 workers) with the ceph controller model.
c. In the ceph storage model (2 controllers + 2 workers + 2 storage nodes):
    Allow modifications of ceph storage backend parameters after the manifests have been applied and BEFORE the first storage node has been configured.
    Changing the replication factor once the first storage node has been installed (pools created) is NOT supported.
    Changing the replication factor to a smaller value is NOT supported.
I have verified checks #1 and #3 from Ovidiu's list; more cases may need the validation team to check and raise issues if any are found.
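
A rough Python sketch of rules (a)-(c) above; the function name, parameters and string constants are illustrative only, not the actual sysinv identifiers:

def replication_modify_allowed(system_mode, storage_model, backend_state,
                               first_storage_node_configured,
                               current_replication, new_replication):
    # (a) AIO-SX: allowed once the backend has reached the configured state.
    if system_mode == 'aio-sx':
        return backend_state == 'configured'
    # (b) AIO-DX and 2 controller + 2 worker setups on the ceph controller
    #     model: never allowed.
    if storage_model == 'controller-model':
        return False
    # (c) Ceph storage model: only before the first storage node has been
    #     configured (pools created), and only to a larger value.
    if first_storage_node_configured:
        return False
    return new_replication > current_replication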

Cindy Xie (xxie1)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
Tingjie Chen (silverhandy) wrote :

With the patch https://review.opendev.org/#/c/662704/, I also verified the case from this issue: in a dedicated storage (2+2+2) deployment, replication can be modified AFTER the 2 controllers are unlocked and BEFORE the first storage node is installed.

The result in my deploy env:
----------------------------------------------------------

[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
| 3 | storage-0 | storage | unlocked | enabled | available |
| 4 | storage-1 | storage | unlocked | enabled | available |
| 5 | compute-0 | worker | unlocked | enabled | available |
| 6 | compute-1 | worker | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-list
+--------------------------------------+-----------------+----------+------------+-------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-----------------+----------+------------+-------------------+----------+--------------------+
| bece9bf5-5acd-4d17-aab7-f38ff785b0b0 | ceph-store | ceph | configured | provision-storage | None | min_replication: 2 |
| | | | | | | replication: 3 |
| e76c9167-f757-4ea2-a60b-18c3741f719b | shared_services | external | configured | None | glance | |
+--------------------------------------+-----------------+----------+------------+-------------------+----------+--------------------+
[wrsroot@controller-0 ~(keystone_admin)]$ ceph -s
  cluster:
    id: 7da118b8-1bbe-4162-94aa-c780dc9eb5f4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum controller-0,controller-1,storage-0
    mgr: controller-0(active), standbys: controller-1
    osd: 2 osds: 2 up, 2 in
    rgw: 1 daemon active

  data:
    pools: 4 pools, 256 pgs
    objects: 1.13 k objects, 1.1 KiB
    usage: 225 MiB used, 498 GiB / 498 GiB avail
    pgs: 256 active+clean

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/662704
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=843abd69585875e04ada4e59ef7f4f5c1c92abd7
Submitter: Zuul
Branch: master

commit 843abd69585875e04ada4e59ef7f4f5c1c92abd7
Author: Chen, Tingjie <email address hidden>
Date: Mon Jun 3 08:47:56 2019 +0800

    Rework task state machine for storage backend provision replication

    Remove SB_TASK_RECONFIG_CONTROLLER since it is no needed, it is
    previously used to define the process between LVM and CEPH backend, but
    currently no longer have LVM.

    By default task is SB_TASK_PROVISION_STORAGE in replace of
    SB_TASK_RECONFIG_CONTROLLER, and after add ceph, task change to None as
    already provisioned.

    Closes-Bug: 1827529
    Change-Id: I050eda8203e881907e7f338b71848f6e3cd5e16f
    Signed-off-by: Chen, Tingjie <email address hidden>
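
For illustration only, the task behaviour described by the commit could be sketched as below; the constant name comes from the commit message and the string value from the storage-backend-list output earlier in this bug, but the logic is a simplification, not the actual sysinv code.

SB_TASK_PROVISION_STORAGE = 'provision-storage'

def default_backend_task():
    # SB_TASK_RECONFIG_CONTROLLER no longer exists: a new ceph backend now
    # starts out with the 'provision-storage' task.
    return SB_TASK_PROVISION_STORAGE

def task_after_ceph_added(current_task):
    # Once the ceph backend has been added (provisioned), the task is
    # cleared to None.
    return None if current_task == SB_TASK_PROVISION_STORAGE else current_task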

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :
Download full text (6.1 KiB)

verified 2019-08-12_20-59-00

Replication factor 3 can be configured if both controllers are unlocked (controller-1 can be degraded).

On a storage system 2+2+X deployment:
The replication factor cannot be modified if the storage nodes are already installed:
$ system storage-backend-modify ceph-store replication=3 min_replication=2
Can not modify ceph replication factor once a storage node has been installed. This operation is not supported.

If the storage nodes are not installed, both controllers are needed:
[sysadmin@controller-0 ~(keystone_admin)]$ system storage-backend-modify ceph-store replication=3 min_replication=2
Storage backend operations require both controllers to be enabled and available/degraded.

$ system storage-backend-modify ceph-store replication=3 min_replication=2
+----------------------+--------------------------------------+
| Property | Value |
+----------------------+--------------------------------------+
| backend | ceph |
| name | ceph-store |
| state | configured |
| task | provision-storage |
| services | None |
| capabilities | min_replication: 2 |
| | replication: 3 |
| object_gateway | False |
| ceph_total_space_gib | 0 |
| object_pool_gib | None |
| cinder_pool_gib | None |
| kube_pool_gib | None |
| glance_pool_gib | None |
| ephemeral_pool_gib | None |
| tier_name | storage |
| tier_uuid | 1bfcc144-2afd-4b95-bce1-1790d4c784f3 |
| created_at | 2019-08-14T14:03:48.338323+00:00 |
| updated_at | 2019-08-14T14:43:47.536261+00:00

$ system storage-backend-list
+--------------------------------------+-------------+----------+------------+-------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-------------+----------+------------+-------------------+----------+--------------------+
| 9b7a0b9b-03ff-4806-8c9b-4573f3e19e6a | shared_serv | external | configured | None | glance | |
| | ices | | | | | |
| | | | | | | |
| ed785c9a-3b3c-403d-ac61-5b06ea61c113 | ceph-store | ceph | configured | provision-storage | None | min_replication: 2 |
| | | | | | | replication: 3 ...

Read more...

tags: removed: stx.retestneeded