Unable to provision replication factor 3 on storage system

Bug #1827529 reported by Maria Yousaf
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Tingjie Chen

Bug Description

Brief Description
-----------------
It is not possible to provision replication factor 3 on a storage system. Command is rejected due to storage backend task state.

Severity
--------
Major

Steps to Reproduce
------------------
1. Install both controllers, provision and unlock. Both controllers are available.
2. Run the following:
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-modify ceph-store replication=3 min_replication=2
Can not modify ceph replication factor when storage backend state is 'configured' and task is 'reconfig-controller.' Operation supported for state 'configuring' and task 'provision-storage.'

Current state of the system is as follows:
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-list
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| bedc9003-b3b1-4f57-bf29-397eb48eecf5 | ceph-store | ceph | configured | reconfig-controller | None | min_replication: 1 |
| | | | | | | replication: 2 |
| eba9eaed-a9b3-4fe1-a918-13c57e680f6c | shared_services | external | configured | None | glance | |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+

I believe the root cause is that the backend state is incorrect; its task should be 'provision-storage'.

Expected Behavior
------------------
The user should be able to set the replication factor.

Actual Behavior
----------------
Replication factor provisioning is rejected due to the storage backend task state.

Reproducibility
---------------
Reproducible.

System Configuration
--------------------
Dedicated storage

Branch/Pull Time/Commit
-----------------------
Designer load as of April 25th.

Last Pass
---------
First time testing in StarlingX

Timestamp/Logs
--------------
N/A. Easy to reproduce.

Test Activity
-------------
Regression testing

Numan Waheed (nwaheed)
tags: added: stx.retestneeded
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; replication 3 is a supported stx storage configuration.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.2.0 stx.config
Changed in starlingx:
assignee: nobody → Frank Miller (sensfan22)
tags: added: stx.storage
Changed in starlingx:
assignee: Frank Miller (sensfan22) → Tingjie Chen (silverhandy)
status: Triaged → In Progress
Revision history for this message
Maria Yousaf (myousaf) wrote :

This is still a problem in 20190510T013000Z. Output is as follows:

[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-list
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| a6dfa404-6968-4529-a1d7-b992cd3258f6 | shared_services | external | configured | None | glance | |
| c37722de-26aa-4317-9adb-fcd62f4200c6 | ceph-store | ceph | configured | reconfig-controller | None | min_replication: 1 |
| | | | | | | replication: 2 |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-modify ceph-store replication=3 min_replication=2
Can not modify ceph replication factor when storage backend state is 'configured' and task is 'reconfig-controller.' Operation supported for state 'configuring' and task 'provision-storage.'

Cindy Xie (xxie1)
tags: added: stx.distro.other
Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :

Found the same behavior on a BM 2+2+2 dedicated storage setup:

[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-list
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
| 48ab60d1-9733-4ce3-9f64-943249f83226 | shared_services | external | configured | None | glance | |
| 496dbdc3-f906-45ff-9c15-7d8ca8689572 | ceph-store | ceph | configured | reconfig-controller | None | min_replication: 1 |
| | | | | | | replication: 2 |
+--------------------------------------+-----------------+----------+------------+---------------------+----------+--------------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-modify ceph-store replication=3 min_replication=2
Can not modify ceph replication factor when storage backend state is 'configured' and task is 'reconfig-controller.' Operation supported for state 'configuring' and task 'provision-storage.'

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :

SB_TASK_RECONFIG_CONTROLLER is no longer needed as a state and should be removed. It was needed at some point in time, but no longer.

At some point in time, modifying ceph parameters led to the application of runtime puppet manifests. For that to succeed, both controllers had to be available and the configuration fully applied (i.e. not config-out-of-date). This was equivalent to the storage backend being in 'provision-storage'... this is no longer the case.

Let's take a step back. Replication changes are allowed in two configurations:
1. AIO-SX: can go from 1 to 2 to 3 and back w/o any restrictions.
2. On a storage setup (i.e. one that has storage nodes), users can go from 2 to 3 if fewer than 2 storage nodes are deployed. There is no way to go back once enabled (to be rechecked! In theory we should be able to go back if fewer than two storage nodes are installed).

There is no way to increase the replication number on DX or standard (a.k.a. 2+2 w/o storage nodes).

Now going back to our issue; the following semantic checks make more sense (a rough sketch follows this list):
1. Storage backend is in the 'configured' state.
2. No more than two storage nodes are provisioned; once the 3rd storage node is provisioned, users should no longer be allowed to increase the replication number, nor to go back (the reason is that with replication 2 the 3rd node is part of a different replication group, and it is impossible to go back w/o losing data or complex operations).
3. Storage model is not CEPH_CONTROLLER_MODEL. This will protect against users trying to modify the replication number if 2+2 is configured [check this function: ceph.get_ceph_storage_model()].
4. Do not allow adding the 3rd ceph-monitor if replication is set to 3; this will protect against users who set the replication number before selecting the storage model.
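
A rough Python sketch of checks 1-3 above (all names here are hypothetical, not the actual sysinv identifiers; check 4 would be enforced on the ceph-monitor add path rather than here):

def check_replication_modify(backend_state, storage_node_count,
                             storage_model, new_replication):
    # 1. The backend must already be in the 'configured' state.
    if backend_state != 'configured':
        raise ValueError("backend must be in the 'configured' state")
    # 2. Once the 3rd storage node is provisioned, the replication factor
    #    can no longer be changed in either direction.
    if storage_node_count >= 3:
        raise ValueError("cannot change replication after the 3rd storage node")
    # 3. The controller model (2+2, OSDs on controllers) does not support
    #    changing the replication factor.
    if storage_model == 'CEPH_CONTROLLER_MODEL':
        raise ValueError("replication changes not supported in the controller model")
    return new_replication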

Now, regarding storage models: we support 3 models, selected automatically based on the system configuration (i.e. there is no specific setting; the model is implied by multiple settings, see get_ceph_storage_model() for the full logic if you are interested). A simplified sketch follows this list:
1. AIO-SX model: we have a single node; replication is done across OSDs, not across nodes.
2. Controller model: OSDs can be installed on controllers. A 3rd monitor has to be installed on a worker node to make up the 3-monitor quorum. Once an OSD or a monitor is installed on a worker, the user won't be able to install storage nodes.
3. Storage model: OSDs can be installed on storage nodes. Once storage-0 is added, users can no longer add a monitor to a compute nor add OSDs to controllers.
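
A simplified, hypothetical sketch of how the model is implied by the configuration (the real logic lives in ceph.get_ceph_storage_model(); the parameter names and return values here are illustrative only):

def infer_ceph_storage_model(is_aio_simplex, has_storage_nodes,
                             has_worker_monitor_or_controller_osds):
    if is_aio_simplex:
        # AIO-SX: single node, replication across OSDs rather than nodes.
        return 'aio-sx'
    if has_storage_nodes:
        # Storage model: OSDs live on dedicated storage nodes.
        return 'storage-nodes'
    if has_worker_monitor_or_controller_osds:
        # Controller model: OSDs on controllers, 3rd monitor on a worker.
        return 'controller-nodes'
    # Nothing committed yet, so the model is still undetermined.
    return 'undetermined'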

Revision history for this message
Fernando Hernandez Gonzalez (fhernan2) wrote :

@Ovidiu, checking your comments, a couple of questions:
- <Per Ovidiu> 2. On a storage setup (i.e. one that has storage nodes) users can go from 2 to 3 if there are less than 2 storage nodes deployed.
<FH> Regarding your comment about “having less than 2 storage nodes”, I was thinking of the following scenario:
Scenario 1) having 2 storage nodes in a dedicated storage setup with at least 3 osd.[x]s, meaning we can store on osd.0 and get replicated to osd.1 and osd.2:
[wrsroot@controller-0 ~(keystone_admin)]$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
 -1 2.17743 root storage-tier
 -3 1.30646 chassis group-0
 -4 0.43549 host storage-0
  0 ssd 0.43549 osd.0 up 1.00000 1.00000
  1 ssd 0.43549 osd.1 up 1.00000 1.00000
-5 0.87097 host storage-1
  2 ssd 0.43549 osd.2 up 1.00000 1.00000

***Could you please share if there is any other scenario?

- <Per Ovidiu> 2. No more than two storage nodes provisioned; once the 3rd storage node is provisioned users should no longer be allowed to increase the replication number, nor to go back (the reason is that with replication 2 the 3rd node is part of a different replication group and it is impossible to go back w/o losing data or complex operations).
<FH> Based on your comments, should we be thinking of the following prerequisites? Could you please confirm?
   Prerequisites to enable factor 3:
o Have 2 storage nodes with at least 3 OSDs up.
o Remove the SB_TASK_RECONFIG_CONTROLLER state since it is no longer required. DOES THIS MEAN there should be a fix/commit for this change?
o All storage nodes should be in OK status.
o And after that, run the “system storage-backend-modify ceph-store replication=3 min_replication=2” command?

@Ovidiu, could you please confirm whether adding the test cases below makes sense to you?
- Add a negative test case where replication factor 3 is not allowed on DX and standard (a.k.a. 2+2 w/o storage nodes).
Regarding storage modes, we should be adding the following negative test cases:
- For the AIO-SX model, confirm replication is done across OSDs.
- For the controller model, confirm two monitors are on controllers and the 3rd one on a worker, and make sure that after that the user won't be able to install storage nodes.
- For the storage model, once storage-0 is added the user can no longer add monitors to a compute nor OSDs to controllers. Meaning we can do it if we first create OSDs on controllers and then add the storage-0 node.

Revision history for this message
Ovidiu Poncea (ovidiuponcea) wrote :
Download full text (4.5 KiB)

Replied via email, see inline [Ovi] tag:

@Ovidiu, checking your comments from LP1827529, a couple of questions:

- <Per Ovidiu> 2. On a storage setup (i.e. one that has storage nodes) users can go from 2 to 3 if there are less than 2 storage nodes deployed.

<FH> Regarding your comment about “having less than 2 storage nodes”, I was thinking of the following scenario:

Scenario 1) having 2 storage nodes in a dedicated storage setup with at least 3 osd.[x]s, meaning we can store on osd.0 and get replicated to osd.1 and osd.2:

[wrsroot@controller-0 ~(keystone_admin)]$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
 -1 2.17743 root storage-tier
 -3 1.30646 chassis group-0
 -4 0.43549 host storage-0
  0 ssd 0.43549 osd.0 up 1.00000 1.00000
  1 ssd 0.43549 osd.1 up 1.00000 1.00000
-5 0.87097 host storage-1
  2 ssd 0.43549 osd.2 up 1.00000 1.00000

***Could you please share if there is any other scenario?

[Ovi] Note that data on osd.0 does not replicate to osd.1. Replication is done per node, not per OSD. So data from osd.0 and osd.1 will get replicated to osd.2 (data is divided into small chunks called placement groups - PGs - and PGs get replicated, not OSDs => there is no corresponding replicated OSD, so you can't say that osd.x is replicating to osd.y, but you can say that PG1 on osd.0 gets replicated to osd.2, PG2 on osd.0 gets replicated to osd.2, PG3 on osd.1 gets replicated to osd.2, and so on...).

In this case you have replication 2. If you had replication 3 you would get something like the output below, and data from osd.0 & osd.1 would be replicated to osd.2 and to osd.3:

[wrsroot@controller-0 ~(keystone_admin)]$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
 -1 2.17743 root storage-tier
 -3 1.30646 chassis group-0
 -4 0.43549 host storage-0
  0 ssd 0.43549 osd.0 up 1.00000 1.00000
  1 ssd 0.43549 osd.1 up 1.00000 1.00000
-5 0.87097 host storage-1
  2 ssd 0.43549 osd.2 up 1.00000 1.00000
-5 0.87097 host storage-2
  2 ssd 0.43549 osd.3 up 1.00000 1.00000
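
To illustrate the per-node (not per-OSD) replication point, here is a toy Python sketch: it is not CRUSH, just the same constraint that each PG's copies land on OSDs of distinct hosts, so no OSD has a fixed 'mirror' OSD. The topology dict mirrors the replication-2 osd tree above; everything here is illustrative only.

import random

# Toy topology matching the replication-2 'ceph osd tree' above.
hosts = {
    "storage-0": ["osd.0", "osd.1"],
    "storage-1": ["osd.2"],
}

def place_pg(replication=2):
    # Pick 'replication' distinct hosts, then one OSD on each host.
    # Real Ceph does this deterministically through the CRUSH map.
    chosen_hosts = random.sample(list(hosts), k=replication)
    return [random.choice(hosts[h]) for h in chosen_hosts]

for pg in range(4):
    print("PG %d -> %s" % (pg, place_pg()))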

- <Per Ovidiu> 2. No more than two storage nodes provisioned; once the 3rd storage node is provisioned users should no longer be allowed to increase the replication number, nor to go back (the reason is that with replication 2 the 3rd node is part of a different replication group and it is impossible to go back w/o losing data or complex operations).

<FH> Based on your comments, should we be thinking of the following prerequisites? Could you please confirm?

   Prerequisites to enable factor 3:

o Have 2 storage nodes with at least 3 OSDs up [Ovi] at least one OSD per storage node. There is no need for mor...

Read more...

Revision history for this message
Tingjie Chen (silverhandy) wrote :

Good discussion about the scenarios :)
So besides the needed change for SB_TASK_RECONFIG_CONTROLLER, I think we could add some test cases to validate whether there are issues, and then reconsider whether to modify the related source code for the replication control strategy.
What do you think, Ovidiu and Fernando?

Changed in starlingx:
status: In Progress → New
status: New → In Progress
Cindy Xie (xxie1)
tags: removed: stx.distro.other
Cindy Xie (xxie1)
Changed in starlingx:
status: In Progress → Triaged
Revision history for this message
Tingjie Chen (silverhandy) wrote :

I have made a patch: https://review.opendev.org/#/c/662704/ which is expected to resolve the issue and rework the state machine. Below are the comments exchanged with Ovidiu in the patch review, with my answers:

[Ovidiu] Changes are ok as they deal with #1 in the bug comment:
Now going back to our issue; the following semantic checks make more sense:
1. Storage backend is in the 'configured' state.
2. No more than two storage nodes are provisioned; once the 3rd storage node is provisioned, users should no longer be allowed to increase the replication number, nor to go back (the reason is that with replication 2 the 3rd node is part of a different replication group, and it is impossible to go back w/o losing data or complex operations).
3. Storage model is not CEPH_CONTROLLER_MODEL; if it is, then replication number changes should be denied. This will protect against users trying to modify the replication number if 2+2 is configured [check this function: ceph.get_ceph_storage_model()].
4. Do not allow adding the 3rd ceph-monitor if replication is set to 3; this will protect against users who set the replication number before selecting the storage model.

What's your proposal for dealing with 2, 3 and 4? #2 is a short test, but #3 and #4 are most likely missing from the code. One option is for Fernando to test these cases and raise issues, or fix them as part of this commit?

[Tingjie] Yes, #1 in the bug comments is fixed by removing SB_TASK_RECONFIG_CONTROLLER.
For the semantic check list, the current replication check mechanism is as follows (a rough sketch follows this list):
a. Allow replication modification on AIO-SX when the backend is in the SB_STATE_CONFIGURED state.
b. Do NOT allow replication modification on AIO-DX and controller-node setups (2 controllers + 2 workers) with the ceph controller model.
c. In the ceph storage model (2 controllers + 2 workers + 2 storage nodes):
    Allow modifications of ceph storage backend parameters after the manifests have been applied and BEFORE the first storage node has been configured.
    Changing the replication factor once the first storage node has been installed (pools created) is NOT supported.
    Changing the replication factor to a smaller value is NOT supported.
I have verified checks #1 and #3 from Ovidiu's list; more cases may need the validation team to check and raise issues if any are found.
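
A rough Python sketch of rules (a)-(c) above; the function name, parameters and string constants are illustrative only, not the actual sysinv identifiers:

def replication_modify_allowed(system_mode, storage_model, backend_state,
                               first_storage_node_configured,
                               current_replication, new_replication):
    # (a) AIO-SX: allowed once the backend has reached the configured state.
    if system_mode == 'aio-sx':
        return backend_state == 'configured'
    # (b) AIO-DX and 2 controller + 2 worker setups on the ceph controller
    #     model: never allowed.
    if storage_model == 'controller-model':
        return False
    # (c) Ceph storage model: only before the first storage node has been
    #     configured (pools created), and only to a larger value.
    if first_storage_node_configured:
        return False
    return new_replication > current_replication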

Cindy Xie (xxie1)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
Tingjie Chen (silverhandy) wrote :

With the patch https://review.opendev.org/#/c/662704/, I also verified the case from this issue: in a dedicated storage (2+2+2) deployment, replication can be modified AFTER the 2 controllers are unlocked and BEFORE the first storage node is installed.

The result in my deploy env:
----------------------------------------------------------

[wrsroot@controller-0 ~(keystone_admin)]$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | controller-1 | controller | unlocked | enabled | available |
| 3 | storage-0 | storage | unlocked | enabled | available |
| 4 | storage-1 | storage | unlocked | enabled | available |
| 5 | compute-0 | worker | unlocked | enabled | available |
| 6 | compute-1 | worker | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
[wrsroot@controller-0 ~(keystone_admin)]$ system storage-backend-list
+--------------------------------------+-----------------+----------+------------+-------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-----------------+----------+------------+-------------------+----------+--------------------+
| bece9bf5-5acd-4d17-aab7-f38ff785b0b0 | ceph-store | ceph | configured | provision-storage | None | min_replication: 2 |
| | | | | | | replication: 3 |
| e76c9167-f757-4ea2-a60b-18c3741f719b | shared_services | external | configured | None | glance | |
+--------------------------------------+-----------------+----------+------------+-------------------+----------+--------------------+
[wrsroot@controller-0 ~(keystone_admin)]$ ceph -s
  cluster:
    id: 7da118b8-1bbe-4162-94aa-c780dc9eb5f4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum controller-0,controller-1,storage-0
    mgr: controller-0(active), standbys: controller-1
    osd: 2 osds: 2 up, 2 in
    rgw: 1 daemon active

  data:
    pools: 4 pools, 256 pgs
    objects: 1.13 k objects, 1.1 KiB
    usage: 225 MiB used, 498 GiB / 498 GiB avail
    pgs: 256 active+clean

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/662704
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=843abd69585875e04ada4e59ef7f4f5c1c92abd7
Submitter: Zuul
Branch: master

commit 843abd69585875e04ada4e59ef7f4f5c1c92abd7
Author: Chen, Tingjie <email address hidden>
Date: Mon Jun 3 08:47:56 2019 +0800

    Rework task state machine for storage backend provision replication

    Remove SB_TASK_RECONFIG_CONTROLLER since it is no needed, it is
    previously used to define the process between LVM and CEPH backend, but
    currently no longer have LVM.

    By default task is SB_TASK_PROVISION_STORAGE in replace of
    SB_TASK_RECONFIG_CONTROLLER, and after add ceph, task change to None as
    already provisioned.

    Closes-Bug: 1827529
    Change-Id: I050eda8203e881907e7f338b71848f6e3cd5e16f
    Signed-off-by: Chen, Tingjie <email address hidden>
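
For illustration only, the task behaviour described by the commit could be sketched as below; the constant name comes from the commit message and the string value from the storage-backend-list output earlier in this bug, but the logic is a simplification, not the actual sysinv code.

SB_TASK_PROVISION_STORAGE = 'provision-storage'

def default_backend_task():
    # SB_TASK_RECONFIG_CONTROLLER no longer exists: a new ceph backend now
    # starts out with the 'provision-storage' task.
    return SB_TASK_PROVISION_STORAGE

def task_after_ceph_added(current_task):
    # Once the ceph backend has been added (provisioned), the task is
    # cleared to None.
    return None if current_task == SB_TASK_PROVISION_STORAGE else current_task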

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :
Download full text (6.1 KiB)

verified 2019-08-12_20-59-00

Replication factor 3 can be configured if both controllers are unlocked (controller-1 can be degraded).

On a storage system 2+2+X deployment:
The replication factor cannot be modified if the storage nodes are already installed:
$ system storage-backend-modify ceph-store replication=3 min_replication=2
Can not modify ceph replication factor once a storage node has been installed. This operation is not supported.

If the storage nodes are not installed, both controllers are needed:
[sysadmin@controller-0 ~(keystone_admin)]$ system storage-backend-modify ceph-store replication=3 min_replication=2
Storage backend operations require both controllers to be enabled and available/degraded.

$ system storage-backend-modify ceph-store replication=3 min_replication=2
+----------------------+--------------------------------------+
| Property | Value |
+----------------------+--------------------------------------+
| backend | ceph |
| name | ceph-store |
| state | configured |
| task | provision-storage |
| services | None |
| capabilities | min_replication: 2 |
| | replication: 3 |
| object_gateway | False |
| ceph_total_space_gib | 0 |
| object_pool_gib | None |
| cinder_pool_gib | None |
| kube_pool_gib | None |
| glance_pool_gib | None |
| ephemeral_pool_gib | None |
| tier_name | storage |
| tier_uuid | 1bfcc144-2afd-4b95-bce1-1790d4c784f3 |
| created_at | 2019-08-14T14:03:48.338323+00:00 |
| updated_at | 2019-08-14T14:43:47.536261+00:00

$ system storage-backend-list
+--------------------------------------+-------------+----------+------------+-------------------+----------+--------------------+
| uuid | name | backend | state | task | services | capabilities |
+--------------------------------------+-------------+----------+------------+-------------------+----------+--------------------+
| 9b7a0b9b-03ff-4806-8c9b-4573f3e19e6a | shared_serv | external | configured | None | glance | |
| | ices | | | | | |
| | | | | | | |
| ed785c9a-3b3c-403d-ac61-5b06ea61c113 | ceph-store | ceph | configured | provision-storage | None | min_replication: 2 |
| | | | | | | replication: 3 ...

Read more...

tags: removed: stx.retestneeded