Changing of Ceph replication factor after deployment

Bug #1430705 reported by Jon Skarpeteig on 2015-03-11
This bug affects 3 people
Affects             | Importance | Assigned to
Fuel for OpenStack  | Wishlist   | MOS Ceph
7.0.x               | Wishlist   | MOS Maintenance
8.0.x               | Wishlist   | MOS Ceph
Mitaka              | Wishlist   | MOS Ceph
Newton              | Wishlist   | MOS Ceph

Bug Description

In my case the replication factor was misunderstood and left at 2. Since this means 2 copies in total, and not 2 copies in addition to the original upload, I needed to change it.

To fix it, the following was run on one of the controllers (any controller works):

ceph osd lspools # List Ceph pools
ceph osd pool set <pool name> size <size_value> # Increase size one by one
ceph -s # Verify that replication completes successfully
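The "increase size one by one" step above can be sketched as a small helper that only builds the command lines. This is a minimal sketch under assumptions: the helper name `resize_steps` and the pool name `volumes` are hypothetical, and nothing here talks to a cluster. The operator would still run each command by hand and wait for `ceph -s` to report a healthy cluster before moving to the next step.

```python
def resize_steps(pool, current_size, target_size):
    """Build the `ceph osd pool set <pool> size <n>` commands needed to raise
    a pool's replica count from current_size to target_size, one step at a time.

    Hypothetical helper: it only returns argument lists, it does not run them.
    """
    if target_size < current_size:
        raise ValueError("this sketch only covers increasing the size")
    return [
        ["ceph", "osd", "pool", "set", pool, "size", str(size)]
        for size in range(current_size + 1, target_size + 1)
    ]


if __name__ == "__main__":
    # "volumes" is a placeholder pool name; use the names from `ceph osd lspools`.
    for argv in resize_steps("volumes", 2, 3):
        print(" ".join(argv))
```

Returning argument lists rather than executing them keeps the waiting step (checking `ceph -s` between increments) in the operator's hands.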

Then edit the Fuel Postgres database manually to increase the stored value from 2 to 3, to avoid any issues when expanding the cluster later.

The only way to do this through the Fuel GUI currently is to reset the entire OpenStack cluster (thus wiping existing Ceph data). This was not an option for us.

Given the simplicity of the above solution, it should be no problem to enable editing of this field through the Fuel GUI. The only other thing that should be done would be a quick check to see if there's enough space available in the Ceph cluster for the new replication factor.
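The "quick check" for available space could look roughly like the sketch below. It assumes that every pool moves from the old to the new replica count, that raw usage scales linearly with the number of replicas, and that the cluster should stay under a fill ratio of 0.85 after the change. The function names and the 0.85 default are assumptions made for illustration; they are not taken from Fuel or read from the cluster.

```python
def projected_raw_usage(raw_used_bytes, old_size, new_size):
    """Estimate raw usage after the change, assuming usage scales linearly
    with the replica count (an assumption, not a Ceph guarantee)."""
    return raw_used_bytes * new_size // old_size


def fits_after_resize(raw_used_bytes, raw_total_bytes, old_size, new_size,
                      max_fill_ratio=0.85):
    """Return True if the projected raw usage stays within max_fill_ratio
    of the total raw capacity. max_fill_ratio is an assumed safety margin."""
    projected = projected_raw_usage(raw_used_bytes, old_size, new_size)
    return projected <= raw_total_bytes * max_fill_ratio


if __name__ == "__main__":
    TB = 10 ** 12
    # 40 TB raw used out of 100 TB, going from 2 to 3 replicas
    print(fits_after_resize(40 * TB, 100 * TB, 2, 3))
    # 60 TB raw used out of 100 TB, same change
    print(fits_after_resize(60 * TB, 100 * TB, 2, 3))
```

The raw used and total figures would have to come from the cluster itself, for example from the usage summary printed by `ceph -s`.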

Changed in fuel:
importance: Undecided → Wishlist
status: New → Confirmed
milestone: none → next
Changed in fuel:
assignee: nobody → Fuel UI Team (fuel-ui)
Dmitry Pyzhov (dpyzhov) on 2015-06-01
Changed in fuel:
milestone: next → 7.0
tags: added: ui
Vitaly Kramskikh (vkramskikh) wrote :

If everything is as easy as described, can we just add "always_editable: true" to the Ceph replication factor in openstack.yaml?

Vitaly Kramskikh (vkramskikh) wrote :

Library guys, please check whether just changing the replication factor really works, and if it does, just add always_editable: true for "osd_pool_size" in openstack.yaml

Changed in fuel:
assignee: Fuel UI Team (fuel-ui) → Fuel Library Team (fuel-library)
tags: added: customer-found
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Mykola Golub (mgolub)
assignee: Mykola Golub (mgolub) → MOS Ceph (mos-ceph)
Sergii Golovatiuk (sgolovatiuk) wrote :

This bug is related to lifecycle management. I am assigning it to subject matter experts to get an answer on whether it is safe to increase the replication factor of an operational cluster.

Meanwhile I am moving this bug to 8.0, where it can be converted to a feature.

Changed in fuel:
status: Confirmed → Won't Fix
Dmitry Pyzhov (dpyzhov) on 2015-10-12
Changed in fuel:
milestone: 7.0 → 8.0
status: Won't Fix → Confirmed
no longer affects: fuel/8.0.x
Dmitry Pyzhov (dpyzhov) on 2015-10-16
tags: added: feature
Kostiantyn Danylov (kdanylov) wrote :

All Ceph maintenance is supposed to be done with an external tool. We aren't supposed to add this to Fuel for now.

tags: added: life-cycle-management need-bp
tags: removed: feature

For now we don't intend to manage Ceph configuration in the post-deployment phase with Fuel; we plan to integrate a 3rd party tool for it (a plugin for Calamari is already available, other tools are in the backlog). If the cloud is already deployed and the replication factor or other settings have to be changed, the user should manage them directly via native Ceph tools and config files.

I suggest that we close it as "Won't fix". I've attached this bug to the corresponding Epic.

Dmitry Pyzhov (dpyzhov) on 2015-10-22
tags: added: area-mos
Dmitry Borodaenko (angdraug) wrote :

@Dmitriy N.: if this bug is expected to be addressed by a blueprint, the standard practice is to attach it to a blueprint in LP, and leave it open until blueprint is implemented. This allows us to verify that blueprint implementation has really fixed all issues that we have expected it to fix.

Roman Podoliaka (rpodolyaka) wrote :

We no longer fix Wishlist bugs in 8.0, closing as Won't Fix

tags: added: wontfix-feature
Alexei Sheplyakov (asheplyakov) wrote :

Increasing the replication factor is one of the most disruptive events which could ever happen to a ceph cluster.
Even recovering a few failed OSDs, or moving data from full/nearly full OSD, or scrubbing (checking if objects'
replicas are identical) which boils down to copying or moving 5 -- 20 % of the available objects can substantially
reduce the cluster performance. Making an extra copy of every object (as a result of incrementing replication factor)
on a production cluster is a big no-no.

As a matter of fact replication factor, number of PGs, etc, should be planned in advance. Just because it's possible
*in theory* to change these parameters doesn't mean it's easy. I think we should NOT provide a UI for such
dangerous/expensive operations.
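To put a rough number on "an extra copy of every object", the arithmetic can be sketched as below. This is an illustration only: the function names are hypothetical, and the sustained backfill throughput is an input the operator has to supply, since it depends on hardware, network and recovery settings and is not given in this report.

```python
def extra_copy_bytes(logical_bytes, old_size, new_size):
    """Bytes that must be written when every object gains
    (new_size - old_size) additional replicas."""
    return logical_bytes * (new_size - old_size)


def backfill_seconds(bytes_to_copy, throughput_bytes_per_s):
    """Lower bound on copy time at an assumed sustained throughput."""
    return bytes_to_copy / throughput_bytes_per_s


if __name__ == "__main__":
    TB = 10 ** 12
    MB = 10 ** 6
    # Hypothetical cluster: 20 TB of logical data, size 2 -> 3,
    # and an assumed 200 MB/s of sustained backfill throughput.
    to_copy = extra_copy_bytes(20 * TB, 2, 3)
    print(to_copy / TB, "TB to copy")
    print(backfill_seconds(to_copy, 200 * MB) / 3600, "hours at best")
```

Unlike a recovery that touches 5 to 20 % of the objects, going from 2 to 3 replicas writes a full additional copy of the logical data, which is why the duration estimate grows with the total amount of stored data.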
