Lost data in Ceph when adding new Ceph nodes fails.

Bug #1445296 reported by Denis Ipatov
This bug report is a duplicate of:  Bug #1430845: Ceph HIGH IO load when add new OSDs.
This bug affects 1 person
Affects              Status         Importance  Assigned to      Milestone
Fuel for OpenStack   Fix Committed  Critical    Stanislav Makar
6.0.x                In Progress    Critical    Stanislav Makar

Bug Description

How to reproduce it:

1. Create a new cluster. All data must be stored in Ceph.
2. Add some data to the working cloud, for example several images.
3. Add new Ceph OSD nodes. Ceph starts rebalancing after the OS and Ceph are installed, but before the deployment finishes.
4. If any deployment error occurs, the cluster is marked as 'Error'.
5. Delete the Ceph nodes which were marked as "Error".
6. The amount of lost data depends on how much data was rebalanced.

How to avoid it:
1. Execute `ceph osd set noout` to stop rebalancing data before adding Ceph OSD nodes,
and `ceph osd unset noout` after the deployment has finished successfully (see the sketch below).
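
A minimal sketch of that workflow on a controller node (my illustration, not part of the original report; it assumes the deployed Ceph release supports these flags):

# on any controller, before adding the new Ceph OSD nodes
ceph osd set noout

# ... deploy the new Ceph OSD nodes via Fuel ...

# confirm the cluster state before re-enabling normal behaviour
ceph -s        # health should be HEALTH_OK or HEALTH_WARN, never HEALTH_ERR

# after the deployment has finished successfully
ceph osd unset noout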

This affects all versions of MOS.

Denis Ipatov (dipatov)
description: updated
summary: - Lost data in Ceph during fail adding new Ceph nodes.
+ Lost data in Ceph when adding new Ceph nodes fails.
tags: added: customer-found
Denis Ipatov (dipatov)
description: updated
Changed in fuel:
milestone: none → 6.1
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
Revision history for this message
Stanislav Makar (smakar) wrote :

It would be good to know the ISO version.
We have already merged the patch https://github.com/stackforge/fuel-library/commit/c52d4fc377efe1134e8be81a18560c0a6e0138c3, which should help with this.

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Stanislav Makar (smakar)
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Potential data loss should have critical priority.

Changed in fuel:
importance: High → Critical
Revision history for this message
Stanislav Makar (smakar) wrote :

We have to backport this to the 6.0 version.

Changed in fuel:
status: New → Incomplete
status: Incomplete → Fix Committed
Changed in fuel:
status: Fix Committed → Incomplete
Revision history for this message
Stanislav Makar (smakar) wrote :

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/175364

Revision history for this message
Denis Ipatov (dipatov) wrote :

I think this fix doesn't fix the bug.

Revision history for this message
Denis Ipatov (dipatov) wrote :

osd max backfills

Description: The maximum number of backfills allowed to or from a single OSD.
Type: 64-bit Unsigned Integer
Default: 10

osd recovery max active

Description: The number of active recovery requests per OSD at one time. More requests will accelerate recovery, but they place an increased load on the cluster.
Type: 32-bit Integer
Default: 15

These values only decrease the speed of the replication process, but we need to stop replication entirely until the nodes are successfully added.
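
For reference, this is roughly how such throttling is usually applied (a sketch with assumed values, not the committed patch; both the ceph.conf keys and the injectargs call are standard Ceph mechanisms):

# persistent throttling in /etc/ceph/ceph.conf on the OSD nodes
[osd]
osd max backfills = 1
osd recovery max active = 1

# or injected at runtime from a controller/monitor node
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

As noted above, this only slows rebalancing down; it does not stop it.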

Revision history for this message
Tomasz 'Zen' Napierala (tzn) wrote :

Stas,

It should be worked on in master, not 6.0; also, the Incomplete status for 6.1 is not correct.

Changed in fuel:
status: Incomplete → Confirmed
Revision history for this message
Stanislav Makar (smakar) wrote :

This patch addresses the problem where adding a lot of OSDs makes the cluster rebalance very aggressively; under that load some OSD nodes are lost.
And that is the root cause of why data was lost.

Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
Stanislav Makar (smakar) wrote :

Data will be safe, but rebalancing will take longer.

Revision history for this message
Stanislav Makar (smakar) wrote :
tags: added: on-verification
tags: removed: on-verification
Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

On verification

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

I performed a deployment of a Ceph-enabled environment,
then added some images to Glance,
and after that I added one more Ceph OSD node to the deployment without errors.

How should I check data integrity in the Ceph storage after rebalancing?

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

 Ubuntu, 1 Controller+Ceph, 1 Compute+Ceph.
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "437"
  build_id: "2015-05-19_10-05-51"
  nailgun_sha: "593c99f2b46cf52b2be6c7c6e182b6ba9f2232cd"
  python-fuelclient_sha: "e19f1b65792f84c4a18b5a9473f85ef3ba172fce"
  astute_sha: "96801c5bccb14aa3f2a0d7f27f4a4b6dd2b4a548"
  fuel-library_sha: "2814c51668f487e97e1449b078bad1942421e6b9"
  fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
  fuelmain_sha: "68796aeaa7b669e68bc0976ffd616709c937187a"

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

My attempt to verify this fix caused a new bug: https://bugs.launchpad.net/fuel/+bug/1457487

I'll try to re-verify without exceeding disk space.

Revision history for this message
Stanislav Makar (smakar) wrote :

Go to any controller and run "ceph -s" before and after deploying the new OSD.
The output should look like this:

ceph -s
    cluster a04e9f78-693f-4b1b-b73f-92af242d002b
     health HEALTH_OK
     monmap e3: 3 mons at {node-1=192.168.0.4:6789/0,node-2=192.168.0.5:6789/0,node-3=192.168.0.6:6789/0}, election epoch 6, quorum 0,1,2 node-1,node-2,node-3
     osdmap e43: 8 osds: 8 up, 8 in
      pgmap v90: 4800 pgs, 12 pools, 13696 kB data, 48 objects
            16749 MB used, 378 GB / 395 GB avail
                4800 active+clean

The health could be HEALTH_WARN, but must not be HEALTH_ERR.

Revision history for this message
Denis Ipatov (dipatov) wrote :

Go to any controller and run "ceph -s" before and after deploying the new OSD.

The output should look like this before deployment:

ceph -s
    cluster a04e9f78-693f-4b1b-b73f-92af242d002b
     health HEALTH_OK
     monmap e3: 3 mons at {node-1=192.168.0.4:6789/0,node-2=192.168.0.5:6789/0,node-3=192.168.0.6:6789/0}, election epoch 6, quorum 0,1,2 node-1,node-2,node-3
     osdmap e43: 8 osds: 8 up, 8 in
      pgmap v90: 4800 pgs, 12 pools, 13696 kB data, 48 objects
            16749 MB used, 378 GB / 395 GB avail
                4800 active+clean

After deleting the new node it should look like this (in this example 2 disks were added):
ceph -s
    cluster a04e9f78-693f-4b1b-b73f-92af242d002b
     health HEALTH_OK
     monmap e3: 3 mons at {node-1=192.168.0.4:6789/0,node-2=192.168.0.5:6789/0,node-3=192.168.0.6:6789/0}, election epoch 6, quorum 0,1,2 node-1,node-2,node-3
     osdmap e43: 10 osds: 8 up, 10 in
      pgmap v90: 4800 pgs, 12 pools, 13696 kB data, 48 objects
            16749 MB used, 378 GB / 395 GB avail
                4800 active+clean

The health could be HEALTH_WARN, but must not be HEALTH_ERR.
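
One way to make that comparison concrete (my suggestion, not from the original comments) is to record the per-pool object counts before the change and look for missing data afterwards:

# before adding the new OSD nodes
rados df > /root/rados_before.txt

# after the failed deployment and node deletion
rados df > /root/rados_after.txt
diff /root/rados_before.txt /root/rados_after.txt    # compare the per-pool object counts
ceph health detail                                   # lost data typically shows up as unfound objects or incomplete PGs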

Revision history for this message
Denis Ipatov (dipatov) wrote :

Short update:
To lose data, you need to add more OSDs than your replication factor.
For example, if you have a replication factor of 3, you need to add 3 nodes (a quick way to check the factor is shown below).
I am not sure what the "list" is in MOS (a node or a disk).

You can Slack me or send an e-mail if you have questions.
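
For reference, the effective replication factor can be checked per pool (the pool name "images" below is just an example; the names depend on the environment):

# list the size (replication factor) of every pool
ceph osd dump | grep 'replicated size'

# or query a single pool, e.g. the Glance pool
ceph osd pool get images size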

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Now I performed the following operations:

1) Created an environment and added one 1 GB image (Ubuntu 14 desktop ISO) and a 10 GB volume based on the image.
Ubuntu 14.04.1, Neutron VLAN
1 Controller + Ceph OSD
1 Compute + Ceph OSD
Replication Factor = 2

2) Checked Ceph status
root@node-7:~# ceph -s
    cluster ce0608f2-f66e-44e1-a359-d7f01e59e547
     health HEALTH_OK
     monmap e1: 1 mons at {node-7=192.168.0.3:6789/0}, election epoch 2, quorum 0 node-7
     osdmap e40: 4 osds: 4 up, 4 in
      pgmap v212: 2496 pgs, 12 pools, 2005 MB data, 427 objects
            14283 MB used, 239 GB / 253 GB avail
                2496 active+clean

3) Added one more Ceph OSD node and redeployed.
Now we have 3 OSD nodes and replication factor = 2.

4) Rechecked ceph status
# ceph -s
    cluster ce0608f2-f66e-44e1-a359-d7f01e59e547
     health HEALTH_OK
     monmap e1: 1 mons at {node-7=192.168.0.3:6789/0}, election epoch 2, quorum 0 node-7
     osdmap e52: 6 osds: 6 up, 6 in
      pgmap v297: 2496 pgs, 12 pools, 2005 MB data, 427 objects
            16583 MB used, 364 GB / 380 GB avail
                2496 active+clean

5) Performed storage-related OSTF tests; they passed successfully.
6) Launched an instance from the saved image.

Revision history for this message
Kyrylo Romanenko (kromanenko) wrote :

Verified on
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "437"
  build_id: "2015-05-19_10-05-51"
  nailgun_sha: "593c99f2b46cf52b2be6c7c6e182b6ba9f2232cd"
  python-fuelclient_sha: "e19f1b65792f84c4a18b5a9473f85ef3ba172fce"
  astute_sha: "96801c5bccb14aa3f2a0d7f27f4a4b6dd2b4a548"
  fuel-library_sha: "2814c51668f487e97e1449b078bad1942421e6b9"
  fuel-ostf_sha: "9ce1800749081780b8b2a4a7eab6586583ffaf33"
  fuelmain_sha: "68796aeaa7b669e68bc0976ffd616709c937187a"

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Denis Ipatov (dipatov) wrote :

Your test does not show anything.
You need to have:
1. A working cluster with some data, not only the one TestVM image. The best way is to emulate a production cluster with 40-50% of the cluster space in use (see the sketch below).
2. Several new OSD nodes added. The number of new nodes should be greater than the replication factor.
3. If you see an error, delete these nodes. (For the test you can delete the nodes after the Ceph OSDs are installed, once rebalancing has started but before it has finished, i.e. in the middle of the process.) The more data in the Ceph cluster, the easier it is to find this error.

In our case a customer had a replication factor of 3 and added 10 nodes with 12 disks. A deployment error occurred during the installation. The customer deleted these nodes and lost around 4-5% of the data.
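
A sketch of how the cluster could be pre-filled for step 1 (my illustration; the "loadtest" pool and the bench duration are arbitrary and should be adjusted until `ceph df` reports roughly 40-50% of the raw space used):

# create a throw-away pool and fill it with synthetic objects
ceph osd pool create loadtest 128
rados -p loadtest bench 600 write --no-cleanup

# check how full the cluster is
ceph df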

Denis Ipatov (dipatov)
Changed in fuel:
status: Fix Released → In Progress
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Folks, we had a patch that should actually show a warning about such possible implications. The best option here is to document this behaviour and fix it in 7.0 if it persists. The user can delete nodes while waiting for Ceph to rebalance after each deletion operation.

Revision history for this message
Stanislav Makar (smakar) wrote :

Requirements to check this patch:
 - hardware: 10 OSD nodes
 - the cluster should be at least half full, with a lot of VMs running
 - the number of new Ceph OSD nodes added should be >= the replication factor (as dipatov wrote)

For more details please read https://bugs.launchpad.net/fuel/+bug/1374969/comments/3

Changed in fuel:
status: In Progress → Fix Committed