[doc] Ceph OSD disks are lost at node reboot

Bug #1416855 reported by Dmitriy Novakovskiy
Affects              Status        Importance  Assigned to              Milestone
Fuel for OpenStack   Fix Released  High        Fuel Documentation Team
5.0.x                Won't Fix     High        Fuel Documentation Team
5.1.x                Won't Fix     High        Fuel Documentation Team
6.0.x                Won't Fix     High        Fuel Documentation Team
6.1.x                Fix Released  High        Fuel Documentation Team
7.0.x                Fix Released  Undecided   Fuel Documentation Team
8.0.x                Fix Released  Undecided   Fuel Documentation Team
Future               Invalid      Undecided   Fuel Documentation Team
Mitaka               Fix Released  High        Fuel Documentation Team

Bug Description

In one of our customer deployments we faced a situation where the host OS kernel initialized multiple backplanes of disks in random order at boot time. This causes OSD disks that Fuel deployed with the ceph-deploy utility to be lost from the OSD set after a node reboot, since they were mounted via /dev/sdXXX device names and those names come up different at every boot.

The solution is to mount OSD disks by UUID instead of sdXXX names. It needs to be checked whether the ceph-deploy utility can do this (so Fuel could just pass an additional parameter) or whether a more sophisticated approach is needed to solve this problem.
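
To illustrate the by-UUID approach proposed above, here is a minimal sketch (not Fuel code; the device path and mount point are hypothetical examples) that resolves a partition's filesystem UUID with blkid and prints an fstab-style entry that is not affected by device reordering:

    # Minimal sketch: mount an OSD data partition by filesystem UUID instead of
    # a /dev/sdX name that can change between boots.
    import subprocess

    def fs_uuid(device):
        # Return the filesystem UUID reported by blkid for the given partition.
        out = subprocess.check_output(["blkid", "-s", "UUID", "-o", "value", device])
        return out.decode().strip()

    if __name__ == "__main__":
        dev = "/dev/sdd1"  # hypothetical OSD data partition
        # An fstab entry keyed by UUID survives /dev/sdX reordering at boot:
        print("UUID={} /var/lib/ceph/osd/ceph-0 xfs defaults,noatime 0 0".format(fs_uuid(dev)))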

The following document describes the manual OSD management steps for the deployment where we first found this: http://goo.gl/SPZGFC. Also, Miroslav Anashkin (<email address hidden>) has detailed context.

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 6.1
Changed in fuel:
importance: Undecided → High
status: New → Confirmed
Mike Scherbakov (mihgen)
tags: added: customer-found
Changed in fuel:
status: Confirmed → Triaged
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Oleksiy Molchanov (omolchanov)
Changed in fuel:
status: Triaged → In Progress
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

This issue doesn't affect 6.1

Changed in fuel:
milestone: 6.1 → 5.0.3
Revision history for this message
Dmitriy Novakovskiy (dnovakovskiy) wrote :

>>This issue doesn't affect 6.1

Why? What was changed in how Ceph is deployed in 6.1?

Revision history for this message
Ryan Moe (rmoe) wrote :

When Fuel deploys Ceph, we set the GPT partition typecode (using sgdisk [0]) for OSD and journal partitions. Ceph installs udev rules [1] that find all partitions carrying these GUIDs and activate them as needed. If you're going to deploy new OSDs manually, you'll probably want to set the partition GUIDs.

[0] https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/cobbler/templates/scripts/pmanager.py#L885
[1] https://github.com/ceph/ceph/blob/master/udev/95-ceph-osd.rules
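
For illustration, a minimal sketch of the typecode mechanism Ryan describes, loosely modeled on what pmanager.py [0] does during provisioning (the exact Fuel call differs; the disk and partition number below are hypothetical, while the GUIDs are the standard Ceph partition type codes matched by the udev rules [1]):

    # Tag a GPT partition with the Ceph OSD data type GUID so 95-ceph-osd.rules
    # can find and activate it regardless of the /dev/sdX name it gets at boot.
    import subprocess

    CEPH_OSD_TYPECODE = "4fbd7e29-9d25-41b8-afd0-062c0ceff05d"      # OSD data
    CEPH_JOURNAL_TYPECODE = "45b0969e-9b03-4f30-b4c6-b4b80ceff106"  # OSD journal

    def tag_partition(disk, partnum, typecode=CEPH_OSD_TYPECODE):
        # Set the GPT partition type GUID with sgdisk.
        subprocess.check_call(
            ["sgdisk", "--typecode={}:{}".format(partnum, typecode), disk])

    tag_partition("/dev/sdd", 1)  # hypothetical: first partition of /dev/sdd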

Revision history for this message
Dmitriy Novakovskiy (dnovakovskiy) wrote :

Ryan, do I understand correctly that you're describing how Ceph deployment is done in 6.1?

Miroslav, please verify that the approach Ryan describes will help.

Revision history for this message
Ryan Moe (rmoe) wrote :

This is how Fuel has deployed Ceph since the feature was added.

Revision history for this message
Dmitriy Novakovskiy (dnovakovskiy) wrote : Re: [Bug 1416855] Re: Ceph OSD disks are lost at node reboot

Then this needs to be reviewed by Miroslav. He first observed this problem at a
customer installation.

On Monday, February 9, 2015, Ryan Moe <email address hidden> wrote:

> This is how Fuel has deployed Ceph since the feature was added.

Changed in fuel:
assignee: Oleksiy Molchanov (omolchanov) → Miroslav Anashkin (manashkin)
Revision history for this message
Bogdan Dobrelya (bogdando) wrote : Re: Ceph OSD disks are lost at node reboot

What was the version of Fuel for the reported issue?

Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

The issue with OSDs happens only when disks are added manually after the deployment phase, during operation.

Revision history for this message
Aleksandr Didenko (adidenko) wrote :

Ryan and Sergii are right; this problem can only affect disks that were added manually after deployment. I've checked on a 5.0.1 environment I have: the udev rules are in place and the OSD partitions have the GUID set, so I'm marking this invalid for 5.0.
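
For reference, a minimal sketch of the kind of check described above (the disk and partition number are hypothetical): read the GPT partition type GUID with sgdisk -i and compare it to the Ceph OSD data type code that the udev rules match on.

    # Confirm that a partition already carries the Ceph OSD data type GUID.
    import subprocess

    CEPH_OSD_TYPECODE = "4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D"

    def has_ceph_osd_guid(disk, partnum):
        # sgdisk -i prints a "Partition GUID code:" line for the given partition.
        out = subprocess.check_output(["sgdisk", "-i", str(partnum), disk]).decode()
        return CEPH_OSD_TYPECODE in out.upper()

    print(has_ceph_osd_guid("/dev/sdd", 1))  # hypothetical OSD data partition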

Revision history for this message
Ryan Moe (rmoe) wrote :

This is invalid for 5.1 and 6.0 for the same reasons it's invalid for 6.1 and 5.0. We set the partition GUID during provisioning and the udev rules are present in both of those releases.

Revision history for this message
Dmitry Borodaenko (angdraug) wrote :

Please document the correct procedure for adding disks to deployed OSD nodes in the Operations Guide. See the document linked from the bug description for reference.

tags: added: docs
Revision history for this message
Denis Klepikov (dklepikov) wrote :

Draft "How to add OSD with mount by UDEV on reboot, with notes."

https://docs.google.com/a/mirantis.com/document/d/18gPSkw4Cg3cV5mHF-O3OxATMqPSatiIBwUw-l-uD1-k/edit?usp=sharing

Comments are welcome.
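
As a rough companion to the draft, a minimal sketch of one way to add an OSD so it comes back after reboot (this is not necessarily the draft's exact procedure; the device paths are hypothetical): letting ceph-disk prepare the device sets the Ceph GPT type GUIDs itself, so the udev rules activate the OSD on every boot without an fstab entry keyed to /dev/sdX.

    # Prepare a raw disk as a Ceph OSD; ceph-disk tags the data and journal
    # partitions with the Ceph type GUIDs so udev activation works across reboots.
    import subprocess

    def prepare_new_osd(data_disk, journal_disk=None):
        cmd = ["ceph-disk", "prepare", data_disk]
        if journal_disk:
            cmd.append(journal_disk)
        subprocess.check_call(cmd)

    prepare_new_osd("/dev/sde", "/dev/sdb")  # hypothetical data and journal devices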

Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Published bulletin:

https://online.mirantis.com/hubfs/Mirantis-Technical-Bulletin-5-Removing-Ceph-OSD-node.pdf?t=1427907150102

Revision history for this message
Dmitriy Novakovskiy (dnovakovskiy) wrote : Re: [Bug 1416855] Re: Ceph OSD disks are lost at node reboot

It seems to be dealing with another issue: removing OSD nodes from the Fuel
UI, not OSDs getting lost at reboot.

On Wed, Apr 1, 2015 at 8:31 PM, Miroslav Anashkin <email address hidden>
wrote:

> Published bulletin:
>
> https://online.mirantis.com/hubfs/Mirantis-Technical-Bulletin-5-Removing-Ceph-OSD-node.pdf?t=1427907150102

Revision history for this message
Andrey Grebennikov (agrebennikov) wrote : Re: Ceph OSD disks are lost at node reboot

This still doesn't help. We are experiencing this issue right now on 6.0. We have 2 disks for journals, one for the OS, and 10 for OSDs. When the node was bootstrapped, sda and sdb were assigned as journals, sdc became the OS disk, and the remaining disks became OSDs. In the puppet log, however, I see that /dev/sdl and /dev/sdk are the journal disks. If I reboot the node now, these last two disks will become /dev/sda and /dev/sdb, and no OSD will be able to start anymore, since they will not be able to find their journals.
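
To make this failure mode concrete, a minimal sketch of a check for it (the OSD ids and paths are hypothetical examples for one node): an OSD whose journal symlink points at a bare /dev/sdX node breaks when disks are reordered, while one pointing at a persistent /dev/disk/by-* path does not.

    # Report whether each OSD's journal symlink uses a persistent device path.
    import os

    def journal_is_stable(osd_id):
        target = os.readlink("/var/lib/ceph/osd/ceph-{}/journal".format(osd_id))
        return target.startswith("/dev/disk/by-")

    for osd_id in (0, 1, 2):  # hypothetical OSD ids on this node
        print(osd_id, "stable" if journal_is_stable(osd_id) else "breaks if disks reorder")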

summary: - Ceph OSD disks are lost at node reboot
+ [doc] Ceph OSD disks are lost at node reboot
Revision history for this message
Igor Shishkin (teran) wrote :

Moving to 6.1-updates since 6.1 is already released.

Changed in fuel:
milestone: 6.1 → 6.1-updates
Changed in fuel:
milestone: 6.1-updates → 9.0
status: Triaged → New
Dmitry Pyzhov (dpyzhov)
tags: added: area-docs
Revision history for this message
Miroslav Anashkin (manashkin) wrote :

Ceph-deploy 1.5.20 uses disk IDs to link journals.
So the issue is fixed in 6.1 and higher versions.
I marked these versions as Fix Released, since we ship a Ceph version containing the fix for them.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

MOS 5.0, MOS 5.1, and MOS 6.0 are no longer supported. Moving to Won't Fix.
