Different ceph-osd processes use the same journals on SSD

Bug #1280752 reported by Gleb
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Ryan Moe
Milestone: 4.1

Bug Description

{"ostf_sha": "83ada35fec2664089e07fdc0d34861ae2a4d948a", "fuelmain_sha": "17eed776b30886851ae0042fa7a30184f5cd8eb6", "astute_sha": "8b2059a37be9bd82df49f684822727b4df4c511b", "release": "4.0", "nailgun_sha": "ac02e18990cd652db6577ce42bdea9838076c63c", "fuellib_sha": "098f381ff8a528a39d3b6f17ea70955baeb159e8"}

Deployment of a ceph-osd node with 2 SSDs (used for journals).

After the successful deployment we have the following situation:

root@node-25:~# find /var/lib/ceph/osd/ -name journal -exec ls -la {} \;
lrwxrwxrwx 1 root root 9 Feb 15 11:15 /var/lib/ceph/osd/ceph-66/journal -> /dev/sdb2
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-57/journal -> /dev/sda9
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-8/journal -> /dev/sda3
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-49/journal -> /dev/sda8
lrwxrwxrwx 1 root root 9 Feb 15 11:15 /var/lib/ceph/osd/ceph-83/journal -> /dev/sdb4
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-9/journal -> /dev/sda3
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-45/journal -> /dev/sda7
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-21/journal -> /dev/sda4
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-29/journal -> /dev/sda5
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-41/journal -> /dev/sda7
lrwxrwxrwx 1 root root 9 Feb 15 11:15 /var/lib/ceph/osd/ceph-74/journal -> /dev/sdb3
lrwxrwxrwx 1 root root 9 Feb 15 11:15 /var/lib/ceph/osd/ceph-78/journal -> /dev/sdb3
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-17/journal -> /dev/sda4
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-6/journal -> /dev/sda2
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-25/journal -> /dev/sda5
lrwxrwxrwx 1 root root 9 Feb 15 11:15 /var/lib/ceph/osd/ceph-70/journal -> /dev/sdb2
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-33/journal -> /dev/sda6
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-1/journal -> /dev/sda2
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-62/journal -> /dev/sda9
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-37/journal -> /dev/sda6
lrwxrwxrwx 1 root root 9 Feb 15 11:14 /var/lib/ceph/osd/ceph-54/journal -> /dev/sda8

Every SSD partition is used twice, by two different ceph-osd processes.
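
A quick way to confirm the duplication is to count how many OSDs point at each journal target (a sketch against the layout shown above):

# count OSDs per journal target; any count above 1 means a shared journal partition
find /var/lib/ceph/osd/ -name journal -exec readlink {} \; | sort | uniq -c | sort -rn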

Here is the output of another command:
root@node-25:~# grep 'Running command' /root/ceph.log
2014-02-15 11:14:13,211 [node-25][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add
2014-02-15 11:14:13,290 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdd2 /dev/sda2
2014-02-15 11:14:15,696 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sde2 /dev/sda2
2014-02-15 11:14:19,596 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdf2 /dev/sda3
2014-02-15 11:14:23,102 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdg2 /dev/sda3
2014-02-15 11:14:26,725 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdh2 /dev/sda4
2014-02-15 11:14:30,207 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdi2 /dev/sda4
2014-02-15 11:14:32,951 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdj2 /dev/sda5
2014-02-15 11:14:35,935 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdk2 /dev/sda5
2014-02-15 11:14:38,422 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdl2 /dev/sda6
2014-02-15 11:14:42,011 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdm2 /dev/sda6
2014-02-15 11:14:43,990 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdn2 /dev/sda7
2014-02-15 11:14:45,742 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdo2 /dev/sda7
2014-02-15 11:14:48,628 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdp2 /dev/sda8
2014-02-15 11:14:50,352 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdq2 /dev/sda8
2014-02-15 11:14:52,977 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdr2 /dev/sda9
2014-02-15 11:14:55,607 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sds2 /dev/sda9
2014-02-15 11:14:58,955 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdt2 /dev/sdb2
2014-02-15 11:15:02,629 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdu2 /dev/sdb2
2014-02-15 11:15:04,643 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdv2 /dev/sdb3
2014-02-15 11:15:07,295 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdw2 /dev/sdb3
2014-02-15 11:15:09,395 [node-25][INFO ] Running command: ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdx2 /dev/sdb4

Also, the number of journal partitions is not enough: we have 22 ceph-osd processes and only 21 journal partitions.

root@node-25:~# parted /dev/sda print
Model: HP LOGICAL VOLUME (scsi)
Disk /dev/sda: 400GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags
 1 17.4kB 25.2MB 25.1MB primary bios_grub
 2 25.2MB 10.8GB 10.7GB primary
 3 10.8GB 21.5GB 10.7GB primary
 4 21.5GB 32.2GB 10.7GB primary
 5 32.2GB 43.0GB 10.7GB primary
 6 43.0GB 53.7GB 10.7GB primary
 7 53.7GB 64.4GB 10.7GB primary
 8 64.4GB 75.2GB 10.7GB primary
 9 75.2GB 85.9GB 10.7GB primary
10 85.9GB 96.7GB 10.7GB primary
11 96.7GB 107GB 10.7GB primary
12 107GB 118GB 10.7GB primary

root@node-25:~# parted /dev/sdb print
Model: HP LOGICAL VOLUME (scsi)
Disk /dev/sdb: 400GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start End Size File system Name Flags
 1 17.4kB 25.2MB 25.1MB primary bios_grub
 2 25.2MB 10.8GB 10.7GB primary
 3 10.8GB 21.5GB 10.7GB primary
 4 21.5GB 32.2GB 10.7GB primary
 5 32.2GB 43.0GB 10.7GB primary
 6 43.0GB 53.7GB 10.7GB primary
 7 53.7GB 64.4GB 10.7GB primary
 8 64.4GB 75.2GB 10.7GB primary
 9 75.2GB 85.9GB 10.7GB primary
10 85.9GB 96.7GB 10.7GB primary
11 96.7GB 107GB 10.7GB primary
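
For reference, the number of OSD data directories can be compared with the number of journal-capable partitions on the two SSDs like this (a sketch; partition #1 on each SSD is the bios_grub partition and does not count as a journal):

# number of OSDs on the node
ls -d /var/lib/ceph/osd/ceph-* | wc -l
# number of GPT partitions per SSD (subtract 1 each for bios_grub)
parted -sm /dev/sda print | grep -cE '^[0-9]+:'
parted -sm /dev/sdb print | grep -cE '^[0-9]+:'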

Changed in fuel:
importance: Undecided → Critical
assignee: nobody → Dmitry Borodaenko (dborodaenko)
milestone: none → 4.1
Gleb (gleb-q)
description: updated
Revision history for this message
Andrey Korolyov (xdeller) wrote :
Mike Scherbakov (mihgen)
tags: added: ceph customer-found
Revision history for this message
Roman Sokolkov (rsokolkov) wrote :

I also have 3 nodes, each with 2 identical SSDs and 20 OSDs per server.

Andrey, are these steps correct to run before your script?

http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#stopping-w-out-rebalancing

My guess:

node1# ceph osd set noout
node1# for i in id1 id2 id3...; do ceph osd stop osd.$i; done
node1# ./ultimate_dangerous_script_beware_of_dragons.sh /dev/sdb /dev/sdw <--- SSDs
node1# for i in id1 id2 id3...; do ceph osd start osd.$i; done
node1# ceph osd unset noout

And same for node2, node3.

Revision history for this message
Gleb (gleb-q) wrote :

There are some logs from the environment.
Unfortunately, the JSON and YAML files contain irrelevant disk information after the deployment
(see bug https://bugs.launchpad.net/fuel/+bug/1280978).

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Looks like the provisioning serializer generates incorrect data for the installer, leaving a single partition for the journal instead of a journal-per-disk mapping.

Changed in fuel:
status: New → Triaged
assignee: Dmitry Borodaenko (dborodaenko) → Fuel Python Team (fuel-python)
Revision history for this message
Gleb (gleb-q) wrote :

Here is exactly what I did in the GUI.
I have 22 SAS HDDs and 2 SSDs on the node.
So I assigned both SSDs entirely as Ceph Journal, assigned one of the SAS disks as Base System, and assigned the other 21 SAS disks as Ceph.

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

This is not how the user is expected to configure it. Currently, you would need to partition these disks into 21 partitions, each one serving as a dedicated Ceph journal for its corresponding data disk. This is not currently achievable through the UI; you can alter the disk configuration using the API.
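
For illustration, a correct one-to-one mapping would pair each data disk with its own, previously unused SSD partition, roughly like this (a sketch reusing device names from the log above; not the exact commands Fuel runs):

# each data disk gets its own journal partition; no journal partition is passed twice
ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdd2 /dev/sda2
ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sde2 /dev/sda3
ceph-disk-prepare --fs-type xfs --cluster ceph -- /dev/sdf2 /dev/sda4
# ...and so on for the remaining data disks and journal partitions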

Changed in fuel:
importance: Critical → Medium
milestone: 4.1 → 5.0
Revision history for this message
Gleb (gleb-q) wrote :

You should state this in the disk configuration dialog, in red letters.

I was told that automatic partitioning for journals works out of the box, so I configured it this way.
And if we look at the original post (find /var/lib/ceph/osd/ -name journal -exec ls -la {} \;), we can see that Fuel nevertheless created a number of journal partitions for the different OSDs, but paired them the wrong way.

Revision history for this message
Andrew Woodward (xarses) wrote :

It looks like the partitions were created correctly, but the Puppet fact that is supposed to pair them is broken.

See https://github.com/stackforge/fuel-library/blob/master/deployment/puppet/ceph/lib/facter/ceph_osd.rb#L30-35

Andrew Woodward (xarses)
Changed in fuel:
milestone: 5.0 → 4.1
importance: Medium → High
assignee: Fuel Python Team (fuel-python) → Ryan Moe (rmoe)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/74541

Changed in fuel:
status: Triaged → In Progress
tags: added: release-notes
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/74541
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=491998df46eb83104ed53c3610f09dff373d45c1
Submitter: Jenkins
Branch: master

commit 491998df46eb83104ed53c3610f09dff373d45c1
Author: Ryan Moe <email address hidden>
Date: Tue Feb 18 15:41:12 2014 -0800

    Assign journal devices to OSDs sequentially

    If there are more OSDs than journals then the
    remaining OSDs will place their journals on their own disks.

    Closes-bug: #1280752
    Change-Id: If78615450b8daaece69dca9d6cd409f79eb3ac01
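
A minimal sketch of the assignment logic described in the commit message (illustrative only; the real fix lives in the Puppet fact in fuel-library, and the device lists here are made up):

# pair data disks with journal partitions one-to-one; once the journal
# partitions run out, the remaining OSDs keep their journal on their own disk
osds=(/dev/sdd /dev/sde /dev/sdf /dev/sdg)   # example data disks
journals=(/dev/sda2 /dev/sda3 /dev/sda4)     # example journal partitions
for i in "${!osds[@]}"; do
  if [ "$i" -lt "${#journals[@]}" ]; then
    echo "${osds[$i]} -> journal ${journals[$i]}"
  else
    echo "${osds[$i]} -> journal on the same disk"
  fi
done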

Changed in fuel:
status: In Progress → Fix Committed
Andrew Woodward (xarses)
Changed in fuel:
status: Fix Committed → Fix Released