Activity log for bug #1776888

Date Who What changed Old value New value Message
2018-06-14 10:42:21 wangwei bug added bug
2018-06-14 11:37:54 wangwei description
``` "bs_db_device": "", "bs_db_label": "", "bs_db_partition_num": "", "bs_wal_device": "", "bs_wal_label": "", "bs_wal_partition_num": "", "device": "/dev/xvdb", "external_journal": false, "fs_label": "", "fs_uuid": "cd711f44-2fa8-41c8-8f74-b43e96758edd", "journal": "", "journal_device": "", "journal_num": 0, "partition": "/dev/xvdb", "partition_label": "KOLLA_CEPH_OSD_BOOTSTRAP_BS", "partition_num": "1" ``` There is a bit of confusion here. In fact, in the filestore, there are only osd partition and journal partition. In the bluestore, there are osd data partition, block partition, wal partition and db partition. I think we should distinguish between the bluestore and filestore disk information. 3. The osd partition lable after successful initialization is as follows: ``` KOLLA_CEPH_BSDATA_1 KOLLA_CEPH_DATA_BS_B_1 KOLLA_CEPH_DATA_BS_D_1 KOLLA_CEPH_DATA_BS_W_1 ``` The prefix is different so we can't find the disk as the filestore's logic. So I think a good way to name it like this: ``` KOLLA_CEPH_DATA_BS_1 KOLLA_CEPH_DATA_BS_1_B KOLLA_CEPH_DATA_BS_1_D KOLLA_CEPH_DATA_BS_1_W ``` Regular naming can reduce some code. I tested the latest bluestore code, it is the patch of Mr Tonezhang: kolla: https://review.openstack.org/#/c/566810/ kolla-ansible: https://review.openstack.org/#/c/566801/9 1. In my tests, I encountered a problem that osd bootstrap failed when executing this command: ceph-osd -i "${OSD_ID}" --mkfs -k "${OSD_DIR}"/keyring --osd-uuid "${OSD_UUID}" the los as follows: ``` ++ partprobe ++ ln -sf /dev/disk/by-partlabel/KOLLA_CEPH_DATA_BS_B_2 /var/lib/ceph/osd/ceph-2/block ++ '[' -n '' ']' ++ '[' -n '' ']' ++ ceph-osd -i 2 --mkfs -k /var/lib/ceph/osd/ceph-2/keyring --osd-uuid b5703869-87d1-4ab8-be11-ab24db2870cc ``` So I add "-d" parameter to debug the problem: ceph-osd -i -d "${OSD_ID}" --mkfs -k "${OSD_DIR}"/keyring --osd-uuid "${OSD_UUID}" ``` ++ ceph-osd -d -i 0 --mkfs -k /var/lib/ceph/osd/ceph-0/keyring --osd-uuid e14d5061-ae41-4c16-bf3c-2e9c5973cb54 2018-06-13 17:29:53.216034 7f808b6a0d80 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 78 2018-06-13 17:29:53.243358 7f808b6a0d80 0 stack NetworkStack max thread limit is 24, switching to this now. Higher thread values are unnecessary and currently unsupported. 
2018-06-13 17:29:53.248479 7f808b6a0d80 1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs path /var/lib/ceph/osd/ceph-0 2018-06-13 17:29:53.248676 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (2) No such file or directory 2018-06-13 17:29:53.248714 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (2) No such file or directory 2018-06-13 17:29:53.249134 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _read_fsid unparsable uuid 2018-06-13 17:29:53.249141 7f808b6a0d80 1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs using provided fsid e14d5061-ae41-4c16-bf3c-2e9c5973cb54 2018-06-13 17:29:53.249361 7f808b6a0d80 1 bdev create path /var/lib/ceph/osd/ceph-0/block type kernel 2018-06-13 17:29:53.249372 7f808b6a0d80 1 bdev(0x563ef1b19600 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2018-06-13 17:29:53.249400 7f808b6a0d80 -1 bdev(0x563ef1b19600 /var/lib/ceph/osd/ceph-0/block) open open got: (2) No such file or directory 2018-06-13 17:29:53.249654 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs failed, (2) No such file or directory 2018-06-13 17:29:53.249662 7f808b6a0d80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory 2018-06-13 17:29:53.249950 7f808b6a0d80 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory ``` After my testing, I found that after executing this command: ``` sgdisk "--change-name=2:KOLLA_CEPH_DATA_BS_B_${OSD_ID}" "--typecode=2:${CEPH_OSD_TYPE_CODE}" -- "${OSD_BS_BLOCK_DEV}" ``` it took about 3 seconds for the by-partlabel directory and the partlabel symlink to appear on my CentOS virtual machine, whereas the partuuid symlink was created immediately, with no delay. So I think using partuuid is better than partlabel when initializing Ceph. In the ceph-deploy tool, the command that initializes an osd is 'ceph-disk prepare', which also uses partuuid. 2. In the current find_disks.py logic, both bluestore and filestore return all disk information, including the osd partition, journal partition, block partition, wal partition and db partition. ``` "bs_db_device": "", "bs_db_label": "", "bs_db_partition_num": "", "bs_wal_device": "", "bs_wal_label": "", "bs_wal_partition_num": "", "device": "/dev/xvdb", "external_journal": false, "fs_label": "", "fs_uuid": "cd711f44-2fa8-41c8-8f74-b43e96758edd", "journal": "", "journal_device": "", "journal_num": 0, "partition": "/dev/xvdb", "partition_label": "KOLLA_CEPH_OSD_BOOTSTRAP_BS", "partition_num": "1" ``` This is a bit confusing. In fact, filestore has only an osd partition and a journal partition, while bluestore has an osd data partition, a block partition, a wal partition and a db partition. I think we should distinguish between the bluestore and filestore disk information. 3. The osd partition labels after a successful initialization are as follows: ``` KOLLA_CEPH_BSDATA_1 KOLLA_CEPH_DATA_BS_B_1 KOLLA_CEPH_DATA_BS_D_1 KOLLA_CEPH_DATA_BS_W_1 ``` The prefixes are different, so we cannot find the disks with the same logic used for filestore. So I think a better way is to name them like this: ``` KOLLA_CEPH_DATA_BS_1 KOLLA_CEPH_DATA_BS_1_B KOLLA_CEPH_DATA_BS_1_D KOLLA_CEPH_DATA_BS_1_W ``` Consistent naming can reduce some code. I made some optimizations for the above aspects (based on tonezhang's patches): kolla: https://review.openstack.org/#/c/575400/ kolla-ansible: https://review.openstack.org/#/c/575408/
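Point 1 above argues for partuuid over partlabel because the by-partlabel symlink can appear with a delay. As a minimal sketch (not the actual patch) of what a partuuid-based link could look like in the bootstrap script, assuming an illustrative partition path /dev/xvdb2 and the existing OSD_DIR variable:
```
# Look up the partuuid of the block partition (the path here is only an example),
# wait for udev to create the by-partuuid symlink, then link it into the osd dir.
OSD_BS_BLK_PARTUUID=$(blkid -o value -s PARTUUID /dev/xvdb2)
partprobe || true
udevadm settle --timeout=10 || true
ln -sf /dev/disk/by-partuuid/"${OSD_BS_BLK_PARTUUID}" "${OSD_DIR}"/block
```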
2018-06-14 11:51:07 wangwei kolla: assignee wangwei (wangwei-david)
2018-06-27 12:12:36 OpenStack Infra kolla: status New In Progress
2018-06-28 02:29:22 wangwei description
``` "bs_db_device": "", "bs_db_label": "", "bs_db_partition_num": "", "bs_wal_device": "", "bs_wal_label": "", "bs_wal_partition_num": "", "device": "/dev/xvdb", "external_journal": false, "fs_label": "", "fs_uuid": "cd711f44-2fa8-41c8-8f74-b43e96758edd", "journal": "", "journal_device": "", "journal_num": 0, "partition": "/dev/xvdb", "partition_label": "KOLLA_CEPH_OSD_BOOTSTRAP_BS", "partition_num": "1" ``` There is a bit of confusion here. In fact, in the filestore, there are only osd partition and journal partition. In the bluestore, there are osd data partition, block partition, wal partition and db partition. I think we should distinguish between the bluestore and filestore disk information. 3. The osd partition lable after successful initialization is as follows: ``` KOLLA_CEPH_BSDATA_1 KOLLA_CEPH_DATA_BS_B_1 KOLLA_CEPH_DATA_BS_D_1 KOLLA_CEPH_DATA_BS_W_1 ``` The prefix is different so we can't find the disk as the filestore's logic. So I think a good way to name it like this: ``` KOLLA_CEPH_DATA_BS_1 KOLLA_CEPH_DATA_BS_1_B KOLLA_CEPH_DATA_BS_1_D KOLLA_CEPH_DATA_BS_1_W ``` Regular naming can reduce some code. I did some optimizations for the above aspects (based on tonezhang's patch): kolla: https://review.openstack.org/#/c/575400/ kolla-ansible: https://review.openstack.org/#/c/575408/ I tested the latest bluestore code, it is the patch of Mr Tonezhang: kolla: https://review.openstack.org/#/c/566810/ kolla-ansible: https://review.openstack.org/#/c/566801/9 1. In my tests, I encountered a problem that osd bootstrap failed when executing this command: ceph-osd -i "${OSD_ID}" --mkfs -k "${OSD_DIR}"/keyring --osd-uuid "${OSD_UUID}" the los as follows: ``` ++ partprobe ++ ln -sf /dev/disk/by-partlabel/KOLLA_CEPH_DATA_BS_B_2 /var/lib/ceph/osd/ceph-2/block ++ '[' -n '' ']' ++ '[' -n '' ']' ++ ceph-osd -i 2 --mkfs -k /var/lib/ceph/osd/ceph-2/keyring --osd-uuid b5703869-87d1-4ab8-be11-ab24db2870cc ``` So I add "-d" parameter to debug the problem: ceph-osd -i -d "${OSD_ID}" --mkfs -k "${OSD_DIR}"/keyring --osd-uuid "${OSD_UUID}" ``` ++ ceph-osd -d -i 0 --mkfs -k /var/lib/ceph/osd/ceph-0/keyring --osd-uuid e14d5061-ae41-4c16-bf3c-2e9c5973cb54 2018-06-13 17:29:53.216034 7f808b6a0d80 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 78 2018-06-13 17:29:53.243358 7f808b6a0d80 0 stack NetworkStack max thread limit is 24, switching to this now. Higher thread values are unnecessary and currently unsupported. 
2018-06-13 17:29:53.248479 7f808b6a0d80 1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs path /var/lib/ceph/osd/ceph-0 2018-06-13 17:29:53.248676 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (2) No such file or directory 2018-06-13 17:29:53.248714 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (2) No such file or directory 2018-06-13 17:29:53.249134 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _read_fsid unparsable uuid 2018-06-13 17:29:53.249141 7f808b6a0d80 1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs using provided fsid e14d5061-ae41-4c16-bf3c-2e9c5973cb54 2018-06-13 17:29:53.249361 7f808b6a0d80 1 bdev create path /var/lib/ceph/osd/ceph-0/block type kernel 2018-06-13 17:29:53.249372 7f808b6a0d80 1 bdev(0x563ef1b19600 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2018-06-13 17:29:53.249400 7f808b6a0d80 -1 bdev(0x563ef1b19600 /var/lib/ceph/osd/ceph-0/block) open open got: (2) No such file or directory 2018-06-13 17:29:53.249654 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs failed, (2) No such file or directory 2018-06-13 17:29:53.249662 7f808b6a0d80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory 2018-06-13 17:29:53.249950 7f808b6a0d80 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory ``` After my testing, I found that after executing this command: ``` sgdisk "--change-name=2:KOLLA_CEPH_DATA_BS_B_${OSD_ID}" "--typecode=2:${CEPH_OSD_TYPE_CODE}" -- "${OSD_BS_BLOCK_DEV}" ``` It took 3 seconds to generate the by-partlabel folder and partlabel on my centos virtual machine, but the partuuid was generated immediately without delay. So I think using partuuid is better than partlabel, when initializing ceph. In the ceph-deploy tool, the command to initialize osd is 'ceph-disk preapre', which also uses partuuid. 2. In the current find_disks.py logic, both bluestore and filestore return all disk information, including osd partition, journal partition, block partition, wal partition and db partition. ``` "bs_db_device": "", "bs_db_label": "", "bs_db_partition_num": "", "bs_wal_device": "", "bs_wal_label": "", "bs_wal_partition_num": "", "device": "/dev/xvdb", "external_journal": false, "fs_label": "", "fs_uuid": "cd711f44-2fa8-41c8-8f74-b43e96758edd", "journal": "", "journal_device": "", "journal_num": 0, "partition": "/dev/xvdb", "partition_label": "KOLLA_CEPH_OSD_BOOTSTRAP_BS", "partition_num": "1" ``` There is a bit of confusion here. In fact, in the filestore, there are only osd partition and journal partition. In the bluestore, there are osd data partition, block partition, wal partition and db partition. 
I think we should distinguish between the bluestore and filestore disk information, like this: ```bluestore "osds_bootstrap": [ { "bs_blk_device": "/dev/xvdb", "bs_blk_partition": "/dev/xvdb4", "bs_blk_partition_num": "4", "bs_blk_partition_uuid": "f3597572-96df-4221-ac7d-e42ab321717b", "bs_db_device": "/dev/xvdb", "bs_db_partition": "/dev/xvdb3", "bs_db_partition_num": "3", "bs_db_partition_uuid": "a986cc7a-65c0-4467-87de-2df37d059aac", "bs_wal_device": "/dev/xvdb", "bs_wal_partition": "/dev/xvdb2", "bs_wal_partition_num": "2", "bs_wal_partition_uuid": "a678b905-8738-4913-8bf3-c7819bc1edc9", "fs_label": "", "fs_uuid": "3789e760-4d16-4990-8148-95912cc4fd1f", "osd_device": "/dev/xvdb", "osd_partition": "/dev/xvdb1", "osd_partition_num": "1", "osd_partition_uuid": "b38c6169-e36d-4a7f-afdc-863c612d2851", "store_type": "bluestore", "use_entire_disk": false } ] ``` ```filestore "osds_bootstrap": [ { "fs_label": "", "fs_uuid": "19edd63c-8546-4e04-8e5c-b6764d4cdb2a", "journal_device": "/dev/xvdb", "journal_num": 2, "journal_partition": "/dev/xvdb2", "osd_device": "/dev/xvdb", "osd_partition": "/dev/xvdb1", "osd_partition_num": "1", "store_type": "filestore", "use_entire_disk": true } ] ``` 3. The osd partition lable after successful initialization is as follows: ``` KOLLA_CEPH_BSDATA_1 KOLLA_CEPH_DATA_BS_B_1 KOLLA_CEPH_DATA_BS_D_1 KOLLA_CEPH_DATA_BS_W_1 ``` The prefix is different so we can't find the disk as the filestore's logic. So I think a good way to name it like this: ``` KOLLA_CEPH_DATA_BS_1 KOLLA_CEPH_DATA_BS_1_B KOLLA_CEPH_DATA_BS_1_D KOLLA_CEPH_DATA_BS_1_W ``` Regular naming can reduce some code. Similarly, the division of each osd partition label should take the following approach: ``` KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_W KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_D ``` The simplest label is: ``` KOLLA_CEPH_OSD_BOOTSTRAP_BS ``` 4. According to the naming method above, we can deploy in three ways. 1) The disk has only one partition or the label of the first partition is "KOLLA_CEPH_OSD_BOOTSTRAP_BS", if you use this label, kolla will default to dividing the entire disk into 100M osd data and block partitions. e.g: ``` sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS 1 -1 ``` result: ``` Number Start End Size File system Name Flags 1 1049kB 106MB 105MB xfs KOLLA_CEPH_DATA_BS_2 2 106MB 107GB 107GB KOLLA_CEPH_DATA_BS_2_B ``` 2)If a disk has only one partition, or the label of the first partition is "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1", then you do not specify a block partition "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B", then kolla will initialize the entire disk to osd data and block. If you specify additional wal and db partitions (not on the same disk as the osd partition) then kolla will initialize wal and db according to your definition. 
e.g: ``` sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 2048 sudo /sbin/parted /dev/loop0 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_W 1 -1 ``` result: ``` Disk /dev/xvdb: 107GB Sector size (logical/physical): 512B/512B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags 1 1049kB 106MB 105MB xfs KOLLA_CEPH_DATA_BS_2 2 106MB 107GB 107GB KOLLA_CEPH_DATA_BS_2_B Model: Loopback device (loopback) Disk /dev/loop0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags 1 1049kB 10.7GB 10.7GB KOLLA_CEPH_DATA_BS_2_W ``` 3)If you specify the osd partition "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1" and specify the block partition "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B", kolla will initialize the disk according to your definition. If you specify additional wal and db partitions then kolla will initialize wal and db also according to your definition.In this case, you can arbitrarily define your partition. Four partitions can be defined on the same disk or on different disks. e.g: ``` sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 200 sudo /sbin/parted /dev/xvdb -s mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_W 201 2249 sudo /sbin/parted /dev/xvdb -s mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_D 2250 4298 sudo /sbin/parted /dev/xvdb -s mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B 4299 100% ``` result: ``` Number Start End Size File system Name Flags 1 1049kB 200MB 199MB xfs KOLLA_CEPH_DATA_BS_1 2 201MB 2249MB 2048MB KOLLA_CEPH_DATA_BS_1_W 3 2250MB 4298MB 2048MB KOLLA_CEPH_DATA_BS_1_D 4 4299MB 107GB 103GB KOLLA_CEPH_DATA_BS_1_B ``` 5. In the calculation of OSD_INITIAL_WEIGHT, if the partition is a block partition, because it is a raw device, when the following command is executed, it will give an error, but does not affect the result, so need to add "|| true" to ignore the error: ``` OSD_INITIAL_WEIGHT=$(parted --script ${WEIGHT_PARTITION} unit TB print | awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}') ``` The error like this: ``` ++ [[ auto == \a\u\t\o ]] +++ parted --script /dev/xvdb2 unit TB print +++ awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}' Error: /dev/xvdb2: unrecognised disk label ``` 6. https://review.openstack.org/#/c/575346/ This patch added support for loop devices , but adding some judgments about whether the device is a loop device, it is actually not necessary. We can get directly from find_disks.py. 
For example, we prepare four loopdevice: ``` sudo /sbin/parted /dev/loop0 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_B 1 -1 sudo /sbin/parted /dev/loop1 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO 1 -1 sudo /sbin/parted /dev/loop2 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_D 1 -1 sudo /sbin/parted /dev/loop3 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_W 1 -1 ``` the result of find_disks.py like this: ``` "osds_bootstrap": [ { "bs_blk_device": "/dev/loop0", "bs_blk_partition": "/dev/loop0p1", "bs_blk_partition_num": "1", "bs_blk_partition_uuid": "b4ac5b80-7015-431b-8256-407769d22907", "bs_db_device": "/dev/loop2", "bs_db_partition": "/dev/loop2p1", "bs_db_partition_num": "1", "bs_db_partition_uuid": "ae16d004-c8f9-4696-a514-ad6c0f23429e", "bs_wal_device": "/dev/loop3", "bs_wal_partition": "/dev/loop3p1", "bs_wal_partition_num": "1", "bs_wal_partition_uuid": "7d2c2986-639d-4889-a664-456137ec8fb2", "fs_label": "", "fs_uuid": "000980bc-6a28-4c0a-b18c-f6726bedeb69", "osd_device": "/dev/loop1", "osd_partition": "/dev/loop1p1", "osd_partition_num": "1", "osd_partition_uuid": "27ba1939-6cb9-4de9-9f14-027c4f6c856f", "store_type": "bluestore", "use_entire_disk": false } ] ``` So only need to use the corresponding partition. I did some optimizations for the above aspects (based on tonezhang's patch): kolla: https://review.openstack.org/#/c/575400/ kolla-ansible: https://review.openstack.org/#/c/575408/
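Point 5 above proposes adding "|| true" so that the harmless parted error on a raw block partition does not abort the script. A minimal sketch of that guard, assuming WEIGHT_PARTITION is already set by the surrounding bootstrap code:
```
# parted prints "unrecognised disk label" for a raw bluestore block partition;
# the awk result is still usable, so ignore the non-zero exit status.
OSD_INITIAL_WEIGHT=$(parted --script "${WEIGHT_PARTITION}" unit TB print | awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}' || true)
```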
2018-07-04 01:48:46 Jeffrey Zhang bug added subscriber Jeffrey Zhang
2018-07-04 09:33:23 wangwei description
I tested the latest bluestore code; it is the patch from Mr Tonezhang: kolla: https://review.openstack.org/#/c/566810/ kolla-ansible: https://review.openstack.org/#/c/566801/9 1. PS: I have only encountered the following problem on a cloud virtual machine; using partlabel works fine on a VMware virtual machine. If anyone else encounters this problem, please leave a description of your environment. The following is the original description: In my tests, I encountered a problem where the osd bootstrap failed when executing this command: ceph-osd -i "${OSD_ID}" --mkfs -k "${OSD_DIR}"/keyring --osd-uuid "${OSD_UUID}" The logs are as follows: ``` ++ partprobe ++ ln -sf /dev/disk/by-partlabel/KOLLA_CEPH_DATA_BS_B_2 /var/lib/ceph/osd/ceph-2/block ++ '[' -n '' ']' ++ '[' -n '' ']' ++ ceph-osd -i 2 --mkfs -k /var/lib/ceph/osd/ceph-2/keyring --osd-uuid b5703869-87d1-4ab8-be11-ab24db2870cc ``` So I added the "-d" parameter to debug the problem: ceph-osd -d -i "${OSD_ID}" --mkfs -k "${OSD_DIR}"/keyring --osd-uuid "${OSD_UUID}" ``` ++ ceph-osd -d -i 0 --mkfs -k /var/lib/ceph/osd/ceph-0/keyring --osd-uuid e14d5061-ae41-4c16-bf3c-2e9c5973cb54 2018-06-13 17:29:53.216034 7f808b6a0d80 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 78 2018-06-13 17:29:53.243358 7f808b6a0d80 0 stack NetworkStack max thread limit is 24, switching to this now. Higher thread values are unnecessary and currently unsupported.
2018-06-13 17:29:53.248479 7f808b6a0d80 1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs path /var/lib/ceph/osd/ceph-0 2018-06-13 17:29:53.248676 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (2) No such file or directory 2018-06-13 17:29:53.248714 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (2) No such file or directory 2018-06-13 17:29:53.249134 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _read_fsid unparsable uuid 2018-06-13 17:29:53.249141 7f808b6a0d80 1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs using provided fsid e14d5061-ae41-4c16-bf3c-2e9c5973cb54 2018-06-13 17:29:53.249361 7f808b6a0d80 1 bdev create path /var/lib/ceph/osd/ceph-0/block type kernel 2018-06-13 17:29:53.249372 7f808b6a0d80 1 bdev(0x563ef1b19600 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block 2018-06-13 17:29:53.249400 7f808b6a0d80 -1 bdev(0x563ef1b19600 /var/lib/ceph/osd/ceph-0/block) open open got: (2) No such file or directory 2018-06-13 17:29:53.249654 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs failed, (2) No such file or directory 2018-06-13 17:29:53.249662 7f808b6a0d80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory 2018-06-13 17:29:53.249950 7f808b6a0d80 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory ``` After my testing, I found that after executing this command: ``` sgdisk "--change-name=2:KOLLA_CEPH_DATA_BS_B_${OSD_ID}" "--typecode=2:${CEPH_OSD_TYPE_CODE}" -- "${OSD_BS_BLOCK_DEV}" ``` It took 3 seconds to generate the by-partlabel folder and partlabel on my centos virtual machine, but the partuuid was generated immediately without delay. So I think using partuuid is better than partlabel, when initializing ceph. In the ceph-deploy tool, the command to initialize osd is 'ceph-disk preapre', which also uses partuuid. 2. In the current find_disks.py logic, both bluestore and filestore return all disk information, including osd partition, journal partition, block partition, wal partition and db partition. ``` "bs_db_device": "", "bs_db_label": "", "bs_db_partition_num": "", "bs_wal_device": "", "bs_wal_label": "", "bs_wal_partition_num": "", "device": "/dev/xvdb", "external_journal": false, "fs_label": "", "fs_uuid": "cd711f44-2fa8-41c8-8f74-b43e96758edd", "journal": "", "journal_device": "", "journal_num": 0, "partition": "/dev/xvdb", "partition_label": "KOLLA_CEPH_OSD_BOOTSTRAP_BS", "partition_num": "1" ``` There is a bit of confusion here. In fact, in the filestore, there are only osd partition and journal partition. In the bluestore, there are osd data partition, block partition, wal partition and db partition. 
I think we should distinguish between the bluestore and filestore disk information, like this: ```bluestore "osds_bootstrap": [ { "bs_blk_device": "/dev/sdb", "bs_blk_partition": "/dev/sdb2", "bs_blk_partition_num": 2, "fs_label": "", "fs_uuid": "d5ca7d92-457e-484c-a7f3-7b0497249f87", "osd_device": "/dev/sdb", "osd_partition": "/dev/sdb1", "osd_partition_num": "1", "store_type": "bluestore", "use_entire_disk": true }, "osds_bootstrap": [ { "bs_blk_device": "/dev/sdb", "bs_blk_partition": "/dev/sdb2", "bs_blk_partition_num": 2, "bs_db_device": "/dev/sdc", "bs_db_partition": "/dev/sdc2", "bs_db_partition_num": "2", "bs_wal_device": "/dev/sdc", "bs_wal_partition": "/dev/sdc1", "bs_wal_partition_num": "1", "fs_label": "", "fs_uuid": "f1590016-2bf2-4690-b9cb-497a95eacac0", "osd_device": "/dev/sdb", "osd_partition": "/dev/sdb1", "osd_partition_num": "1", "store_type": "bluestore", "use_entire_disk": true } ] ``` ```filestore "osds_bootstrap": [ { "fs_label": "", "fs_uuid": "0d965d41-2027-4713-ba24-3e0f53ce5ec2", "journal_device": "/dev/sdb", "journal_num": 2, "journal_partition": "/dev/sdb2", "osd_device": "/dev/sdb", "osd_partition": "/dev/sdb1", "osd_partition_num": "1", "store_type": "filestore", "use_entire_disk": true }, { "fs_label": "", "fs_uuid": "", "journal_device": "/dev/sdc", "journal_num": "2", "journal_partition": "/dev/disk/by-partuuid/81f04fbf-f272-4073-9217-cf02805dda17", "osd_device": "/dev/sdc", "osd_partition": "/dev/sdc1", "osd_partition_num": "1", "store_type": "filestore", "use_entire_disk": false } ] ``` 3. The osd partition lable after successful initialization is as follows: ``` KOLLA_CEPH_BSDATA_1 KOLLA_CEPH_DATA_BS_B_1 KOLLA_CEPH_DATA_BS_D_1 KOLLA_CEPH_DATA_BS_W_1 ``` The prefix is different so we can't find the disk as the filestore's logic. So I think a good way to name it like this: ``` KOLLA_CEPH_DATA_BS_1 KOLLA_CEPH_DATA_BS_1_B KOLLA_CEPH_DATA_BS_1_D KOLLA_CEPH_DATA_BS_1_W ``` Regular naming can reduce some code. Similarly, the division of each osd partition label should take the following approach: ``` KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_W KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_D ``` The simplest label is: ``` KOLLA_CEPH_OSD_BOOTSTRAP_BS ``` 4. According to the naming method above, we can deploy in three ways.    1) The disk has only one partition or the label of the first partition is "KOLLA_CEPH_OSD_BOOTSTRAP_BS", if you use this label, kolla will default to dividing the entire disk into 100M osd data and block partitions. e.g: ``` sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS 1 -1 ``` result: ``` Number Start End Size File system Name Flags  1 1049kB 106MB 105MB xfs KOLLA_CEPH_DATA_BS_2  2 106MB 107GB 107GB KOLLA_CEPH_DATA_BS_2_B ```    2)If a disk has only one partition, or the label of the first partition is "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1", then you do not specify a block partition "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B", then kolla will initialize the entire disk to osd data and block. If you specify additional wal and db partitions (not on the same disk as the osd partition) then kolla will initialize wal and db according to your definition. 
e.g: ``` sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 2048 sudo /sbin/parted /dev/loop0 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_W 1 -1 ``` result: ``` Disk /dev/xvdb: 107GB Sector size (logical/physical): 512B/512B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags  1 1049kB 106MB 105MB xfs KOLLA_CEPH_DATA_BS_2  2 106MB 107GB 107GB KOLLA_CEPH_DATA_BS_2_B Model: Loopback device (loopback) Disk /dev/loop0: 10.7GB Sector size (logical/physical): 512B/512B Partition Table: gpt Disk Flags: Number Start End Size File system Name Flags  1 1049kB 10.7GB 10.7GB KOLLA_CEPH_DATA_BS_2_W ```    3)If you specify the osd partition "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1" and specify the block partition "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B", kolla will initialize the disk according to your definition. If you specify additional wal and db partitions then kolla will initialize wal and db also according to your definition.In this case, you can arbitrarily define your partition. Four partitions can be defined on the same disk or on different disks. e.g: ``` sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 200 sudo /sbin/parted /dev/xvdb -s mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_W 201 2249 sudo /sbin/parted /dev/xvdb -s mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_D 2250 4298 sudo /sbin/parted /dev/xvdb -s mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B 4299 100% ``` result: ``` Number Start End Size File system Name Flags  1 1049kB 200MB 199MB xfs KOLLA_CEPH_DATA_BS_1  2 201MB 2249MB 2048MB KOLLA_CEPH_DATA_BS_1_W  3 2250MB 4298MB 2048MB KOLLA_CEPH_DATA_BS_1_D  4 4299MB 107GB 103GB KOLLA_CEPH_DATA_BS_1_B ``` 5. In the calculation of OSD_INITIAL_WEIGHT, if the partition is a block partition, because it is a raw device, when the following command is executed, it will give an error, but does not affect the result, so need to add "|| true" to ignore the error: ``` OSD_INITIAL_WEIGHT=$(parted --script ${WEIGHT_PARTITION} unit TB print | awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}') ``` The error like this: ``` ++ [[ auto == \a\u\t\o ]] +++ parted --script /dev/xvdb2 unit TB print +++ awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}' Error: /dev/xvdb2: unrecognised disk label ``` 6. https://review.openstack.org/#/c/575346/ This patch added support for loop devices , but adding some judgments about whether the device is a loop device, it is actually not necessary. We can get directly from find_disks.py. 
For example, we prepare four loopdevice: ``` sudo /sbin/parted /dev/loop0 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_B 1 -1 sudo /sbin/parted /dev/loop1 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO 1 -1 sudo /sbin/parted /dev/loop2 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_D 1 -1 sudo /sbin/parted /dev/loop3 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_W 1 -1 ``` the result of find_disks.py like this: ``` "osds_bootstrap": [         {             "bs_blk_device": "/dev/loop0",             "bs_blk_partition": "/dev/loop0p1",             "bs_blk_partition_num": "1",             "bs_blk_partition_uuid": "b4ac5b80-7015-431b-8256-407769d22907",             "bs_db_device": "/dev/loop2",             "bs_db_partition": "/dev/loop2p1",             "bs_db_partition_num": "1",             "bs_db_partition_uuid": "ae16d004-c8f9-4696-a514-ad6c0f23429e",             "bs_wal_device": "/dev/loop3",             "bs_wal_partition": "/dev/loop3p1",             "bs_wal_partition_num": "1",             "bs_wal_partition_uuid": "7d2c2986-639d-4889-a664-456137ec8fb2",             "fs_label": "",             "fs_uuid": "000980bc-6a28-4c0a-b18c-f6726bedeb69",             "osd_device": "/dev/loop1",             "osd_partition": "/dev/loop1p1",             "osd_partition_num": "1",             "osd_partition_uuid": "27ba1939-6cb9-4de9-9f14-027c4f6c856f",             "store_type": "bluestore",             "use_entire_disk": false         }     ] ``` So only need to use the corresponding partition. I did some optimizations for the above aspects (based on tonezhang's patch): kolla: https://review.openstack.org/#/c/575400/ kolla-ansible: https://review.openstack.org/#/c/575408/
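The loop device example above assumes the loop devices already exist. As an illustrative sketch only (the backing file path and size are assumptions, not part of the patches), one way to prepare such a device before labelling it:
```
# Create a 10 GiB backing file, attach it with partition scanning enabled,
# then label it exactly as in the example above.
dd if=/dev/zero of=/tmp/kolla-ceph-loop0.img bs=1M count=10240
LOOP_DEV=$(losetup -f -P --show /tmp/kolla-ceph-loop0.img)   # e.g. /dev/loop0
sudo /sbin/parted "${LOOP_DEV}" -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_B 1 -1
partprobe "${LOOP_DEV}" || true
```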
2018-07-25 10:51:47 wangwei description
6. https://review.openstack.org/#/c/575346/ This patch added support for loop devices, but the extra checks for whether a device is a loop device are actually unnecessary; we can get the information directly from find_disks.py. For example, we prepare four loop devices (a losetup sketch for preparing such devices follows after this example):

```
sudo /sbin/parted /dev/loop0 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_B 1 -1
sudo /sbin/parted /dev/loop1 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO 1 -1
sudo /sbin/parted /dev/loop2 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_D 1 -1
sudo /sbin/parted /dev/loop3 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_W 1 -1
```

The result of find_disks.py looks like this:

```
"osds_bootstrap": [
    {
        "bs_blk_device": "/dev/loop0",
        "bs_blk_partition": "/dev/loop0p1",
        "bs_blk_partition_num": "1",
        "bs_blk_partition_uuid": "b4ac5b80-7015-431b-8256-407769d22907",
        "bs_db_device": "/dev/loop2",
        "bs_db_partition": "/dev/loop2p1",
        "bs_db_partition_num": "1",
        "bs_db_partition_uuid": "ae16d004-c8f9-4696-a514-ad6c0f23429e",
        "bs_wal_device": "/dev/loop3",
        "bs_wal_partition": "/dev/loop3p1",
        "bs_wal_partition_num": "1",
        "bs_wal_partition_uuid": "7d2c2986-639d-4889-a664-456137ec8fb2",
        "fs_label": "",
        "fs_uuid": "000980bc-6a28-4c0a-b18c-f6726bedeb69",
        "osd_device": "/dev/loop1",
        "osd_partition": "/dev/loop1p1",
        "osd_partition_num": "1",
        "osd_partition_uuid": "27ba1939-6cb9-4de9-9f14-027c4f6c856f",
        "store_type": "bluestore",
        "use_entire_disk": false
    }
]
```

So we only need to use the corresponding partitions; no special loop-device handling is required.
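For reference, the four loop devices used above could be prepared roughly like this before running the parted commands; the image paths and sizes are arbitrary examples, and this is not part of the kolla patches:

```
# Sketch only: back four loop devices with sparse files so the parted commands
# above have something to partition.
for i in 0 1 2 3; do
    sudo truncate -s 10G "/var/lib/ceph-loop${i}.img"
    # Attach the image with partition scanning enabled (-P); after partitioning,
    # partprobe may still be needed for the /dev/loopXp1 nodes to appear.
    sudo losetup -P "/dev/loop${i}" "/var/lib/ceph-loop${i}.img"
done
losetup -l    # verify the mappings
```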
PS: I have only encountered the partlabel problem described in point 1 on a cloud virtual machine; using the partlabel works fine on a VMware virtual machine. If anyone else encounters this problem, please leave a description of your environment.

7. If the ceph luminous package is installed on the host where the osd container runs, the osd container will fail to start after the host reboots.

```
docker logs:
NAMES
dd94b67a13f9 xxx/pasta-os/centos-source-ceph-osd:cephT-4.0.2.0002 "kolla_start" 2 minutes ago Restarting (1) 10 seconds ago ceph_osd_2
e6110c697e1c xxx/pasta-os/centos-source-ceph-osd:cephT-4.0.2.0002 "kolla_start" 2 minutes ago Restarting (1) 11 seconds ago

df -h logs:
/dev/sdc1 97M 5.3M 92M 6% /var/lib/ceph/osd/ceph-2
/dev/sdb1 97M 5.3M 92M 6% /var/lib/ceph/osd/ceph-0
```

The following commands need to be executed to fix it:

```
[root@ceph-node2 ~]# systemctl stop ceph-osd@0
[root@ceph-node2 ~]# systemctl stop ceph-osd@2
[root@ceph-node2 ~]# umount /var/lib/ceph/osd/ceph-2
[root@ceph-node2 ~]# umount /var/lib/ceph/osd/ceph-0
```

After that, restart the osd containers and the correct mounts look like this:

```
/dev/sdc1 97M 5.3M 92M 6% /var/lib/ceph/osd/90a9ac9d-39bc-438e-a24b-aad71757d66a
/dev/sdb1 97M 5.3M 92M 6% /var/lib/ceph/osd/2fbe7fce-2290-4bcf-9961-4227c45e0e62
```

Ceph uses udev to automount osd partitions, and the corresponding osd partition type id is "4fbd7e29-9d25-41b8-afd0-062c0ceff05d", so as long as kolla uses a different type id for its osd partitions, we can avoid this behavior (see the sketch below). https://github.com/ceph/ceph/blob/luminous/udev/95-ceph-osd.rules
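As an illustration of that idea (not the actual patch), the osd data partition could be given a partition type GUID that 95-ceph-osd.rules does not match, for example the generic Linux filesystem GUID; which GUID kolla should actually use is a separate design decision:

```
# Sketch only: mark the osd data partition with a non-ceph partition type GUID
# so the host's udev rule (95-ceph-osd.rules) does not automount it.
# 0FC63DAF-8483-4772-8E79-3D69D8477DE4 is the generic "Linux filesystem" GUID;
# 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D is the ceph osd GUID that triggers udev.
OSD_PARTITION_NUM=1
OSD_DEV="/dev/sdb"          # example device

sudo sgdisk "--typecode=${OSD_PARTITION_NUM}:0FC63DAF-8483-4772-8E79-3D69D8477DE4" -- "${OSD_DEV}"
sudo partprobe "${OSD_DEV}"

# Verify the partition type:
sudo sgdisk -i "${OSD_PARTITION_NUM}" "${OSD_DEV}" | grep 'Partition GUID code'
```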
I did some optimizations for the above aspects (based on tonezhang's patch):
kolla: https://review.openstack.org/#/c/575400/
kolla-ansible: https://review.openstack.org/#/c/575408/

2018-07-25 10:52:23 wangwei description
2020-06-02 08:35:26 Michal Nasiadka kolla: status In Progress Won't Fix