Some issues with the bluestore code

Bug #1776888 reported by wangwei
This bug affects 1 person
Affects: kolla
Status: Won't Fix
Importance: Undecided
Assigned to: wangwei

Bug Description

I tested the latest bluestore code, namely Tone Zhang's patches:
kolla:
https://review.openstack.org/#/c/566810/
kolla-ansible:
https://review.openstack.org/#/c/566801/9

1.

PS:
I have only encountered the following problem on our cloud virtual machines; using partlabel works fine on a VMware virtual machine. If anyone else encounters this problem, please leave a description of your environment.

The following is the original description:

In my tests, the osd bootstrap failed when executing this command:

ceph-osd -i "${OSD_ID}" --mkfs -k "${OSD_DIR}"/keyring --osd-uuid "${OSD_UUID}"

The logs are as follows:

```
++ partprobe
++ ln -sf /dev/disk/by-partlabel/KOLLA_CEPH_DATA_BS_B_2 /var/lib/ceph/osd/ceph-2/block
++ '[' -n '' ']'
++ '[' -n '' ']'
++ ceph-osd -i 2 --mkfs -k /var/lib/ceph/osd/ceph-2/keyring --osd-uuid b5703869-87d1-4ab8-be11-ab24db2870cc
```

So I added the "-d" flag to debug the problem:
ceph-osd -d -i "${OSD_ID}" --mkfs -k "${OSD_DIR}"/keyring --osd-uuid "${OSD_UUID}"

```
++ ceph-osd -d -i 0 --mkfs -k /var/lib/ceph/osd/ceph-0/keyring --osd-uuid e14d5061-ae41-4c16-bf3c-2e9c5973cb54
2018-06-13 17:29:53.216034 7f808b6a0d80 0 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable), process (unknown), pid 78
2018-06-13 17:29:53.243358 7f808b6a0d80 0 stack NetworkStack max thread limit is 24, switching to this now. Higher thread values are unnecessary and currently unsupported.
2018-06-13 17:29:53.248479 7f808b6a0d80 1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs path /var/lib/ceph/osd/ceph-0
2018-06-13 17:29:53.248676 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (2) No such file or directory
2018-06-13 17:29:53.248714 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0/block) _read_bdev_label failed to open /var/lib/ceph/osd/ceph-0/block: (2) No such file or directory
2018-06-13 17:29:53.249134 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0) _read_fsid unparsable uuid
2018-06-13 17:29:53.249141 7f808b6a0d80 1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs using provided fsid e14d5061-ae41-4c16-bf3c-2e9c5973cb54
2018-06-13 17:29:53.249361 7f808b6a0d80 1 bdev create path /var/lib/ceph/osd/ceph-0/block type kernel
2018-06-13 17:29:53.249372 7f808b6a0d80 1 bdev(0x563ef1b19600 /var/lib/ceph/osd/ceph-0/block) open path /var/lib/ceph/osd/ceph-0/block
2018-06-13 17:29:53.249400 7f808b6a0d80 -1 bdev(0x563ef1b19600 /var/lib/ceph/osd/ceph-0/block) open open got: (2) No such file or directory
2018-06-13 17:29:53.249654 7f808b6a0d80 -1 bluestore(/var/lib/ceph/osd/ceph-0) mkfs failed, (2) No such file or directory
2018-06-13 17:29:53.249662 7f808b6a0d80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (2) No such file or directory
2018-06-13 17:29:53.249950 7f808b6a0d80 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-0: (2) No such file or directory
```

Further testing showed that after executing this command:

```
sgdisk "--change-name=2:KOLLA_CEPH_DATA_BS_B_${OSD_ID}" "--typecode=2:${CEPH_OSD_TYPE_CODE}" -- "${OSD_BS_BLOCK_DEV}"
```
It took about 3 seconds for the by-partlabel directory and the partlabel symlink to appear on my CentOS virtual machine, while the partuuid symlink was generated immediately, with no delay.

So I think using partuuid is better than partlabel when initializing Ceph.

In the ceph-deploy tool, the command that initializes the OSD is 'ceph-disk prepare', which also uses partuuid.
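
As a rough sketch of what a partuuid-based link could look like (OSD_BS_BLOCK_DEV, the partition number and OSD_DIR are assumptions taken from the surrounding scripts, not the actual patch):

```
# Sketch only: resolve the block partition by PARTUUID instead of by partlabel.
# "${OSD_BS_BLOCK_DEV}2" assumes the block partition is partition 2 on that disk.

# Let udev finish creating the /dev/disk/by-* symlinks before relying on them.
udevadm settle --timeout=600

# Read the PARTUUID of the block partition directly from the device.
BLOCK_PARTUUID=$(blkid -s PARTUUID -o value "${OSD_BS_BLOCK_DEV}2")

# Link the OSD block device by partuuid, which appeared immediately in my test,
# instead of by partlabel, which showed up only after a delay.
ln -sf /dev/disk/by-partuuid/"${BLOCK_PARTUUID}" "${OSD_DIR}"/block
```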

2.

In the current find_disks.py logic, both bluestore and filestore return all disk information, including osd partition, journal partition, block partition, wal partition and db partition.
```
"bs_db_device": "",
"bs_db_label": "",
"bs_db_partition_num": "",
"bs_wal_device": "",
"bs_wal_label": "",
"bs_wal_partition_num": "",
"device": "/dev/xvdb",
"external_journal": false,
"fs_label": "",
"fs_uuid": "cd711f44-2fa8-41c8-8f74-b43e96758edd",
"journal": "",
"journal_device": "",
"journal_num": 0,
"partition": "/dev/xvdb",
"partition_label": "KOLLA_CEPH_OSD_BOOTSTRAP_BS",
"partition_num": "1"
```
This is a bit confusing. In fact, a filestore OSD has only an osd partition and a journal partition, while a bluestore OSD has an osd data partition, a block partition, a wal partition and a db partition.

I think we should distinguish between the bluestore and filestore disk information, like this:

```bluestore
"osds_bootstrap": [
        {
            "bs_blk_device": "/dev/sdb",
            "bs_blk_partition": "/dev/sdb2",
            "bs_blk_partition_num": 2,
            "fs_label": "",
            "fs_uuid": "d5ca7d92-457e-484c-a7f3-7b0497249f87",
            "osd_device": "/dev/sdb",
            "osd_partition": "/dev/sdb1",
            "osd_partition_num": "1",
            "store_type": "bluestore",
            "use_entire_disk": true
        },

"osds_bootstrap": [
        {
            "bs_blk_device": "/dev/sdb",
            "bs_blk_partition": "/dev/sdb2",
            "bs_blk_partition_num": 2,
            "bs_db_device": "/dev/sdc",
            "bs_db_partition": "/dev/sdc2",
            "bs_db_partition_num": "2",
            "bs_wal_device": "/dev/sdc",
            "bs_wal_partition": "/dev/sdc1",
            "bs_wal_partition_num": "1",
            "fs_label": "",
            "fs_uuid": "f1590016-2bf2-4690-b9cb-497a95eacac0",
            "osd_device": "/dev/sdb",
            "osd_partition": "/dev/sdb1",
            "osd_partition_num": "1",
            "store_type": "bluestore",
            "use_entire_disk": true
        }
    ]

```

```filestore
"osds_bootstrap": [
        {
            "fs_label": "",
            "fs_uuid": "0d965d41-2027-4713-ba24-3e0f53ce5ec2",
            "journal_device": "/dev/sdb",
            "journal_num": 2,
            "journal_partition": "/dev/sdb2",
            "osd_device": "/dev/sdb",
            "osd_partition": "/dev/sdb1",
            "osd_partition_num": "1",
            "store_type": "filestore",
            "use_entire_disk": true
        },
        {
            "fs_label": "",
            "fs_uuid": "",
            "journal_device": "/dev/sdc",
            "journal_num": "2",
            "journal_partition": "/dev/disk/by-partuuid/81f04fbf-f272-4073-9217-cf02805dda17",
            "osd_device": "/dev/sdc",
            "osd_partition": "/dev/sdc1",
            "osd_partition_num": "1",
            "store_type": "filestore",
            "use_entire_disk": false
        }
    ]
```

3.

The osd partition labels after successful initialization are as follows:

```
KOLLA_CEPH_BSDATA_1
KOLLA_CEPH_DATA_BS_B_1
KOLLA_CEPH_DATA_BS_D_1
KOLLA_CEPH_DATA_BS_W_1

```
The prefixes differ, so we cannot find the disks with the same logic used for the filestore.

So I think a better naming scheme would be:

```
KOLLA_CEPH_DATA_BS_1
KOLLA_CEPH_DATA_BS_1_B
KOLLA_CEPH_DATA_BS_1_D
KOLLA_CEPH_DATA_BS_1_W
```
A consistent naming scheme can also reduce the amount of code.
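
For example, with a consistent prefix the block, db and wal links could be created in one small loop instead of three separate branches (a sketch only, assuming the renamed labels above and the OSD_ID/OSD_DIR variables from the bootstrap script):

```
# Sketch: with labels of the form KOLLA_CEPH_DATA_BS_<id>_<suffix>,
# block, db and wal share one code path.
for suffix in B D W; do
    label="KOLLA_CEPH_DATA_BS_${OSD_ID}_${suffix}"
    case "${suffix}" in
        B) target="block" ;;
        D) target="block.db" ;;
        W) target="block.wal" ;;
    esac
    # Only link partitions that actually exist for this OSD.
    if [ -e "/dev/disk/by-partlabel/${label}" ]; then
        ln -sf "/dev/disk/by-partlabel/${label}" "${OSD_DIR}/${target}"
    fi
done
```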

Similarly, the bootstrap partition labels for each OSD should follow the same approach:

```
KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1
KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B
KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_W
KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_D
```
The simplest label is:

```
KOLLA_CEPH_OSD_BOOTSTRAP_BS
```

4.

With the naming scheme above, we can deploy in three ways.

   1) The disk has only one partition, or the label of its first partition is "KOLLA_CEPH_OSD_BOOTSTRAP_BS". If you use this label, kolla will by default split the entire disk into a 100M osd data partition and a block partition.

e.g:
```
sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS 1 -1
```
result:
```
Number Start End Size File system Name Flags
 1 1049kB 106MB 105MB xfs KOLLA_CEPH_DATA_BS_2
 2 106MB 107GB 107GB KOLLA_CEPH_DATA_BS_2_B
```

   2) The disk has only one partition, or the label of its first partition is "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1", and you do not specify a block partition "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B". In that case kolla will initialize the entire disk as osd data and block. If you specify additional wal and db partitions (not on the same disk as the osd partition), kolla will initialize the wal and db according to your definition.

e.g:
```
sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 2048

sudo /sbin/parted /dev/loop0 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_W 1 -1
```
result:
```
Disk /dev/xvdb: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
 1 1049kB 106MB 105MB xfs KOLLA_CEPH_DATA_BS_2
 2 106MB 107GB 107GB KOLLA_CEPH_DATA_BS_2_B

Model: Loopback device (loopback)
Disk /dev/loop0: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
 1 1049kB 10.7GB 10.7GB KOLLA_CEPH_DATA_BS_2_W

```

   3) If you specify both the osd partition "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1" and the block partition "KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B", kolla will initialize the disk according to your definition. If you specify additional wal and db partitions, kolla will initialize the wal and db according to your definition as well. In this case you can define your partitions arbitrarily: the four partitions can be on the same disk or on different disks.

e.g:
```
sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 200
sudo /sbin/parted /dev/xvdb -s mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_W 201 2249
sudo /sbin/parted /dev/xvdb -s mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_D 2250 4298
sudo /sbin/parted /dev/xvdb -s mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1_B 4299 100%
```
result:
```
Number Start End Size File system Name Flags
 1 1049kB 200MB 199MB xfs KOLLA_CEPH_DATA_BS_1
 2 201MB 2249MB 2048MB KOLLA_CEPH_DATA_BS_1_W
 3 2250MB 4298MB 2048MB KOLLA_CEPH_DATA_BS_1_D
 4 4299MB 107GB 103GB KOLLA_CEPH_DATA_BS_1_B

```

5.

When calculating OSD_INITIAL_WEIGHT, if the partition is a block partition (a raw device), the following command prints an error. The error does not affect the result, so "|| true" needs to be added to ignore it:

```
OSD_INITIAL_WEIGHT=$(parted --script ${WEIGHT_PARTITION} unit TB print | awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}')
```
The error looks like this:
```
++ [[ auto == \a\u\t\o ]]
+++ parted --script /dev/xvdb2 unit TB print
+++ awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}'
Error: /dev/xvdb2: unrecognised disk label
```
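
A minimal sketch of the proposed change (the same command with "|| true" appended; per the observation above, the Disk size line is still printed despite the error, so the parsed weight is unaffected):

```
# Sketch: "|| true" keeps the non-zero exit status from parted
# ("unrecognised disk label" on the raw block partition) from aborting the script.
OSD_INITIAL_WEIGHT=$(parted --script "${WEIGHT_PARTITION}" unit TB print | awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}' || true)
```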

6.

https://review.openstack.org/#/c/575346/
This patch added support for loop devices, but the added checks for whether a device is a loop device are not actually necessary; we can get the partition paths directly from find_disks.py. For example, prepare four loop devices:

```
sudo /sbin/parted /dev/loop0 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_B 1 -1
sudo /sbin/parted /dev/loop1 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO 1 -1
sudo /sbin/parted /dev/loop2 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_D 1 -1
sudo /sbin/parted /dev/loop3 -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO_W 1 -1
```
The result of find_disks.py then looks like this:
```
"osds_bootstrap": [
        {
            "bs_blk_device": "/dev/loop0",
            "bs_blk_partition": "/dev/loop0p1",
            "bs_blk_partition_num": "1",
            "bs_blk_partition_uuid": "b4ac5b80-7015-431b-8256-407769d22907",
            "bs_db_device": "/dev/loop2",
            "bs_db_partition": "/dev/loop2p1",
            "bs_db_partition_num": "1",
            "bs_db_partition_uuid": "ae16d004-c8f9-4696-a514-ad6c0f23429e",
            "bs_wal_device": "/dev/loop3",
            "bs_wal_partition": "/dev/loop3p1",
            "bs_wal_partition_num": "1",
            "bs_wal_partition_uuid": "7d2c2986-639d-4889-a664-456137ec8fb2",
            "fs_label": "",
            "fs_uuid": "000980bc-6a28-4c0a-b18c-f6726bedeb69",
            "osd_device": "/dev/loop1",
            "osd_partition": "/dev/loop1p1",
            "osd_partition_num": "1",
            "osd_partition_uuid": "27ba1939-6cb9-4de9-9f14-027c4f6c856f",
            "store_type": "bluestore",
            "use_entire_disk": false
        }
    ]
```
So we only need to use the corresponding partition paths directly.
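
A sketch of what that could look like in the bootstrap script (OSD_PARTITION and OSD_BS_BLK_PARTITION are illustrative names for the values reported above, not necessarily the exact variables in the patch):

```
# Sketch: consume the partition paths reported by find_disks.py directly, so
# /dev/sdb1 and /dev/loop1p1 are handled by the same code path and no
# "is this a loop device?" branching is needed.
mkfs.xfs -f "${OSD_PARTITION}"
mount "${OSD_PARTITION}" "${OSD_DIR}"
ln -sf "${OSD_BS_BLK_PARTITION}" "${OSD_DIR}"/block
```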

7.

If the ceph luminous package is installed on the host where the osd container runs, the osd container will fail to start after the host reboots.

```
docker logs:
               NAMES
dd94b67a13f9 xxx/pasta-os/centos-source-ceph-osd:cephT-4.0.2.0002 "kolla_start" 2 minutes ago Restarting (1) 10 seconds ago ceph_osd_2
e6110c697e1c xxx/pasta-os/centos-source-ceph-osd:cephT-4.0.2.0002 "kolla_start" 2 minutes ago Restarting (1) 11 seconds ago

df -h logs:
/dev/sdc1 97M 5.3M 92M 6% /var/lib/ceph/osd/ceph-2
/dev/sdb1 97M 5.3M 92M 6% /var/lib/ceph/osd/ceph-0
```
The following commands need to be executed to fix it:

```
[root@ceph-node2 ~]# systemctl stop ceph-osd@0
[root@ceph-node2 ~]# systemctl stop ceph-osd@2

[root@ceph-node2 ~]# umount /var/lib/ceph/osd/ceph-2
[root@ceph-node2 ~]# umount /var/lib/ceph/osd/ceph-0
```
After that, restart the osd container and we can see the correct mounts, which should look like this:
```
/dev/sdc1 97M 5.3M 92M 6% /var/lib/ceph/osd/90a9ac9d-39bc-438e-a24b-aad71757d66a
/dev/sdb1 97M 5.3M 92M 6% /var/lib/ceph/osd/2fbe7fce-2290-4bcf-9961-4227c45e0e62
```
Ceph uses udev rules to auto-mount OSD data partitions; the corresponding OSD partition type GUID is "4fbd7e29-9d25-41b8-afd0-062c0ceff05d". So as long as we change the partition type GUID of the osd partition, we can avoid this behaviour.
https://github.com/ceph/ceph/blob/luminous/udev/95-ceph-osd.rules
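
One possible way to keep the host's udev rule from claiming the partition is to give the OSD data partition a type GUID that 95-ceph-osd.rules does not match, for example the generic Linux filesystem GUID (a sketch; the device and partition number are only an example):

```
# Sketch: retag the OSD data partition with the generic "Linux filesystem"
# type GUID so the host's 95-ceph-osd.rules, which matches the Ceph OSD type
# 4fbd7e29-9d25-41b8-afd0-062c0ceff05d, no longer auto-mounts it after reboot.
sgdisk "--typecode=1:0FC63DAF-8483-4772-8E79-3D69D8477DE4" -- /dev/sdb
partprobe /dev/sdb
```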

I did some optimizations for the above aspects (based on Tone Zhang's patch):
kolla:
https://review.openstack.org/#/c/575400/
kolla-ansible:
https://review.openstack.org/#/c/575408/

wangwei (wangwei-david)
description: updated
wangwei (wangwei-david)
Changed in kolla:
assignee: nobody → wangwei (wangwei-david)
Revision history for this message
wangwei (wangwei-david) wrote :

This is a log of the steps ceph-disk takes to initialize an OSD. I have simplified some of the content; please refer to it:

```
# Get the config of bluestore_block_size, bluestore_block_db_size and bluestore_block_wal_size
INFO:ceph_disk.main:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup bluestore_block_size
INFO:ceph_disk.main:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup bluestore_block_db_size
INFO:ceph_disk.main:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup bluestore_block_size
INFO:ceph_disk.main:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup bluestore_block_wal_size
# Get the config of xfs
INFO:ceph_disk.main:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
INFO:ceph_disk.main:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
INFO:ceph_disk.main:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph_disk.main:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
# Create the osd data partition
DEBUG:ceph_disk.main:Creating data partition num 1 size 100 on /dev/xvdb
INFO:ceph_disk.main:Running command: /usr/sbin/sgdisk --new=1:0:+100M --change-name=1:ceph data --partition-guid=1:59af5892-2460-4366-aa41-59be7ec71374 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/xvdb
DEBUG:ceph_disk.main:Calling partprobe on created device /dev/xvdb
INFO:ceph_disk.main:Running command: /usr/bin/udevadm settle --timeout=600
INFO:ceph_disk.main:Running command: /usr/bin/flock -s /dev/xvdb /usr/sbin/partprobe /dev/xvdb

# Create block.db partition
DEBUG:ceph_disk.main:Creating block.db partition num 1 size 1024 on /dev/rbd0
INFO:ceph_disk.main:Running command: /usr/sbin/sgdisk --new=1:0:+1024M --change-name=1:ceph block.db --partition-guid=1:302a7204-e955-4cda-b8d6-459cee350086 --typecode=1:30cd0809-c2b2-499c-8879-2d6b785292be --mbrtogpt -- /dev/rbd0
DEBUG:ceph_disk.main:Calling partprobe on created device /dev/rbd0
INFO:ceph_disk.main:Running command: /usr/bin/udevadm settle --timeout=600
INFO:ceph_disk.main:Running command: /usr/bin/flock -s /dev/rbd0 /usr/sbin/partprobe /dev/rbd0
DEBUG:ceph_disk.main:Block.db is GPT partition /dev/disk/by-partuuid/302a7204-e955-4cda-b8d6-459cee350086
INFO:ceph_disk.main:Running command: /usr/sbin/sgdisk --typecode=1:30cd0809-c2b2-499c-8879-2d6b78529876 -- /dev/rbd0
INFO:ceph_disk.main:Running command: /usr/bin/chown ceph:ceph /dev/rbd0p1

# Create block.wal partition
DEBUG:ceph_disk.main:name = block.wal
DEBUG:ceph_disk.main:Creating block.wal partition num 2 size 576 on /dev/rbd0
INFO:ceph_disk.main:Running command: /usr/sbin/sgdisk --new=2:0:+576M --change-name=2:ceph block.wal --partition-guid=2:3973042a-0df2-40c9-aff0-aad75bf55198 --typecode=2:5ce17fce-4087-4169-b7ff-056cc58472be --mbrtogpt -- /dev/rbd0
DEBUG:ceph_disk.main:Calling partprobe on created device /dev/rbd0
INFO:ceph_disk.main:Running command: /usr/bin/udevadm settle --timeout=600
INFO:ceph_disk.main:Running command: /usr/bin/flock -s /dev/rbd0 /usr/sbin/partprobe /de...

Changed in kolla:
status: New → In Progress
wangwei (wangwei-david)
description: updated
Revision history for this message
Tone Zhang (tone.zhang) wrote :

Could you please rerun the test with the latest Kolla and Kolla-ansible code?

In the description, when you ran your test, the patches were still under review:
https://review.openstack.org/#/c/566810/
https://review.openstack.org/#/c/566801/9

Please rerun the test. Thanks a lot!

Revision history for this message
wangwei (wangwei-david) wrote :

Hi Tone Zhang,

I just tested the latest master branch code and the result is the same.

The disk is prepared as follows:
```
Model: Xen Virtual Block Device (xvd)
Disk /dev/xvdb: 107GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
 1 1049kB 107GB 107GB xfs KOLLA_CEPH_OSD_BOOTSTRAP_BS
```

And the find_disks result is:

```
{
  u'fs_uuid': u'f3dfc4a9-2913-44ea-a3cd-0f5b85436a21',
  u'partition': u'/dev/xvdb',
  u'external_journal': False,
  u'bs_blk_label': u'',
  u'bs_db_partition_num': u'',
  u'journal_device': u'',
  u'journal': u'',
  u'bs_wal_label': u'',
  u'bs_wal_partition_num': u'',
  u'fs_label': u'',
  u'journal_num': 0,
  u'bs_wal_device': u'',
  u'partition_num': u'1',
  u'bs_db_label': u'',
  u'bs_blk_partition_num': u'',
  u'device': u'/dev/xvdb',
  u'bs_db_device': u'',
  u'partition_label': u'KOLLA_CEPH_OSD_BOOTSTRAP_BS',
  u'bs_blk_device': u''
}
```

And the error logs are:

```
++ [[ False == \F\a\l\s\e ]]
++ [[ bluestore == \b\l\u\e\s\t\o\r\e ]]
++ [[ /dev/xvdb =~ /dev/loop ]]
++ sgdisk --zap-all -- /dev/xvdb1
Creating new GPT entries.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
++ '[' -n '' ']'
++ sgdisk --zap-all -- /dev/xvdb
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
++ sgdisk --new=1:0:+100M --mbrtogpt -- /dev/xvdb
Creating new GPT entries.
The operation has completed successfully.
++ sgdisk --largest-new=2 --mbrtogpt -- /dev/xvdb
The operation has completed successfully.
++ sgdisk --zap-all -- /dev/xvdb2
Creating new GPT entries.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
++ '[' -n '' ']'
++ '[' -n '' ']'
++ partprobe
++ [[ bluestore == \b\l\u\e\s\t\o\r\e ]]
+++ uuidgen
++ OSD_UUID=5a5c1c56-7618-4fca-9847-f58542add2e8
+++ ceph osd new 5a5c1c56-7618-4fca-9847-f58542add2e8
++ OSD_ID=1
++ OSD_DIR=/var/lib/ceph/osd/ceph-1
++ mkdir -p /var/lib/ceph/osd/ceph-1
++ [[ /dev/xvdb =~ /dev/loop ]]
++ mkfs.xfs -f /dev/xvdb1
meta-data=/dev/xvdb1 isize=512 agcount=4, agsize=6400 blks
         = sectsz=512 attr=2, projid32bit=1
         = crc=1 finobt=0, sparse=0
data = bsize=4096 blocks=25600, imaxpct=25
         = sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=855, version=2
         = sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
++ mount /dev/xvdb1 /var/lib/ceph/osd/ceph-1
++ ceph-osd -i 1 --mkkey
++ echo bluestore
++ '[' -n '' ']'
++ sgdisk --change-name=2:KOLLA_CEPH_DATA_BS_B_1 --typecode=2:4FBD7E29-9D25-41B8-AFD0-062C0CEFF...


Revision history for this message
Tone Zhang (tone.zhang) wrote :

Hi Wei,

Thanks!

Could you please show me the results of the "lsblk" and "blkid" commands? And could you please show me the "parted" command you used?

In the above log, kolla-ceph identifies at least two OSDs (the OSD ID is 1 on /dev/xvdb, not 0), but I only see one device for the Ceph OSD.

Thanks a lot.

Revision history for this message
wangwei (wangwei-david) wrote :

Hi Tone,

Because I deployed three OSDs, but I only took one of them to show you the error log.
I have three Ceph nodes, with only one disk per node:

node1:
```
sudo sgdisk --zap-all -- /dev/xvdb
sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS 1 -1
```
node2:
```
sudo sgdisk --zap-all -- /dev/xvdb
sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 -1
```
node3:
```
sudo sgdisk --zap-all -- /dev/xvdb
sudo /sbin/parted /dev/xvdb -s -- mklabel gpt mkpart KOLLA_CEPH_OSD_BOOTSTRAP_BS_FOO1 1 -1
```

there is "lsblk" and "blkid" result:

node1:
```
[root@dev-ww-ceph001-xxx xxx]# lsblk /dev/xvdb
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvdb 202:16 0 100G 0 disk
└─xvdb1 202:17 0 100G 0 part
[root@dev-ww-ceph001-xxx xxx]# blkid /dev/xvdb
/dev/xvdb: PTTYPE="gpt"

```

node2:
```
[root@dev-ww-ceph002-xxx irteamsu]# lsblk /dev/xvdb
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvdb 202:16 0 100G 0 disk
└─xvdb1 202:17 0 100G 0 part
[root@dev-ww-ceph002-xxx irteamsu]# blkid /dev/xvdb
/dev/xvdb: PTTYPE="gpt"
```

node3:

```
[root@dev-ww-ceph003-xxx xxx]# lsblk /dev/xvdb
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
xvdb 202:16 0 100G 0 disk
└─xvdb1 202:17 0 100G 0 part
[root@dev-ww-ceph003-xxx xxx]# blkid /dev/xvdb
/dev/xvdb: PTTYPE="gpt"

```

And I added "ls -al /dev/disk" before the following commands to show you why it goes wrong:
```
ls -al /dev/disk
partprobe || true
ls -al /dev/disk

ln -sf /dev/disk/by-partlabel/KOLLA_CEPH_DATA_BS_B_"${OSD_ID}" "${OSD_DIR}"/block

if [ -n "${OSD_BS_WAL_DEV}" ] && [ "${OSD_BS_BLK_DEV}" != "${OSD_BS_WAL_DEV}" ] && [ -n "${OSD_BS_WAL_PARTNUM}" ]; then
    ln -sf /dev/disk/by-partlabel/KOLLA_CEPH_DATA_BS_W_"${OSD_ID}" "${OSD_DIR}"/block.wal
fi

if [ -n "${OSD_BS_DB_DEV}" ] && [ "${OSD_BS_BLK_DEV}" != "${OSD_BS_DB_DEV}" ] && [ -n "${OSD_BS_DB_PARTNUM}" ]; then
    ln -sf /dev/disk/by-partlabel/KOLLA_CEPH_DATA_BS_D_"${OSD_ID}" "${OSD_DIR}"/block.db
fi

for (( i=10; i>=0; i=i-1 )); do
    ls -al /dev/disk
    sleep 1
    echo "sleep 1s"
done

ceph-osd -d -i "${OSD_ID}" --mkfs -k "${OSD_DIR}"/keyring --osd-uuid "${OSD_UUID}"
```

Here are the logs for each node:

node1:
```
++ [[ False == \F\a\l\s\e ]]
++ [[ bluestore == \b\l\u\e\s\t\o\r\e ]]
++ [[ /dev/xvdb =~ /dev/loop ]]
++ sgdisk --zap-all -- /dev/xvdb1
Creating new GPT entries.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
++ '[' -n '' ']'
++ sgdisk --zap-all -- /dev/xvdb
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
++ sgdisk --new=1:0:+100M --mbrtogpt -- /dev/xvdb
Creating new GPT entries.
The operation has completed successfully.
++ sgdisk --largest-new=2 --mbrtogpt -- /dev/xvdb
The operation has completed successfully.
++ sgdisk --zap-all -- /dev/xvdb2
Creating new GPT entries.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
GPT data structures destroyed! You may now partition t...

Revision history for this message
wangwei (wangwei-david) wrote :

Hi Tone,

I only have Xen virtual machines, no other kind of virtual machine, so I didn't test it on the others.

Revision history for this message
Tone Zhang (tone.zhang) wrote :

Hi Wei,

I have tried creating a partition and then accessing it by partlabel in my test bed with KVM/Qemu and Xen. I tested the operation with Debian, Ubuntu and CentOS, and I cannot reproduce the issue you saw.

Creating the VM on top of bare metal looks more reasonable to me, rather than VM-in-VM.

So could you please run your test with KVM/Qemu and observe the result? In fact, the delay is very hard to understand, because from the kernel's point of view there is no difference. Since it happens in a virtual machine, the VM's performance (even VM-in-VM) will be affected by both host and guest.

Thanks!

Revision history for this message
Tone Zhang (tone.zhang) wrote :

Hi Wei,

As I mentioned, I have tested the partition operations in my test bed with KVM/Qemu and Xen, and I cannot reproduce the issue you saw.

I suggest you re-run the test with other VMs.

Thanks a lot!

Revision history for this message
wangwei (wangwei-david) wrote :

Hi Tone,

I tested that the partlabel is generated immediately in the virtual machine itself, but when running docker on that virtual machine, the phenomenon above appears.

Because my virtual machine is our company's cloud virtual machine, our workflow is to verify the deployment on virtual machines first and then deploy the cluster on physical machines to verify performance. I think this scenario is very common, so we should use the most common deployment method. As you said, Ceph supports both partuuid and partlabel, but partuuid is the more common one.

I'm glad you have implemented bluestore with partlabel, but the process is somewhat different from that of the filestore. I think the filestore implementation is better: less code and easier to read.

So I've made some optimizations based on your implementation. My patch handles the disk information more like the filestore does, and I referred to the initialization process of ceph-disk. I hope you can review it.

Thanks very much!^^

Revision history for this message
Tone Zhang (tone.zhang) wrote :

Hi Wei,

Thanks for your information.

I think the first point is why there is a delay in your test environment. I have tested several cases, but I cannot reproduce the issue in my company's lab (with different bare metal, VMs, distros and servers). And I believe you and I are not the only people testing the kolla ceph bluestore OSD.

I agree with you that testing Kolla with VMs is the common way, but there are several kinds of virtual machine projects, and the configuration of the host and the VM is just as significant. We cannot judge the fault based only on one particular environment.

If the issue depends on a particular distro, hardware or condition/configuration, it is better to validate with different environments and collect more information.

Thanks.

wangwei (wangwei-david)
description: updated
Revision history for this message
wangwei (wangwei-david) wrote :

Hi Tone,

Thank you for your feedback. I tested it on a VMware virtual machine and found no partlabel delay.
This issue may be related to the kernel of the cloud VM I am using, so I agree with you: let's continue to use partlabel, and if other people run into this problem, we can discuss it again.

On the basis of continuing to use partlabel, I made some optimizations to the bluestore deployment process. Please review points two through six of the bug description.

Thank you very much!

Revision history for this message
Tone Zhang (tone.zhang) wrote :

Hi Wei,

I appreciate your test and your comments. According to your feedback, I think we can close point 1. Correct?

For point 2, I think it is not a functional fault. The current version of Kolla and Kolla-ansible can handle filestore and bluestore well, correct?

For point 3, the label names have been defined clearly in the spec, and the spec was released several months ago. The code is aligned with the spec. The label names do not introduce any faults, do they?

For point 4, I think you plan to update the documentation. In fact, kolla ceph supports more than three deployment manners. I have some concerns with case 3. Deploying one bluestore OSD on one device is meaningful, but formatting the device with 4 partitions is meaningless. According to the Ceph documentation (http://docs.ceph.com/docs/master), block.wal and block.db should be faster than the primary device. Allocating block, block.wal and block.db on the same device is a bad solution and impacts performance negatively; I have tested it. So kolla ceph should not support case 3 as you described.

For point 5, could you please share the error information with me? I ran the command in my test bed and did not see the error. Thanks in advance.

Thanks!

Revision history for this message
wangwei (wangwei-david) wrote :

Hi Tone,

For point 1, right, we can close it.

For point 2, working well doesn't mean good code. I think that is exactly the charm of the open source community: good code is easier for others to read, easier to understand, and more convenient to maintain, isn't it?

For point 3, can't things that have already been released be changed? Where would the meaning of open source be otherwise? The current label logic is clearly more cumbersome to process, so why not use the previous logic?

For point 4, I am not just updating the documentation; I am re-defining the disk information for the three deployment methods in the logic of find_disks.py. I know the meaning of each Ceph partition. The third way gives users a freer way to deploy Ceph: it can be used for testing, and it can also be used to customize the deployment users want. I know that putting all four partitions on one disk is meaningless for production, but it lets users test a kolla deployment with fewer disks, similar to deploying an OSD with 4 loop devices, doesn't it?

For point 5, you see no problem in your test because you have not changed this place for bluestore:

```
OSD_INITIAL_WEIGHT=$(parted --script ${OSD_PARTITION} unit TB print | awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}')
```
The partition used here should be the block partition for bluestore, shouldn't it?
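
A minimal sketch of what I mean (the variable names are assumptions based on the scripts above, not the exact ones in the code):

```
# Sketch: pick the partition used for the weight based on the store type:
# the block partition for bluestore, the osd data partition for filestore.
if [ "${OSD_STORE_TYPE}" = "bluestore" ]; then
    WEIGHT_PARTITION="${OSD_BS_BLK_PARTITION}"
else
    WEIGHT_PARTITION="${OSD_PARTITION}"
fi
OSD_INITIAL_WEIGHT=$(parted --script "${WEIGHT_PARTITION}" unit TB print | awk 'match($0, /^Disk.* (.*)TB/, a){printf("%.2f", a[1])}' || true)
```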

Please take a look at my code flow and then compare it.

Thanks!

Revision history for this message
wangwei (wangwei-david) wrote :

Hi Tone,

I'll explain now why I made these changes. You only modified the documentation a few months ago, and I was not sure what your idea was until I saw your code. I think this approach is not good, so I made these comments when reviewing your code, but you didn't accept them. So now I have made these changes in the way I think is better.

Revision history for this message
wangwei (wangwei-david) wrote :

Hi Tone,
Since you don't understand why I said your implementation is not good, I will explain in detail:

1. in find_disks.py

 1) First of all, this script is not only used by ceph but also by swift, and other components will use it later. So we want to ensure every component can use it and to isolate each component's logic as much as possible.
But you added a lot of bluestore logic to the main function. Although it works, it is easy to misunderstand: if users do not know about "bluestore", they will not know what "_BS" means.

 2) I think your code for the bluestore disks is too redundant: "extract_disk_info_bs" has to be executed four times, once for each partition.

 3) And if the label is "KOLLA_CEPH_OSD_BOOTSTRAP_BS", your code will only recognize the last disk, not every one. This is the first deployment method I mentioned above, and your code does not support it.

 4) In the final result, you return all of the disk information even when the fields are empty. I don't think this can be ignored: why should the bluestore disk information also include the filestore disk information? Ansible supports passing null variables, and since we can solve this in kolla-ansible, why should we pass these useless variables in kolla?

2. in extend_start.sh

1) About the ceph type codes
Bluestore has four type codes, but you only listed three, and you set the osd type code on the block partition.
https://github.com/ceph/ceph/blob/luminous/udev/95-ceph-osd.rules

2) We all know the journal is a filestore partition, so why use USE_EXTERNAL_JOURNAL to determine the partitions of the bluestore?
And in your code this variable is always false for bluestore, so what is the significance of this variable?

3) You add a lot of logic to determine whether a device is a loop device, so why not use the disk partition variables directly?

4) Finally, about calculating the OSD weight: in the filestore the size of the osd partition is calculated, while in the bluestore the size of the block partition should be calculated, but you have not noticed this problem.

3. in bootstrap_osds.yml

I don't understand why you are passing these parameters. Do you actually use them?

```
OSD_BS_LABEL: "{{ item.1.partition_label | default('') }}"
OSD_BS_BLK_LABEL: "{{ item.1.bs_blk_label | default('') }}"
OSD_BS_WAL_LABEL: "{{ item.1.bs_wal_label | default('') }}"
OSD_BS_DB_LABEL: "{{ item.1.bs_db_label | default('') }}"
```
4. in start_osds.yml
Same problem as above:

```
OSD_BS_FSUUID: "{{ item.1['fs_uuid'] }}"
```

5. in ceph-guide.rst

Your description says that if there are multiple OSDs on the same node, the user should add a suffix. I think we should also support the case where all the disks on the same node have the label "KOLLA_CEPH_OSD_BOOTSTRAP_BS", which avoids the trouble of creating a lot of labels.

Our company has been using kolla to deploy Ceph since the Mitaka version. In our use of Ceph we sometimes do some custom development, and for the version upgrade and the bluestore implementation we are ahead of the community because of our work requirements. In my understanding, we should treat kolla as a tool, not a product. Our users are d...


wangwei (wangwei-david)
description: updated
Revision history for this message
Michal Nasiadka (mnasiadka) wrote :

Kolla-Ansible Ceph deployment and Kolla Ceph images have been deprecated and removed (in Ussuri) - I don't think that bug is relevant anymore.

Changed in kolla:
status: In Progress → Won't Fix