Problem bootstrapping Ceph from partitions due to stale partition table
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla | Fix Released | Undecided | Unassigned |
Bug Description
It appears kolla-ceph has problems bootstrapping partitions due to stale kernel /sys info.
The bootstrap process looks for 'magic names' in the partition table AND CHANGES THEM. This seems
to work fine. However, the next phase (start_osds.yml) looks for the NEW partition names but gets the old names from /sys, which causes startup to fail.
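A minimal sketch of the failure mode described above: the new label is present in the on-disk GPT, but a consumer reading the kernel's cached view still sees the old label until the table is re-read. The label strings and device paths here are hypothetical (the real names are truncated in the sgdisk listing below); this is an illustration, not the kolla code.

```python
# Hypothetical illustration of the bootstrap/start_osds mismatch.
# Labels and device paths are made up for the example.

def find_osd_partition(expected_name, visible_names):
    """Return the device whose partition label matches expected_name,
    or None if that label is not visible (i.e. the view is stale)."""
    for dev, name in visible_names.items():
        if name == expected_name:
            return dev
    return None

# What the GPT on disk says after bootstrap renamed the partition:
gpt_names = {"/dev/sda3": "KOLLA_CEPH_DATA_IN_USE"}   # hypothetical label
# What a stale kernel view still reports (no re-read since the rename):
sys_names = {"/dev/sda3": "KOLLA_CEPH_DATA"}          # hypothetical label

# start_osds.yml looks for the NEW name:
assert find_osd_partition("KOLLA_CEPH_DATA_IN_USE", gpt_names) == "/dev/sda3"
assert find_osd_partition("KOLLA_CEPH_DATA_IN_USE", sys_names) is None  # lookup fails
```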
As a potential workaround, I'm going to try modifying is_dev_
should be with_items: "{{ osds }}" (note the wrapping quotes around the tag)
The output below shows a 're-run' where ceph had been deployed once, and the partition names were manually reset for the next run. Notice that since the box was NOT rebooted, the /sys info is stale:
```
stack@c1n7:~$ sudo sgdisk -p /dev/sda
Disk /dev/sda: 250069680 sectors, 119.2 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): CEA98805-
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 250069646
Partitions will be aligned on 1-sector boundaries
Total free space is 16 sectors (8.0 KiB)

Number  Start (sector)  End (sector)  Size       Code  Name
  1     34              1987          977.0 KiB  EF02
  2     1988            160001988     76.3 GiB   EF00
  3     160001989       170001989     4.8 GiB    8300  KOLLA_CEPH_
  4     170001990       250069630     38.2 GiB   8300  KOLLA_CEPH_

stack@c1n7:~$ sudo partprobe /dev/sda
stack@c1n7:~$ python ./test_steve.py
checking Device(
checking Device(
checking Device(
checking Device(
checking Device(
checking Device(
```
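One way to make a later phase resilient to this is to force a table re-read before looking up the new names. The transcript above only runs `partprobe`; adding `udevadm settle` afterwards is an assumption on my part (to let udev finish processing the change events). A sketch, with the command execution injectable so it can be dry-run:

```python
# Hypothetical workaround sketch: re-read the GPT and wait for udev
# before any phase that depends on the NEW partition names.
import subprocess

def refresh_partition_table(disk, run=subprocess.check_call):
    """Re-read the partition table on `disk`, then wait for udev.
    `run` is injectable so the sequence can be tested without root."""
    cmds = [["partprobe", disk], ["udevadm", "settle"]]
    for cmd in cmds:
        run(cmd)
    return cmds

# Dry run that records the commands instead of executing them:
recorded = []
refresh_partition_table("/dev/sda", run=recorded.append)
# recorded == [["partprobe", "/dev/sda"], ["udevadm", "settle"]]
```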
Changed in kolla:
status: New → Triaged

Changed in kolla:
status: Triaged → Fix Released
The following patches _seem_ to fix the problem for me - I need to rebuild the cluster from scratch and do a 'bare metal' test.
Blah - have to attach one patch at a time :-/