Problem bootstrapping Ceph from partitions due to stale partition table
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla | Fix Released | Undecided | Unassigned |
Bug Description
It appears kolla-ceph has problems bootstrapping partitions due to stale kernel /sys info.
The bootstrap process looks for 'magic names' in the partition table AND CHANGES THEM. This seems
to work fine. However, the next phase (start_osds.yml) looks for the NEW partition names but gets the old names from /sys, which causes startup to fail.
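A minimal sketch of the failure mode described above: the new label is present in the on-disk GPT, but a consumer reading the kernel's cached view still sees the old label until the table is re-read. The label strings and device paths here are hypothetical (the real names are truncated in the sgdisk listing below); this is an illustration, not the kolla code.

```python
# Hypothetical illustration of the bootstrap/start_osds mismatch.
# Labels and device paths are made up for the example.

def find_osd_partition(expected_name, visible_names):
    """Return the device whose partition label matches expected_name,
    or None if that label is not visible (i.e. the view is stale)."""
    for dev, name in visible_names.items():
        if name == expected_name:
            return dev
    return None

# What the GPT on disk says after bootstrap renamed the partition:
gpt_names = {"/dev/sda3": "KOLLA_CEPH_DATA_IN_USE"}   # hypothetical label
# What a stale kernel view still reports (no re-read since the rename):
sys_names = {"/dev/sda3": "KOLLA_CEPH_DATA"}          # hypothetical label

# start_osds.yml looks for the NEW name:
assert find_osd_partition("KOLLA_CEPH_DATA_IN_USE", gpt_names) == "/dev/sda3"
assert find_osd_partition("KOLLA_CEPH_DATA_IN_USE", sys_names) is None  # lookup fails
```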
As a potential workaround, I'm going to try modifying is_dev_
should be with_items: "{{ osds }}" (note the wrapping quotes around the tag)
The output below shows a 're-run' where ceph had been deployed once, and the partition names were manually reset for the next run. Notice that since the box was NOT rebooted, the /sys info is stale:
```
stack@c1n7:~$ sudo sgdisk -p /dev/sda
Disk /dev/sda: 250069680 sectors, 119.2 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): CEA98805-
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 250069646
Partitions will be aligned on 1-sector boundaries
Total free space is 16 sectors (8.0 KiB)

Number  Start (sector)  End (sector)  Size       Code  Name
  1     34              1987          977.0 KiB  EF02
  2     1988            160001988     76.3 GiB   EF00
  3     160001989       170001989     4.8 GiB    8300  KOLLA_CEPH_
  4     170001990       250069630     38.2 GiB   8300  KOLLA_CEPH_

stack@c1n7:~$ sudo partprobe /dev/sda
stack@c1n7:~$ python ./test_steve.py
checking Device(
checking Device(
checking Device(
checking Device(
checking Device(
checking Device(
```
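One way to make a later phase resilient to this is to force a table re-read before looking up the new names. The transcript above only runs `partprobe`; adding `udevadm settle` afterwards is an assumption on my part (to let udev finish processing the change events). A sketch, with the command execution injectable so it can be dry-run:

```python
# Hypothetical workaround sketch: re-read the GPT and wait for udev
# before any phase that depends on the NEW partition names.
import subprocess

def refresh_partition_table(disk, run=subprocess.check_call):
    """Re-read the partition table on `disk`, then wait for udev.
    `run` is injectable so the sequence can be tested without root."""
    cmds = [["partprobe", disk], ["udevadm", "settle"]]
    for cmd in cmds:
        run(cmd)
    return cmds

# Dry run that records the commands instead of executing them:
recorded = []
refresh_partition_table("/dev/sda", run=recorded.append)
# recorded == [["partprobe", "/dev/sda"], ["udevadm", "settle"]]
```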
Changed in kolla:
status: New → Triaged

Changed in kolla:
status: Triaged → Fix Released
The following patches _seem_ to fix the problem for me - I need to rebuild the cluster from scratch and do a 'bare metal' test.
Blah - have to attach one patch at a time :-/