kolla

Error when bootstrapping cache OSD on NVMe drive.

Bug #1847014 reported by Eddie Yen on 2019-10-07

Affects	Status	Importance	Assigned to	Milestone
kolla	Fix Released	Medium	Eddie Yen
Rocky	Fix Released	Medium	Unassigned	kolla 7.1.0 "rocky"
Stein	Fix Released	Medium	Unassigned	kolla 8.0.2 "stein"
Train	Fix Released	Medium	Eddie Yen	kolla 9.0.0 "Train"

Bug Description

When I tried to deploy Ceph with a cache tier, I got an error while bootstrapping cache OSDs on an NVMe drive.

The error is inside the attachment.

I think the root cause is the partition naming. A normal drive uses "1, 2, ..." as the partition suffix, like "sda1, sda2, ...".
But an NVMe drive uses "p1, p2, ..." as the partition suffix, like "nvme0n1p1, nvme0n1p2".
Kolla-ansible just "forgets" to add the "p" to the device path. It produces "nvme0n11, nvme0n12" when generating the command, which causes the error because those paths do not exist.
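
To make the failure mode concrete, here is a minimal sketch of the naive concatenation described above. The function name naive_part is hypothetical and is not taken from the Kolla source; it only illustrates what happens when the partition number is appended directly to the device path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins device path and partition number with no separator,
# which is the behaviour the bug description attributes to the generated command.
naive_part() {
    echo "${1}${2}"
}

naive_part /dev/sda 1        # /dev/sda1, a valid partition name
naive_part /dev/nvme0n1 1    # /dev/nvme0n11, but the partition is named /dev/nvme0n1p1
```

The second call shows why the bootstrap fails: the generated path does not name the partition, so the command is pointed at a device node that does not exist.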

But I don't know how to fix it, because I don't know where the value is generated.
Please help.

OS: Ubuntu 18.04
Kolla Release: stable-rocky

Eddie Yen (aksn74) wrote on 2019-10-07:

osd_cache_error_msg.txt (8.8 KiB, text/plain)

description:	updated
tags:	added: ceph stable-rocky

Mark Goddard (mgoddard) wrote on 2019-10-07:

I have seen bugs like this before. It's not just NVMe, it happens with loopbacks too. The rule is "if the device ends in a number, add a 'p' before the partition number".

Changed in kolla-ansible:
status:	New → Triaged
importance:	Undecided → Medium

Mark Goddard (mgoddard) wrote on 2019-10-07:

Here is a fix for the same issue with loopback devices: https://review.opendev.org/#/c/668222/1/docker/ceph/ceph-osd/extend_start.sh

I think we need to make a new function in that file to get the partition device name. Can you do it?

Eddie Yen (aksn74) wrote on 2019-10-07:

Hmm, I'm not very good at coding, but I think we could add "p" in front of PARTNUM if the last character of DEV is a number. Is that what you want?
If so, perhaps I can try creating a function for this, then add it to the "if [[ "${OSD_BS_DEV}" =~ "/dev/loop" ]];" check with an OR. Like:

if [[ "${OSD_BS_DEV}" =~ "/dev/loop" || ${OSD_BS_NVME_DEV} = "True" ]];

It's a bit of a challenge for me, but I may try it out.

Mark Goddard (mgoddard) wrote on 2019-10-08:

Here's a starting point: a function that generates a partition device name:

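# Print the partition device name for device $1 and partition number $2.
function part_name {
   # Device names ending in a digit (loop, NVMe) need a "p" separator.
   # The regex is anchored with $ so only a trailing digit matches.
   if [[ $1 =~ [0-9]$ ]]; then
     echo "${1}p${2}"
   else
     echo "${1}${2}"
   fi
}

part_name /dev/sda 1      # /dev/sda1
part_name /dev/loop1 2    # /dev/loop1p2
part_name /dev/nvme1 1    # /dev/nvme1p1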

Eddie Yen (aksn74) wrote on 2019-10-08:

Thanks for your hint! I'll try it out. It may take a little longer since I'm very busy these days.

Mark Goddard (mgoddard) on 2019-10-16

no longer affects:	kolla-ansible/stein
no longer affects:	kolla-ansible/rocky
Changed in kolla:
importance:	Undecided → Medium
no longer affects:	kolla-ansible/train
no longer affects:	kolla-ansible

OpenStack Infra (hudson-openstack) wrote on 2019-10-17: Fix merged to kolla (stable/stein)

Reviewed: https://review.opendev.org/688926
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=6bc6469bc6750ccc388971d5b2f7e3fe98aba8f9
Submitter: Zuul
Branch: stable/stein

commit 6bc6469bc6750ccc388971d5b2f7e3fe98aba8f9
Author: Eddie Yen <email address hidden>
Date: Mon Oct 14 05:24:46 2019 +0000

Add disk dev name check function

    This patch will add new function in extend_start.sh for OSD
    creation. Not only support loop device but also others that
    disk dev layout is end with numbers.

    Change-Id: Iee5f8b8581d70166de6eba1bdc9e42766fe8cb48
    Closes-Bug: #1847014
    (cherry picked from commit 1d5f753fb13bcc3659b4abd1bb768de8550a6dc4)

OpenStack Infra (hudson-openstack) wrote on 2019-10-17: Fix merged to kolla (stable/rocky)

Reviewed: https://review.opendev.org/688918
Committed: https://git.openstack.org/cgit/openstack/kolla/commit/?id=5da1c35cc3d113ef702e8e2515c8178c413a4af2
Submitter: Zuul
Branch: stable/rocky

commit 5da1c35cc3d113ef702e8e2515c8178c413a4af2
Author: Eddie Yen <email address hidden>
Date: Mon Oct 14 05:24:46 2019 +0000

Add disk dev name check function

    This patch will add new function in extend_start.sh for OSD
    creation. Not only support loop device but also others that
    disk dev layout is end with numbers.

    Change-Id: Iee5f8b8581d70166de6eba1bdc9e42766fe8cb48
    Closes-Bug: #1847014

OpenStack Infra (hudson-openstack) wrote on 2019-11-06: Fix included in openstack/kolla 9.0.0.0rc1

This issue was fixed in the openstack/kolla 9.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote on 2020-01-30: Fix included in openstack/kolla 7.1.0

This issue was fixed in the openstack/kolla 7.1.0 release.

OpenStack Infra (hudson-openstack) wrote on 2020-01-30: Fix included in openstack/kolla 8.0.2

This issue was fixed in the openstack/kolla 8.0.2 release.