ephemeral0 of /dev/sda1 triggers 'did not find entry for sda1 in /sys/block'

Bug #1263294 reported by Robert Collins
This bug affects 4 people
Affects     Status     Importance  Assigned to  Milestone
cloud-init  Expired    Medium      Unassigned
tripleo     Won't Fix  High        Unassigned

Bug Description

This is due to line 227 of ./cloudinit/config/cc_mounts.py::

    short_name = os.path.basename(device)
    sys_path = "/sys/block/%s" % short_name

    if not os.path.exists(sys_path):
        LOG.debug("did not find entry for %s in /sys/block", short_name)
        return None

The sys path for /dev/sda1 is /sys/block/sda/sda1.
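
As a rough illustration of the layout described above (whole disks directly under /sys/block, partitions one level down under their parent disk), here is a minimal sketch of a lookup that handles both cases. The function name `find_sys_block_path` and its `sys_block` parameter are hypothetical and are not part of cloud-init; the parameter exists only so the lookup can be pointed at a fake directory tree in tests.

```python
import os


def find_sys_block_path(device, sys_block="/sys/block"):
    """Return the sysfs directory for a disk or partition, or None.

    Hypothetical helper for illustration; not cloud-init's implementation.
    """
    short_name = os.path.basename(device)

    # Whole disks (e.g. sda) appear directly under /sys/block.
    direct = os.path.join(sys_block, short_name)
    if os.path.exists(direct):
        return direct

    # Partitions (e.g. sda1) appear under their parent disk's directory,
    # so look one level down in every disk entry.
    if os.path.isdir(sys_block):
        for disk in sorted(os.listdir(sys_block)):
            candidate = os.path.join(sys_block, disk, short_name)
            if os.path.exists(candidate):
                return candidate
    return None
```

Scanning the directory avoids guessing the parent disk from the partition name, at the cost of one directory listing per lookup.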

Revision history for this message
Robert Collins (lifeless) wrote :

This can calculate the correct /sys/block path.

    short_name = os.path.basename(device)
    if short_name[-1].isdigit():
        for offset in range(1, len(short_name)):
            if not short_name[-offset].isdigit():
                break
        sys_path = "/sys/block/%s/%s" % (short_name[:-offset+1], short_name)
    else:
        sys_path = "/sys/block/%s" % short_name

Revision history for this message
Robert Collins (lifeless) wrote :

Well, in some cases. The code needs tests badly ;)
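
To make the "in some cases" caveat concrete, the following sketch restates the digit-stripping idea from the earlier comment as a pure function with a few assertions. The name `guess_sys_block_path` is hypothetical, and the mmcblk example is an assumption about a device type the thread does not mention: MMC/SD partitions are named like mmcblk0p1 on a disk named mmcblk0, so stripping trailing digits picks the wrong parent.

```python
import os


def guess_sys_block_path(device):
    """Guess the sysfs path by stripping trailing digits.

    Hypothetical helper for illustration; it does not touch the filesystem.
    """
    short_name = os.path.basename(device)
    parent = short_name.rstrip("0123456789")
    if parent and parent != short_name:
        # Trailing digits: treat the name as a partition of `parent`.
        return "/sys/block/%s/%s" % (parent, short_name)
    return "/sys/block/%s" % short_name


# Cases where the guess matches the real sysfs layout.
assert guess_sys_block_path("/dev/sda") == "/sys/block/sda"
assert guess_sys_block_path("/dev/sda1") == "/sys/block/sda/sda1"
assert guess_sys_block_path("/dev/sda12") == "/sys/block/sda/sda12"

# A case where it goes wrong: the parent disk of mmcblk0p1 is mmcblk0,
# but digit stripping yields "mmcblk0p".
assert guess_sys_block_path("/dev/mmcblk0p1") == "/sys/block/mmcblk0p/mmcblk0p1"
```

Assertions like these are the kind of tests that would pin down which device names the heuristic handles before it is relied on.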

Revision history for this message
Robert Collins (lifeless) wrote :

I'm failing to understand the current code - I don't see how it can be even vaguely correct:

    short_name = os.path.basename(device)
    sys_path = "/sys/block/%s" % short_name

    if not os.path.exists(sys_path):
        LOG.debug("did not find entry for %s in /sys/block", short_name)
        return None

    sys_long_path = sys_path + "/" + short_name

If dev = /dev/sda

Then
sys_path = /sys/block/sda
and sys_long_path = /sys/block/sda/sda -- which doesn't ever exist.

This path is then appended to by the 'valid mappings' bit, which means we're expecting partitions to have been split out before calling into this code. But the dev token handling uses an entirely nonstandard '.' separator.

Are we perhaps meant to use '/dev/sda.1' to mean '/dev/sda1' ?

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

This is blocking ephemeral disk usage on hardware, which is critical to our image-based updates via the rebuild code path.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (master)

Fix proposed to branch: master
Review: https://review.openstack.org/63584

Changed in tripleo:
assignee: nobody → Clint Byrum (clint-fewbar)
status: Triaged → In Progress
Revision history for this message
Robert Collins (lifeless) wrote :

Using /dev/sda.1:

Dec 21 20:35:55 overcloud-notcompute-4qf5igdly3as [CLOUDINIT] cc_mounts.py[DEBUG]: Attempting to determine the real name of ephemeral0
Dec 21 20:35:55 overcloud-notcompute-4qf5igdly3as [CLOUDINIT] cc_mounts.py[DEBUG]: Ignoring nonexistant default named mount ephemeral0
Dec 21 20:35:55 overcloud-notcompute-4qf5igdly3as [CLOUDINIT] cc_mounts.py[DEBUG]: Attempting to determine the real name of swap
Dec 21 20:35:55 overcloud-notcompute-4qf5igdly3as [CLOUDINIT] DataSourceEc2.py[DEBUG]: Unable to convert swap to a device
Dec 21 20:35:55 overcloud-notcompute-4qf5igdly3as [CLOUDINIT] cc_mounts.py[DEBUG]: Ignoring nonexistant default named mount swap
Dec 21 20:35:55 overcloud-notcompute-4qf5igdly3as [CLOUDINIT] cc_mounts.py[DEBUG]: No modifications to fstab needed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/63584
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=7d0f9758772c00267581b87018f6081ac806e690
Submitter: Jenkins
Branch: master

commit 7d0f9758772c00267581b87018f6081ac806e690
Author: Clint Byrum <email address hidden>
Date: Sat Dec 21 08:09:46 2013 -0800

    Work around broken cloud-init ephemeral disk code

    Cloud-init cannot handle ephemeral disks like '/dev/sda1'. This prevents
    the nova baremetal default from working properly. So if the ephemeral
    disk isn't already mounted, we mount it just before we need it.

    Change-Id: Ide0e5ed3eff91755aac7d8f1e9c43f723f7bf3d5
    Closes-Bug: #1263294

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
Scott Moser (smoser) wrote :

I agree that code is very hard to understand, and yes, I wrote it.

Changed in cloud-init:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Clint Byrum (clint-fewbar) wrote : Re: [Bug 1263294] Re: ephemeral0 of /dev/sda1 triggers 'did not find entry for sda1 in /sys/block'

Thanks for responding Scott.

Can I suggest a High importance?

While the number of users impacted is low (OpenStack Nova baremetal/ironic
users who want an ephemeral partition), there is no workaround for it
for us.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-incubator (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/69167

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-incubator (master)

Reviewed: https://review.openstack.org/69167
Committed: https://git.openstack.org/cgit/openstack/tripleo-incubator/commit/?id=fdbde78f5df519ce43fa7753e13943077b2c8584
Submitter: Jenkins
Branch: master

commit fdbde78f5df519ce43fa7753e13943077b2c8584
Author: Robert Collins <email address hidden>
Date: Sun Jan 26 17:41:26 2014 +1300

    Workaround bug 1263294 for all images we build.

    The workaround in tripleo-image-elements for bug 1263294 only takes
    effect when an element depends on some of the other got in
    'use-ephemeral'. Explicitly drag use-ephemeral in so that we always
    have the workaround.

    Change-Id: Iacaaba3b71b80292ad8d0d8ad4354f3c8860a07f
    Related-Bug: #1263294

Revision history for this message
Robert Collins (lifeless) wrote :

Reopening this - once we've configured services to use /mnt, it's a fairly fatal error when we nova rebuild the machine (preserving the state partition) and said services start up after cloud-init but before our workaround hack fixes things. I.e. our hack is insufficient.

Changed in tripleo:
status: Fix Released → Triaged
Changed in tripleo:
assignee: Clint Byrum (clint-fewbar) → Michael Kerrin (michael-kerrin-w)
status: Triaged → In Progress
James Polley (tchaypo)
Changed in tripleo:
assignee: Michael Kerrin (michael-kerrin-w) → nobody
Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Bugs with workarounds can be set to 'High'

Changed in tripleo:
importance: Critical → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-image-elements (master)

Change abandoned by Michael Kerrin (<email address hidden>) on branch: master
Review: https://review.openstack.org/90993
Reason: Moving on

Revision history for this message
Ben Nemec (bnemec) wrote :

We've moved away from ephemeral partitions in TripleO, so this no longer needs to be fixed there.

Changed in tripleo:
status: In Progress → Won't Fix
Revision history for this message
James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Triaged → Expired