block_device_resource script incorrectly sets hot-swap HDDs as "removable" causing disk tests to not run

Bug #1610401 reported by Jeff Lane 
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Checkbox Provider - Resource
Fix Released
Critical
Jeff Lane 

Bug Description

The current block_device_resource script does the following:

def device_state(name):
    """Follow pmount policy to find if a device is removable or internal."""
    with open('/sys/block/%s/device/block/%s/removable' % (name, name),
              "rt") as f:
        if f.read(1) == '1':
            print("%s: first clause"% name)
            return 'removable'

    path = rootdir_pattern.sub('', os.readlink('/sys/block/%s' % name))
    hotplug_buses = ("usb", "ieee1394", "mmc", "pcmcia", "firewire")
    for bus in hotplug_buses:
        if os.path.exists('/sys/bus/%s' % bus):
            for device_bus in os.listdir('/sys/bus/%s/devices' % bus):
                device_link = rootdir_pattern.sub('', os.readlink(
                    '/sys/bus/%s/devices/%s' % (bus, device_bus)))
                if re.search(device_link, path):
                    return 'removable'

    return 'internal'

The first match causes any hot-swap hard drives as you'd find in any hostswap capable server... because technically, hot-swap HDDs ARE removable, BUT they're also considered "internal". Because of this, the HDD tests are being skipped because device.state is set to "removable" for these hot-swap HDDs.

Here's what it looks like in practice on the system we discovered this on:
sda: first clause
sda_state: removable
sda_usb2: unsupported
sda_usb3: unsupported
sda_rotation: yes
sda_smart: False

sdb: first clause
sdb_state: removable
sdb_usb2: unsupported
sdb_usb3: unsupported
sdb_rotation: yes
sdb_smart: False

sdc: first clause
sdc_state: removable
sdc_usb2: supported
sdc_usb3: supported
sdc_rotation: yes
sdc_smart: False

sdd: first clause
sdd_state: removable
sdd_usb2: supported
sdd_usb3: unsupported
sdd_rotation: yes
sdd_smart: False

Note the "first clause" entry is a tracer I put in to figure out which of the two code branches were causing the problem. sda and sdb are both spinning HDDs, but both are in hot-swap bays with hot-swap enabled (can't be disabled). Because sda|sdb_state is "removable", the various disk/* tests that depend on device.state != "removable" all get skipped.

If I remove that first code block that simply checks sys/.../.../removable for 0 or 1, I get a more realistic result:
sda_state: internal
sda_usb2: unsupported
sda_usb3: unsupported
sda_rotation: yes
sda_smart: False

sdb_state: internal
sdb_usb2: unsupported
sdb_usb3: unsupported
sdb_rotation: yes
sdb_smart: False

sdc_state: removable
sdc_usb2: supported
sdc_usb3: supported
sdc_rotation: yes
sdc_smart: False

sdd_state: removable
sdd_usb2: supported
sdd_usb3: unsupported
sdd_rotation: yes
sdd_smart: False

Now it correctly identifies my two HDDs as internal (because they're not on the list of hot pluggable busses in the second code block), and the two USB devices are correctly identified as "removable".

Related branches

Revision history for this message
Jeff Lane  (bladernr) wrote :

I have a patch for this that simply removes that first block. I toyed around with other things, like adding this to the second block, and moving that to first place,

def device_state(name):
    """Follow pmount policy to find if a device is removable or internal."""

    path = rootdir_pattern.sub('', os.readlink('/sys/block/%s' % name))
    hotplug_buses = ("scsi", "usb", "ieee1394", "mmc", "pcmcia", "firewire")
    for bus in hotplug_buses:
        if os.path.exists('/sys/bus/%s' % bus):
            for device_bus in os.listdir('/sys/bus/%s/devices' % bus):
                device_link = rootdir_pattern.sub('', os.readlink(
                    '/sys/bus/%s/devices/%s' % (bus, device_bus)))
                if bus == "scsi" and re.search(device_link, path):
                    print("path: %s" % path)
                    print("device_link: %s" % device_link)
                    print("device_bus: %s" % device_bus)
                    return 'internal'
                elif re.search(device_link, path):
                    return 'removable'

    with open('/sys/block/%s/device/block/%s/removable' % (name, name),
              "rt") as f:
        if f.read(1) == '1':
            return 'removable'

    return 'internal'

This doesn't work though, because USB devices are ALSO on the scsi bus.

In the end, simply removing that check for /sys/block/%s/device/block/%s/removable fixes the issue.

So I wonder why that check is even there...

what cases are there where a device on "usb", "ieee1394", "mmc", "pcmcia", "firewire" busses are NOT removable? I can't think of any...

However, as this likely can affect client testing I want to make sure this is reasonable, and am open to suggestions if there's a better way to ID actual HDDs vs removabe storage devices in block_device_resource.

I do, however, need this fix pretty fast-like, as it IS a blocker on any server with active hot-swap drives.

Note, this does NOT affect hardware RAID setups with hot-swap drives. In that case, the RAID controller presents a non-removable LUN to the OS, and handles the removability of the drive internally to itself.

Changed in plainbox-provider-resource:
assignee: nobody → Jeff Lane (bladernr)
importance: Undecided → High
importance: High → Critical
Revision history for this message
Jeff Lane  (bladernr) wrote :

here's output from my desktop with three internal disks and two external USB HDDs and a usb thumbdrive...

sda_state: internal
sda_usb2: unsupported
sda_usb3: unsupported
sda_rotation: no
sda_smart: False

sdb_state: internal
sdb_usb2: unsupported
sdb_usb3: unsupported
sdb_rotation: yes
sdb_smart: False

sdc_state: internal
sdc_usb2: unsupported
sdc_usb3: unsupported
sdc_rotation: yes
sdc_smart: False

sdd_state: removable
sdd_usb2: supported
sdd_usb3: unsupported
sdd_rotation: yes
sdd_smart: False

sde_state: removable
sde_usb2: supported
sde_usb3: unsupported
sde_rotation: yes
sde_smart: False

sdf_state: removable
sdf_usb2: supported
sdf_usb3: unsupported
sdf_rotation: yes
sdf_smart: False

and here's same system with USB devices removed:
sda_state: internal
sda_usb2: unsupported
sda_usb3: unsupported
sda_rotation: no
sda_smart: False

sdb_state: internal
sdb_usb2: unsupported
sdb_usb3: unsupported
sdb_rotation: yes
sdb_smart: False

sdc_state: internal
sdc_usb2: unsupported
sdc_usb3: unsupported
sdc_rotation: yes
sdc_smart: False

Revision history for this message
Jeff Lane  (bladernr) wrote :

Intel NUC with 1 mSATA SSD and 1 thumb drive:
sda_state: internal
sda_usb2: unsupported
sda_usb3: unsupported
sda_rotation: no
sda_smart: False

sdb_state: removable
sdb_usb2: supported
sdb_usb3: unsupported
sdb_rotation: yes
sdb_smart: False

Revision history for this message
Jeff Lane  (bladernr) wrote :

Server with 1 thumbdrive and HW RAID and a DVD drive:
sda_state: internal
sda_usb2: unsupported
sda_usb3: unsupported
sda_rotation: yes
sda_smart: False

sdb_state: removable
sdb_usb2: supported
sdb_usb3: unsupported
sdb_rotation: yes
sdb_smart: False

sr0_state: internal
sr0_usb2: unsupported
sr0_usb3: unsupported
sr0_rotation: yes
sr0_smart: False

Revision history for this message
Jeff Lane  (bladernr) wrote :

That one COULD be an issue as the optical drive is set as removable:
ubuntu@rs140:~$ cat /sys/block/sr0/device/block/sr0/removable
1

but likely not as no optical drive tests use device.state constraints.

the only tests that use that constraint are in disk.txt

Revision history for this message
Jeff Lane  (bladernr) wrote :

One more on another server with two HDDs, a DVD drive and no USB devices:
sda_state: internal
sda_usb2: unsupported
sda_usb3: unsupported
sda_rotation: yes
sda_smart: False

sdb_state: internal
sdb_usb2: unsupported
sdb_usb3: unsupported
sdb_rotation: yes
sdb_smart: False

sr0_state: internal
sr0_usb2: unsupported
sr0_usb3: unsupported
sr0_rotation: yes
sr0_smart: False

Changed in plainbox-provider-resource:
status: New → In Progress
Jeff Lane  (bladernr)
tags: added: blocks-hwcert-server
removed: hwcert-server
Revision history for this message
Mike Rushton (leftyfb) wrote :

I had this fail on 2 openpower boxes. I have attached the output from the test.

Revision history for this message
Mike Rushton (leftyfb) wrote :

ignore the above comment. Wrong bug

Jeff Lane  (bladernr)
Changed in plainbox-provider-resource:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.