sg_scan returns wrong HLU number if it's greater than 255

Bug #1793259 reported by Sam Wan
This bug affects 2 people
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  Undecided   Sam Wan
os-brick                  Fix Released  Undecided   Sam Wan

Bug Description

The tempest test case 'test_volumes_extend.VolumesExtendAttachedTest.test_extend_attached_volume' fails if the created volume is assigned an HLU number greater than 255.

The cause is that os-brick uses 'sg_scan' to get the device information (H:C:T:L):
======
    def get_device_info(self, device):
        (out, _err) = self._execute('sg_scan', device, run_as_root=True,
                                    root_helper=self._root_helper)
        dev_info = {'device': device, 'host': None,
                    'channel': None, 'id': None, 'lun': None}
        if out:
            line = out.strip()
            line = line.replace(device + ": ", "")
            info = line.split(" ")

            for item in info:
                if '=' in item:
                    pair = item.split('=')
                    dev_info[pair[0]] = pair[1]
                elif 'scsi' in item:
                    dev_info['host'] = item.replace('scsi', '')

        return dev_info
======
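To make the parsing step concrete, here is a minimal standalone sketch of the same logic, fed with a line in the format sg_scan prints. The function name parse_sg_scan is hypothetical and exists only for this illustration; it is not part of os-brick.

```python
def parse_sg_scan(device, out):
    """Standalone copy of the parsing loop in get_device_info (illustration only)."""
    dev_info = {'device': device, 'host': None,
                'channel': None, 'id': None, 'lun': None}
    if out:
        # Drop the leading "<device>: " prefix, then split on spaces.
        line = out.strip().replace(device + ": ", "")
        for item in line.split(" "):
            if '=' in item:
                # "channel=0", "id=0", "lun=181" become dictionary entries.
                key, value = item.split('=')
                dev_info[key] = value
            elif 'scsi' in item:
                # "scsi6" carries the host number.
                dev_info['host'] = item.replace('scsi', '')
    return dev_info


# Sample line taken from the sg_scan output quoted in this report.
info = parse_sg_scan('/dev/sdm', '/dev/sdm: scsi6 channel=0 id=0 lun=181 [em]\n')
print(info)
```

The parser itself is faithful to its input: whatever 'lun=' value sg_scan prints ends up in dev_info['lun'] as a string, so a truncated LUN from sg_scan is passed on unchanged.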

sg_scan uses the SCSI_IOCTL_GET_IDLUN ioctl to get this device information:
https://github.com/hreinecke/sg3_utils/blob/master/src/sg_scan_linux.c#L321
======
        res = ioctl(sg_fd, SCSI_IOCTL_GET_IDLUN, &my_idlun);
...
        printf("%s: scsi%d channel=%d id=%d lun=%d", file_namep, host_no,
               (my_idlun.dev_id >> 16) & 0xff, my_idlun.dev_id & 0xff,
               (my_idlun.dev_id >> 8) & 0xff); /* <--- only 8 bits are kept for the LUN */
======

However, the LUN field that sg_scan prints is only one byte wide, so it can never report a LUN greater than 255.

Below is an example.
======
here's a device
------
# multipath -ll 3600601602220440062449f5b82796e05
3600601602220440062449f5b82796e05 dm-6 DGC ,VRAID
size=1.0G features='1 retain_attached_hw_handler' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 6:0:0:15797 sdm 8:192 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
  `- 7:0:2:15797 sdn 8:208 active ready running
------

We can see that the LUN (the last field of H:C:T:L) is 15797.
The same number can also be obtained with 'lsscsi':
------
# lsscsi |grep 15797
[6:0:0:15797] disk DGC VRAID 4400 /dev/sdm
[7:0:2:15797] disk DGC VRAID 4400 /dev/sdn
------

However, sg_scan returned a different LUN:
------
# sg_scan -i /dev/sdm
/dev/sdm: scsi6 channel=0 id=0 lun=181 [em]
    DGC VRAID 4400 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
# sg_scan -i /dev/sdn
/dev/sdn: scsi7 channel=0 id=2 lun=181 [em]
    DGC VRAID 4400 [rmb=0 cmdq=1 pqual=0 pdev=0x0]
------
The LUN returned by sg_scan is 181.

A simple binary calculation shows that 181 is the lowest 8 bits of 15797:
------
15797 & 0xff = 0b11110110110101 & 0b11111111 = 0b10110101 = 181
------
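The truncation can be reproduced in a few lines of Python. This is only a sketch: the packing in pack_dev_id (one byte each for id, lun, channel and host) is an assumption about how the kernel fills dev_id for SCSI_IOCTL_GET_IDLUN, while unpack_like_sg_scan mirrors the shifts and masks in the sg_scan snippet quoted above. Both function names are made up for this example.

```python
def pack_dev_id(host, channel, target, lun):
    # ASSUMED layout: each field is squeezed into one byte of dev_id.
    return ((target & 0xff)
            | ((lun & 0xff) << 8)
            | ((channel & 0xff) << 16)
            | ((host & 0xff) << 24))


def unpack_like_sg_scan(dev_id):
    # Same shifts and masks as the printf() in sg_scan_linux.c.
    return {'channel': (dev_id >> 16) & 0xff,
            'id': dev_id & 0xff,
            'lun': (dev_id >> 8) & 0xff}


# The device from this report: H:C:T:L = 6:0:0:15797
dev_id = pack_dev_id(host=6, channel=0, target=0, lun=15797)
decoded = unpack_like_sg_scan(dev_id)
print(decoded['lun'])   # 181, not 15797
print(15797 & 0xff)     # 181
```

Under this layout, any two LUNs that differ by a multiple of 256 are indistinguishable once they have passed through dev_id.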

Because sg_scan returns the wrong LUN, os-brick tries to scan a device that does not exist, which causes extend_attached_volume to fail.

We should use 'lsscsi' to get the LUN instead.
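As a rough sketch of what an lsscsi-based lookup could look like (not the actual os-brick change), the function below matches a device name against the last column of lsscsi output and extracts [H:C:T:L] from the first column. The name find_hctl and the regular expression are assumptions made for this illustration; a real caller would first resolve a /dev/disk/by-path/... symlink, for example with os.path.realpath, and obtain the text by running lsscsi.

```python
import re

# First column of an lsscsi line, e.g. "[6:0:0:15797]".
HCTL_RE = re.compile(r'^\[(\d+):(\d+):(\d+):(\d+)\]$')


def find_hctl(lsscsi_output, device_name):
    """Return host/channel/id/lun for device_name, or None values if not found.

    device_name must already be a resolved /dev/sdX style name.
    """
    dev_info = {'device': device_name, 'host': None,
                'channel': None, 'id': None, 'lun': None}
    for line in lsscsi_output.splitlines():
        fields = line.split()
        # The device node is the last column of each lsscsi line.
        if len(fields) < 2 or fields[-1] != device_name:
            continue
        match = HCTL_RE.match(fields[0])
        if match:
            host, channel, target, lun = match.groups()
            dev_info.update(host=host, channel=channel, id=target, lun=lun)
            break
    return dev_info


# Sample output copied from the lsscsi listing in this report.
SAMPLE = """\
[6:0:0:15797] disk DGC VRAID 4400 /dev/sdm
[7:0:2:15797] disk DGC VRAID 4400 /dev/sdn
"""

result = find_hctl(SAMPLE, '/dev/sdn')
print(result)
```

Values are kept as strings so the result has the same shape as the dictionary returned by the sg_scan-based get_device_info.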

Changed in os-brick:
status: New → Confirmed
Changed in nova:
status: New → Confirmed
Sam Wan (sam-wan) wrote :
Changed in nova:
assignee: nobody → Sam Wan (sam-wan)
Changed in os-brick:
assignee: nobody → Sam Wan (sam-wan)
status: In Progress → Confirmed
Changed in os-brick:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.opendev.org/742784
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=fc6ca22bdb955137d97cb9bcfc84104426e53842
Submitter: Zuul
Branch: master

commit fc6ca22bdb955137d97cb9bcfc84104426e53842
Author: Sam Wan <email address hidden>
Date: Thu Jul 23 22:35:27 2020 -0400

    Replace sg_scan with lsscsi to get '[H:C:T:L]'

    The current get_device_info uses sg_scan to get device info but it only
    returns HLU number lower than 255 due to bug#1793259. sg_scan was
    designed for old days when 255 LUNs were enough. However we now have
    requirement to support HLU number greater than 255. Since lsscsi doesn't
    have the limit of 255, we should use lsscsi to get device info.

    The 'device' of get_device_info can be of 2 types:
    o /dev/disk/by-path/xxx, which is a symlink to /dev/sdX
    o /dev/sdX

    sg_scan can take any device name but lsscsi only show /dev/sdx names.
    So if the device is a symlink, we use the device name it links to,
    otherwise we use it directly.
    Then get the device info '[H:C:T:L]' by comparing the device name with the
    last column of lsscsi output
    Also lsscsi doesn't require privilege.

    Depends-on: https://review.opendev.org/743548
    Change-Id: I867c972d9f712c0df4260ebc8211b786006ed7a2
    Closes-bug: #1793259

Changed in os-brick:
status: In Progress → Fix Released
Sam Wan (sam-wan)
Changed in nova:
status: Confirmed → Fix Released