sg devices are not being created for SCSI storage controller devices starting from v4.4.0-30

Bug #1622894 reported by Amit Oren
This bug affects 1 person
Affects          Status   Importance  Assigned to  Milestone
linux (Ubuntu)   Invalid  High        Unassigned

Bug Description

Most storage arrays expose a storage controller device at logical unit 0.
Following LP #1567602, when this happens and the alua driver is loaded, Ubuntu doesn't create the sg device for it, and Linux fails to discover the disks that are attached to higher logical unit numbers.

With kernel version 4.4.0-31, we tried a number of things:
- With a disk mapped to LUN0, everything is working.
- Without a disk mapped to LUN0, nothing works: Linux doesn't see disks mapped to higher LUNs, and calling rescan_scsi_bus throws a stack trace to syslog: https://gist.github.com/grzn/20c05ce3fc96062ec9572fd5b8093b55
- When turning on SCSI logging to the maximum (https://access.redhat.com/articles/337903 - 0xfffffff), we see that after the ALUA driver rejects the SCSI controller, nothing further happens. Compare 18:0:0:0, which is a controller (https://gist.github.com/grzn/f8b65dd07b2df5f3f72ca8121d8c93ad), with 19:0:00, which is a disk (https://gist.github.com/grzn/48da86b7f7390dab2983927862cda949).
- When removing the scsi_dh_alua driver and then reloading the lpfc driver, everything is working as expected when no disk is mapped to LUN0.
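The symptom described above (a SCSI device present but with no sg node) can be checked from sysfs. The following is a minimal sketch, not part of the original report. It assumes the usual sysfs layout, where each device appears under /sys/class/scsi_device/H:C:T:L/device, exposes its peripheral device type as a decimal number in the "type" file (12 is a storage array controller, 0 is a disk), and has a scsi_generic/sgN subdirectory when an sg device was created. The function name devices_without_sg is made up for illustration.

```python
import os

# Peripheral device type 0x0c (decimal 12) is "storage array controller".
TYPE_CONTROLLER = 12


def devices_without_sg(sysfs_root="/sys/class/scsi_device"):
    """Return [(hctl, type)] for SCSI devices that have no scsi_generic node.

    sysfs_root is a parameter so the function can be pointed at a test tree.
    """
    missing = []
    if not os.path.isdir(sysfs_root):
        return missing
    for hctl in sorted(os.listdir(sysfs_root)):
        dev = os.path.join(sysfs_root, hctl, "device")
        sg_dir = os.path.join(dev, "scsi_generic")
        # An sg device shows up as a non-empty scsi_generic directory.
        if os.path.isdir(sg_dir) and os.listdir(sg_dir):
            continue
        try:
            with open(os.path.join(dev, "type")) as f:
                dev_type = int(f.read().strip())
        except (OSError, ValueError):
            dev_type = None  # type could not be read
        missing.append((hctl, dev_type))
    return missing


if __name__ == "__main__":
    for hctl, dev_type in devices_without_sg():
        kind = "controller" if dev_type == TYPE_CONTROLLER else "type %s" % dev_type
        print("%s (%s): no sg device" % (hctl, kind))
```

On an affected system one would expect the LUN0 controller (for example 18:0:0:0 in the logs above) to be listed, though this was not verified as part of this report.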
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Sep 12 18:00 seq
 crw-rw---- 1 root audio 116, 33 Sep 12 18:00 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 16.04
HibernationDevice: RESUME=UUID=1b089ce2-057f-44d6-9e53-5ee1b7fb5be9
IwConfig: Error: [Errno 2] No such file or directory
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: VMware, Inc. VMware Virtual Platform
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 svgadrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-31-generic root=UUID=6738fbd7-7243-4860-949f-717dfd084c43 ro quiet splash
ProcVersionSignature: Ubuntu 4.4.0-31.50-generic 4.4.13
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-31-generic N/A
 linux-backports-modules-4.4.0-31-generic N/A
 linux-firmware 1.157.3
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial xenial
Uname: Linux 4.4.0-31-generic x86_64
UnreportableReason: The report belongs to a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: False
dmi.bios.date: 09/17/2015
dmi.bios.vendor: Phoenix Technologies LTD
dmi.bios.version: 6.00
dmi.board.name: 440BX Desktop Reference Platform
dmi.board.vendor: Intel Corporation
dmi.board.version: None
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLTD:bvr6.00:bd09/17/2015:svnVMware,Inc.:pnVMwareVirtualPlatform:pvrNone:rvnIntelCorporation:rn440BXDesktopReferencePlatform:rvrNone:cvnNoEnclosure:ct1:cvrN/A:
dmi.product.name: VMware Virtual Platform
dmi.product.version: None
dmi.sys.vendor: VMware, Inc.

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1622894

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Amit Oren (amito) wrote : CRDA.txt

apport information

tags: added: apport-collected xenial
description: updated
Amit Oren (amito) wrote : CurrentDmesg.txt

apport information

Amit Oren (amito) wrote : JournalErrors.txt

apport information

Amit Oren (amito) wrote : Lspci.txt

apport information

Amit Oren (amito) wrote : ProcCpuinfo.txt

apport information

Amit Oren (amito) wrote : ProcEnviron.txt

apport information

Amit Oren (amito) wrote : ProcInterrupts.txt

apport information

Amit Oren (amito) wrote : ProcModules.txt

apport information

Amit Oren (amito) wrote : UdevDb.txt

apport information

Amit Oren (amito) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.8 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.8-rc6

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel with a cherry-pick of commit 221255aee67. Can you test this kernel and see if it resolves this bug? It can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1622894/

With this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Guy Rozendorn (guy-8) wrote :

Hi,
Amit is on vacation for the following week, so I ran the test again with '4.4.0-36-generic #55~lp1622894' and the sg device gets created when the alua driver is loaded:
https://gist.github.com/2c1a1f9f81262fa8e490877e91bade95

but when I remove the lpfc driver there's still a stack trace in syslog: https://gist.github.com/6774cb3e95f8a6f71246b9c9d05fc977

I'll try now with the mainline kernel.

Guy Rozendorn (guy-8) wrote :

With the mainline kernel, 'modprobe lpfc' doesn't exit: https://gist.github.com/e1d29ee22d54372a7a5e38b2c0b6a21b

tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

Does this bug not exist in kernels prior to v4.4.0-30? If that is the case, we might be able to perform a kernel bisect to identify the commit that introduced the regression.

Guy Rozendorn (guy-8) wrote :

We haven't tested v4.4.0-30 since there's no deb package for it (I guess it was removed from the repo?).

Joseph Salisbury (jsalisbury) wrote :

The v4.4.0-30 kernel is still available in the ppa:
https://launchpad.net/ubuntu/xenial/+source/linux

It might be good to test the v4.4.0-29 and v4.4.0-30 kernels. We can bisect between the two if -29 is good and -30 is bad.

To download the kernels, click on the version number on that page. On the next page, find the section "Builds". Click the link for your build. On the final page, download and install the linux-image and linux-image-extra .deb packages.

Thanks!
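
For readers unfamiliar with the procedure, the bisect proposed here is a binary search over the commits between a known-good and a known-bad build. The sketch below only illustrates the idea and is not the actual tooling (in practice this is done with git bisect plus a kernel build and test at each step). The names first_bad and is_bad, and the commit labels in the example, are hypothetical.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits: list ordered oldest to newest, where commits[0] is known good
             and commits[-1] is known bad.
    is_bad:  callable that builds/tests one commit and returns True if the
             bug reproduces (hypothetical; supplied by the tester).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: "a" is the good end, "h" is the bad end,
    # and the regression was introduced by "e".
    history = list("abcdefgh")
    print(first_bad(history, lambda c: c >= "e"))  # prints e
```

Each step halves the remaining range, so the number of kernels to build and test grows only logarithmically with the number of commits between the two versions.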

Amit Oren (amito) wrote :

Finished the kernel bisection process, this is the result:
"8d83f8015312b0d600157dd4b6a70ae6d9ee17f9 is the first bad commit
commit 8d83f8015312b0d600157dd4b6a70ae6d9ee17f9
Author: Hannes Reinecke <email address hidden>
Date: Tue Dec 1 10:16:57 2015 +0100

    scsi: Add scsi_vpd_tpg_id()

    Implement scsi_vpd_tpg_id() to extract the target
    port group id and the relative port id from
    SCSI VPD page 0x83.

    Reviewed-by: Johannes Thumshirn <email address hidden>
    Reviewed-by: Christoph Hellwig <email address hidden>
    Signed-off-by: Hannes Reinecke <email address hidden>
    Signed-off-by: Martin K. Petersen <email address hidden>

    BugLink: http://bugs.launchpad.net/bugs/1567602

    (cherry-picked from commit a8aa3978588a4fa2d9edabc151adedd97bbed091)
    Signed-off-by: Stefan Bader <email address hidden>
    Acked-by: Brad Figg <email address hidden>
    Signed-off-by: Kamal Mostafa <email address hidden>

:040000 040000 d72abb718bfaeada82e7585afeb1a63b8d546395 e7d570069d9649a70a4f2c6ac0a0f8845036df37 M drivers
:040000 040000 110a71ec936859fb0eb36f19750c7d58e555d471 29349614b62275e58c12832ebed2dd77ae44e9d3 M include"

Amit Oren (amito) wrote :

Hi, any news?
Should I re-open #1567602 (you can see from the bisection the commit leads to this issue)?

Thanks.

Joseph Salisbury (jsalisbury) wrote :

Does Xenial still exhibit this bug with the latest updates?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Amit Oren (amito) wrote :

Hi Joseph,

Seems like it doesn't.

Thanks,
Amit

Changed in linux (Ubuntu):
status: Incomplete → Invalid