Processes hang on attempted access of WDC WD30-EZRX 3TB HDD

Bug #1730746 reported by Chuck Burt
This bug affects 1 person
Affects           Status      Importance   Assigned to   Milestone
Linux             Confirmed   Medium
linux (Ubuntu)    Confirmed   Medium       Unassigned
  Artful          Won't Fix   Medium       Unassigned
  Bionic          Confirmed   Medium       Unassigned

Bug Description

When booted under kernel 4.13.x, processes that access the disks (such as parted, boot-info, and gparted) always hang when a Western Digital Green WD30-EZRX 3TB HDD is attached. The drive works as expected under kernel 4.10.x, with everything else about the system unchanged and only the booted kernel different.

Full specs of the machine if useful: https://www.support.hp.com/id-en/document/c03277050

ProblemType: Bug
DistroRelease: Ubuntu 17.10
Package: linux-image-4.13.0-16-generic 4.13.0-16.19
ProcVersionSignature: Ubuntu 4.13.0-16.19-generic 4.13.4
Uname: Linux 4.13.0-16-generic x86_64
ApportVersion: 2.20.7-0ubuntu3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: ubuntu 1781 F.... pulseaudio
 /dev/snd/controlC0: ubuntu 1781 F.... pulseaudio
CasperVersion: 1.387
CurrentDesktop: ubuntu:GNOME
Date: Tue Nov 7 20:00:57 2017
IwConfig:
 eno1 no wireless extensions.

 lo no wireless extensions.
LiveMediaBuild: Ubuntu 17.10 "Artful Aardvark" - Release amd64 (20171018)
MachineType: Hewlett-Packard HP Z420 Workstation
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/casper/vmlinuz.efi file=/cdrom/preseed/username.seed boot=casper quiet splash ---
RelatedPackageVersions:
 linux-restricted-modules-4.13.0-16-generic N/A
 linux-backports-modules-4.13.0-16-generic N/A
 linux-firmware 1.169
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/17/2016
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: J61 v03.91
dmi.board.asset.tag: HP5012436
dmi.board.name: 1589
dmi.board.vendor: Hewlett-Packard
dmi.board.version: 0.00
dmi.chassis.asset.tag: HP5012436
dmi.chassis.type: 6
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvrJ61v03.91:bd10/17/2016:svnHewlett-Packard:pnHPZ420Workstation:pvr:rvnHewlett-Packard:rn1589:rvr0.00:cvnHewlett-Packard:ct6:cvr:
dmi.product.family: 103C_53335X G=D
dmi.product.name: HP Z420 Workstation
dmi.sys.vendor: Hewlett-Packard

Revision history for this message
Chuck Burt (chucksense) wrote :
Chuck Burt (chucksense)
summary: - System hangs on attempted access of WDC WD30-EZRX 3TB HDD
+ Processes hang on attempted access of WDC WD30-EZRX 3TB HDD
description: updated
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.14 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc8

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Artful):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: Confirmed → Incomplete
Changed in linux (Ubuntu Artful):
importance: Undecided → Medium
tags: added: kernel-da-key
Revision history for this message
Chuck Burt (chucksense) wrote :

Yes, issue still exists in 4.14.0-041400rc8.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu Artful):
status: Incomplete → Confirmed
Chuck Burt (chucksense)
Changed in linux (Ubuntu Bionic):
status: Incomplete → Confirmed
Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Please file an upstream bug at https://bugzilla.kernel.org/

Product: IO/Storage
Component: SCSI

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

When booted under kernel 4.13.x, processes that access the disks (such as parted, boot-info, and gparted) always hang when a Western Digital Green WD30-EZRX 3TB HDD is attached. The drive works as expected under kernel 4.10.x, with everything else about the system unchanged and only the booted kernel different.

Full specs of the machine if useful: https://www.support.hp.com/id-en/document/c03277050

More details on Ubuntu's LaunchPad (where they asked me to come here to file an upstream bug): https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1730746

Revision history for this message
In , bvanassche (bvanassche-linux-kernel-bugs) wrote :

Please provide the output of the following command after having reproduced the hang:

    dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg

Additionally, if you know how to build the kernel yourself, it would be helpful if you could bisect this issue. Documentation is available e.g. at https://git-scm.com/docs/git-bisect.
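
For readers who have not bisected before, the workflow can be sketched roughly as follows. This is a minimal sketch, not a prescribed procedure: the function names are hypothetical, and it assumes a local clone of the kernel tree in which a known-good and a known-bad revision have already been identified.

```shell
# Start a bisect in the git checkout at $1, with $2 as the last known-good
# revision and $3 as the first known-bad one.
start_bisect() {
    git -C "$1" bisect start &&
    git -C "$1" bisect bad "$3" &&
    git -C "$1" bisect good "$2"
}

# Save the bisect log of the checkout at $1 to the file $2,
# for example to attach it to a bug report.
save_bisect_log() {
    git -C "$1" bisect log > "$2"
}

# Example (the path ~/linux and the choice of tags are assumptions):
#   start_bisect ~/linux v4.10 v4.13
#   ...build, boot and test the revision git checked out, then run
#   "git bisect good" or "git bisect bad" and repeat until git reports
#   the first bad commit...
#   save_bisect_log ~/linux bisect.log
```

Each round roughly halves the remaining range of commits, so even a range spanning several kernel releases usually needs only a dozen or so build-and-boot cycles.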

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

Created attachment 260925
Output of requested command

I reproduced the hang and ran the command as requested. See attached file output-20171129.txt

Building the kernel is something I could attempt tackling, but as a newbie I'm highly likely to mess something up. Either way, it will be a few weeks before I can get to it (best case). So I _really_ hope this provides the clue needed!

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

Created attachment 260927
Output of requested command as su

After reading the first few lines of the last attachment, it occurred to me that running this command as su might be useful. See attached.

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

Created attachment 260929
Output of requested command as su

Revision history for this message
In , bvanassche (bvanassche-linux-kernel-bugs) wrote :

So command processing got stuck. Since there are two code paths in recent kernels, we need to know whether or not scsi-mq was used. Please provide the output of the following command:

    for d in /sys/block/*; do sfx=""; [ -e "$d/mq" ] && sfx=" [mq]"; echo "$d$sfx"; done

If the above command reports that scsi-mq is being used for the WDC disk, please check whether the following command resolves the lockup:

    for d in /sys/kernel/debug/block/*/state; do echo kick >"$d"; done

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

> # for d in /sys/block/*; do sfx=""; [ -e "$d/mq" ] && sfx=" [mq]"; echo "$d$sfx"; done
> /sys/block/loop0 [mq]
> /sys/block/loop1 [mq]
> /sys/block/loop2 [mq]
> /sys/block/loop3 [mq]
> /sys/block/loop4 [mq]
> /sys/block/loop5 [mq]
> /sys/block/loop6 [mq]
> /sys/block/loop7 [mq]
> /sys/block/sda
> /sys/block/sdb
> /sys/block/sdc
> /sys/block/sr0
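
In the output above, only the loop devices carry the [mq] marker; sda, sdb, sdc, and sr0 have no mq directory, so by this check they are on the legacy (non-mq) path. The same check can be written as a small function that labels every device explicitly. This is a minimal sketch: the function name is hypothetical, and it relies on the same assumption as the one-liner, namely that an mq directory under the device's sysfs entry indicates the multiqueue path.

```shell
# Print each block device under a sysfs block directory with its queue type.
# $1 is the directory to scan; it defaults to /sys/block.
list_queue_types() {
    root=${1:-/sys/block}
    for d in "$root"/*; do
        [ -d "$d" ] || continue      # skip if the glob matched nothing
        if [ -e "$d/mq" ]; then
            echo "${d##*/} mq"
        else
            echo "${d##*/} legacy"
        fi
    done
}

# Example: list_queue_types
```

Taking the directory as a parameter keeps the function easy to try against a scratch directory before running it on a live system.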

Revision history for this message
In , bvanassche (bvanassche-linux-kernel-bugs) wrote :

That's weird: there are no known queue lockup bugs in the legacy block/SCSI core layers. Is the WDC hard disk perhaps controlled by an HBA? Can you provide the output of lspci (run as root)?

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

Created attachment 260953
Output of lspci on 4.10.x kernel

I won't be able to boot into the newer kernel for about a week. However, since `lspci` is hardware-oriented, I'm sharing the output from the older kernel in case it's helpful. Please let me know if you want me to run it on the new one instead and I'll get it when I can.

Revision history for this message
In , bvanassche (bvanassche-linux-kernel-bugs) wrote :

My hope was that the list of PCI devices would show a PCI HBA whose driver has been modified recently. Since that's not the case, I'm out of ideas about what the root cause of this bug could be. Unless someone else has an idea about how to find the root cause, I think your only option is to bisect the Linux kernel.

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

Created attachment 274895
Git Bisect Log 1 - 20180323

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

Created attachment 274897
Git Bisect Log 2 - 20180323

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

I finally got around to bisecting.

I had to do it twice as I identified two issues here.

Git Bisect Log 1 - https://bugzilla.kernel.org/attachment.cgi?id=274895
This identifies the commit where processes started to hang outright as a result of the drive being connected.

Git Bisect Log 2 - https://bugzilla.kernel.org/attachment.cgi?id=274897
This identifies a separate issue (should I file a separate bug for this?) where mounting/unmounting caused the following error:

> Device /dev/sdb3 is already mounted at `/media/temp/[identifier]`.
> (udisks-error-quark, 6)

Revision history for this message
In , bvanassche (bvanassche-linux-kernel-bugs) wrote :

Thanks for having run a bisect, that really helps.

Recently the following commit went upstream:

commit c9f926000fe3b84135a81602a9f7e63a6a7898e2 (mkp-scsi/4.15/scsi-fixes)
Author: Hannes Reinecke <email address hidden>
Date: Wed Jan 10 09:34:02 2018 +0100

    scsi: libsas: Disable asynchronous aborts for SATA devices

    Handling CD-ROM devices from libsas is decidedly odd, as libata relies
    on SCSI EH to be started to figure out that no medium is present. So we
    cannot do asynchronous aborts for SATA devices.

    Fixes: 909657615d9 ("scsi: libsas: allow async aborts")
    Cc: <email address hidden> # 4.12+
    Signed-off-by: Hannes Reinecke <email address hidden>
    Reviewed-by: Christoph Hellwig <email address hidden>
    Tested-by: Yves-Alexis Perez <email address hidden>
    Signed-off-by: Martin K. Petersen <email address hidden>

So you may want to try one of the kernel versions that includes that fix, e.g. v4.14.15 or v4.15.
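
A rough way to check whether a given kernel release is at or beyond the versions named above is a version comparison. This is only a heuristic sketch: the helper names are hypothetical, it assumes a `sort` that supports `-V` (as GNU coreutils does), and it cannot tell whether a distribution kernel with a lower version number carries a backport of the fix.

```shell
# Return success if kernel release $1 is at least version $2.
# Assumes "sort -V" (version sort) is available.
version_ge() {
    have=${1%%-*}    # drop a local suffix such as "-16-generic"
    want=$2
    [ "$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)" = "$want" ]
}

# Hypothetical helper: 4.14.15 is the earliest release named in this
# thread as containing the libsas fix.
has_libsas_fix() {
    version_ge "$1" 4.14.15
}

# Example:
#   if has_libsas_fix "$(uname -r)"; then echo "fix expected"; else echo "fix not expected"; fi
```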

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

I tested with v4.15.0-041500. Great news, the hang is resolved!

The second issue I found still exists, but that is not nearly as severe (it doesn't block my usage). It also occurs on more drives. Should I break that into a separate issue?

Thank you very very much for your help, Bart.

Revision history for this message
In , bvanassche (bvanassche-linux-kernel-bugs) wrote :

Sorry but I lost track. What was the second issue?

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

Git Bisect Log 2 - https://bugzilla.kernel.org/attachment.cgi?id=274897
This identifies a separate issue (should I file a separate bug for this?) where mounting/unmounting caused the following error:

> Device /dev/sdb3 is already mounted at `/media/temp/[identifier]`.
> (udisks-error-quark, 6)

Unmounting gives a similar error about being unable to unmount (I can provide the exact error in a bit if you need it).

This mounting/unmounting error still exists in the v4.15 kernel and was introduced in the commit isolated in the above bisect (Git Bisect Log 2).

Revision history for this message
In , bvanassche (bvanassche-linux-kernel-bugs) wrote :

At the end of bisect log 2 I found the following:

first bad commit: [8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next

It seems unlikely to me that any of the commits in the networking tree would cause mounting of a local filesystem to fail.

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

Yet it appears there were numerous revisions in the `drivers/scsi` area...?

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=8d65b08debc7e62b2c6032d7fe7389d895b92cbc

I'm a newbie, so... I could obviously be reading this completely wrong...

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

(Also, to clarify, the mounting does not actually fail... it produces that error as a dialog in the GUI, but mounting does actually succeed.)

Revision history for this message
In , bvanassche (bvanassche-linux-kernel-bugs) wrote :

As far as I can see merging Dave's tree pulled in only the following three SCSI changes:
* qed*: Utilize Firmware 8.15.3.0
* qedf: fix wrong le16 conversion
* netlink: extended ACK reporting

Unless you are using the qedi or qedf driver, I think it's unlikely that these changes are related to the issue you reported.
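
One way to see which commits a merge brought in under a given directory is to list what is reachable from the merge commit but not from its first parent, limited to that path. A minimal sketch, assuming a local clone of the kernel tree; the function name and the ~/linux path in the example are hypothetical:

```shell
# List the non-merge commits that merge commit $2 brought into the
# repository at $1, restricted to the path $3.
commits_in_merge() {
    git -C "$1" log --oneline --no-merges "$2^1..$2" -- "$3"
}

# Example:
#   commits_in_merge ~/linux 8d65b08debc7e62b2c6032d7fe7389d895b92cbc drivers/scsi
```

Note that when a bisect ends on a merge commit, the problem may also come from the interaction between the two merged histories rather than from any single commit on one side.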

Revision history for this message
In , chuck.burt+kernel.org (chuck.burt+kernel.org-linux-kernel-bugs) wrote :

Thank you again. Should we close this issue as duplicate / resolved elsewhere?

Revision history for this message
In , bvanassche (bvanassche-linux-kernel-bugs) wrote :

This ticket has category IO/Storage; SCSI. That category does not cover mounting filesystems. I'm fine with closing this ticket and creating a new ticket for the mount issue if necessary.

Revision history for this message
Andy Whitcroft (apw) wrote : Closing unsupported series nomination.

This bug was nominated against a series that is no longer supported, i.e. artful. The bug task representing the artful nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Artful):
status: Confirmed → Won't Fix
Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed