blk_update_request: I/O error when accessing a disk that is spun down

Bug #1504909 reported by Felix Matouschek
This bug affects 5 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  Medium      Unassigned

Bug Description

Hello,

I have six 3TB SATA disks that are attached via an IBM M1015 in IT mode using the mpt2sas driver. The filesystem is zfsonlinux.

I can't name the exact 3.19 kernel version in which this error began, but it was not the initial 3.19 Vivid LTS kernel, because after initially upgrading to 3.19 everything still worked.

After a recent update I have the following problem:

When the disks are spun down and then accessed, an error occurs which corrupts data.
Once the disks have finally spun up, everything works as expected. Preventing them from spinning down also works as a workaround.

Could this be an NCQ bug?

Drives are 5x Seagate ST3000DM001 and 1x HGST HDN724030ALE640.

[59526.359997] sd 0:0:1:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59526.360003] sd 0:0:1:0: [sdc] CDB:
[59526.360006] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59526.360022] blk_update_request: I/O error, dev sdc, sector 824769880
[59544.111090] sd 0:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59544.111097] sd 0:0:0:0: [sdb] CDB:
[59544.111100] Read(16): 88 00 00 00 00 00 31 28 fd 50 00 00 00 08 00 00
[59544.111115] blk_update_request: I/O error, dev sdb, sector 824769872
[59544.114465] sd 0:0:4:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59544.114468] sd 0:0:4:0: [sdf] CDB:
[59544.114469] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59544.114483] blk_update_request: I/O error, dev sdf, sector 824769880
[59552.117436] sd 0:0:3:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59552.117443] sd 0:0:3:0: [sde] CDB:
[59552.117446] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59552.117462] blk_update_request: I/O error, dev sde, sector 824769968
[59572.951158] sd 0:0:2:0: [sdd] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59572.951167] sd 0:0:2:0: [sdd] CDB:
[59572.951170] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59572.951192] blk_update_request: I/O error, dev sdd, sector 824769968
[59572.955679] sd 0:0:5:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59572.955695] sd 0:0:5:0: [sdg] CDB:
[59572.955701] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59572.955720] blk_update_request: I/O error, dev sdg, sector 824769968
[70357.782677] sd 0:0:4:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[70357.782686] sd 0:0:4:0: [sdf] CDB:
[70357.782690] Read(16): 88 00 00 00 00 00 85 c1 c9 08 00 00 00 08 00 00
[70357.782712] blk_update_request: I/O error, dev sdf, sector 2244069640
[70368.087947] sd 0:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[70368.087953] sd 0:0:0:0: [sdb] CDB:
[70368.087955] Read(16): 88 00 00 00 00 00 85 c1 c9 00 00 00 00 08 00 00
[70368.087969] blk_update_request: I/O error, dev sdb, sector 2244069632
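
The CDB bytes in each failure can be checked against the sector that blk_update_request reports. The following is a minimal Python sketch, assuming the standard 16-byte READ(16)/WRITE(16) layout (opcode in byte 0, big-endian LBA in bytes 2-9, transfer length in bytes 10-13). The name decode_cdb16 is made up for illustration and is not part of any tool mentioned in this report.

```python
from typing import NamedTuple

# Opcodes that appear in the logs of this report.
OPCODES = {0x88: "Read(16)", 0x8A: "Write(16)"}


class Cdb16(NamedTuple):
    command: str  # human-readable opcode name
    lba: int      # starting logical block address
    blocks: int   # transfer length in logical blocks


def decode_cdb16(hex_bytes: str) -> Cdb16:
    """Decode a space-separated 16-byte CDB as printed in dmesg."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    command = OPCODES.get(raw[0], f"opcode 0x{raw[0]:02x}")
    lba = int.from_bytes(raw[2:10], "big")      # bytes 2..9
    blocks = int.from_bytes(raw[10:14], "big")  # bytes 10..13
    return Cdb16(command, lba, blocks)


# First failing command in the log above (sdc).
cdb = decode_cdb16("88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00")
print(cdb)  # lba is 824769880, the sector logged for sdc; blocks is 8
```

The decoded LBA equals the logged sector, so the failing requests look like ordinary 8-block reads rather than malformed commands. That the LBA and the kernel's sector number coincide suggests 512-byte logical blocks, but that is an inference, not something stated in the report.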

lsb_release -rd:
Description: Ubuntu 14.04.3 LTS
Release: 14.04

Kernel Version: 3.19.0-30-generic
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Okt 8 18:46 seq
 crw-rw---- 1 root audio 116, 33 Okt 8 18:46 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.15
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=/dev/mapper/eternity--vg-swap_1
InstallationDate: Installed on 2014-09-25 (380 days ago)
InstallationMedia: Ubuntu-Server 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.3)
IwConfig:
 lo no wireless extensions.

 em1 no wireless extensions.
MachineType: Dell Inc. PowerEdge T20
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=de_DE.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.19.0-30-generic root=/dev/mapper/hostname--vg-root ro console=tty0 console=ttyS4,115200 intel_pstate=enable cgroup_enable=memory swapaccount=1 intel_iommu=on enable_mtrr_cleanup mtrr_spare_reg_nr=1 mtrr_gran_size=16M mtrr_chunk_size=64M i915.enable_rc6=0 nmi_watchdog=0
ProcVersionSignature: Ubuntu 3.19.0-30.34~14.04.1-generic 3.19.8-ckt6
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-30-generic N/A
 linux-backports-modules-3.19.0-30-generic N/A
 linux-firmware 1.127.15
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 3.19.0-30-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 09/18/2014
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A05
dmi.board.name: 0VD5HY
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 6
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA05:bd09/18/2014:svnDellInc.:pnPowerEdgeT20:pvr01:rvnDellInc.:rn0VD5HY:rvrA01:cvnDellInc.:ct6:cvr:
dmi.product.name: PowerEdge T20
dmi.product.version: 01
dmi.sys.vendor: Dell Inc.

Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1504909

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Felix Matouschek (felix-matouschek) wrote : apport information (one comment per file)

BootDmesg.txt, CurrentDmesg.txt, Lspci.txt, Lsusb.txt, ProcCpuinfo.txt, ProcInterrupts.txt, ProcModules.txt, UdevDb.txt, UdevLog.txt, WifiSyslog.txt

tags: added: apport-collected
description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch)
tags: added: bios-outdated-a06
penalvch (penalvch)
Changed in linux (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete
Felix Matouschek (felix-matouschek) wrote :

Hello Christopher,

I updated the machine's BIOS to version A06 and the LSI 2008 card to firmware version 20.00.04.00.
No firmware update is available for any of the drives.

The errors still persist.

    [ 527.450007] sd 3:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
    [ 527.450015] sd 3:0:0:0: [sdb] CDB:
    [ 527.450019] Write(16): 8a 00 00 00 00 00 8d 55 89 30 00 00 00 08 00 00
    [ 527.450041] blk_update_request: I/O error, dev sdb, sector 2371193136
    [ 527.454490] sd 3:0:1:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
    [ 527.454502] sd 3:0:1:0: [sdc] CDB:
    [ 527.454505] Write(16): 8a 00 00 00 00 00 8d 55 89 30 00 00 00 08 00 00
    [ 527.454523] blk_update_request: I/O error, dev sdc, sector 2371193136
    [ 527.457231] sd 3:0:2:0: [sdd] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
    [ 527.457235] sd 3:0:2:0: [sdd] CDB:
    [ 527.457237] Write(16): 8a 00 00 00 00 00 8d 55 89 38 00 00 00 08 00 00
    [ 527.457255] blk_update_request: I/O error, dev sdd, sector 2371193144
    [ 527.460273] sd 3:0:3:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
    [ 527.460276] sd 3:0:3:0: [sde] CDB:
    [ 527.460278] Write(16): 8a 00 00 00 00 00 8d 55 89 30 00 00 00 08 00 00
    [ 527.460312] blk_update_request: I/O error, dev sde, sector 2371193136
    [ 527.463067] sd 3:0:4:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
    [ 527.463070] sd 3:0:4:0: [sdf] CDB:
    [ 527.463072] Write(16): 8a 00 00 00 00 00 8d 55 89 30 00 00 00 08 00 00
    [ 527.463088] blk_update_request: I/O error, dev sdf, sector 2371193136
    [ 527.465819] sd 3:0:5:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
    [ 527.465822] sd 3:0:5:0: [sdg] CDB:
    [ 527.465824] Write(16): 8a 00 00 00 00 00 8d 55 89 30 00 00 00 08 00 00
    [ 527.465839] blk_update_request: I/O error, dev sdg, sector 2371193136

I can reproduce the bug by issuing "hdparm -y" for all drives; after that, any kind of access to the disks (for example, creating a file on the zfs pool) results in one of the errors above.
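
The reproduction steps above could be written down as a dry run. This is only a sketch in Python: it prints the commands instead of executing them, the device names are the ones that appear in the logs, and the file path /tank/spinup-test is a hypothetical location on the pool (the real mount point is not given in this report). Running the printed commands would need root.

```python
import shlex
from typing import List


def repro_plan(devices: List[str], test_file: str) -> List[List[str]]:
    """Return the commands for one reproduction attempt, in order."""
    plan = [["hdparm", "-y", dev] for dev in devices]  # force standby
    plan.append(["touch", test_file])                  # any access to the pool
    plan.append(["dmesg"])                             # then look for I/O errors
    return plan


# Device names as they appear in the logs; the file path is hypothetical.
DEVICES = [f"/dev/sd{letter}" for letter in "bcdefg"]

for argv in repro_plan(DEVICES, "/tank/spinup-test"):
    print(shlex.join(argv))
```

Keeping this as a printed plan avoids spinning down disks by accident; the output can be reviewed and pasted into a root shell.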

Output of "sudo dmidecode -s bios-version && sudo dmidecode -s bios-release-date" is now:
    A06
    01/27/2015

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch) wrote :

Felix Matouschek, could you please test the latest upstream kernel, available from the top line of the page at http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D (the release names are irrelevant for testing, and please do not test the daily folder)? Install instructions are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds . This will allow additional upstream developers to examine the issue.

If the latest kernel did not allow you to test for the issue (e.g. you couldn't boot into the OS), please comment on this in your report and continue with the next most recent kernel version until you can test for the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this issue is fixed in the mainline kernel, please add the following tags by clicking on the yellow circle with a black pencil icon, next to the word Tags, located at the bottom of the report description:
kernel-fixed-upstream
kernel-fixed-upstream-X.Y-rcZ

Where X, Y, and Z are numbers corresponding to the kernel version.

If the mainline kernel does not fix the issue, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-X.Y-rcZ

Please note, a failure to install the kernel does not meet the criteria for kernel-bug-exists-upstream.

Once testing of the latest upstream kernel is complete, please mark this report's Status as Confirmed. Please let us know your results.

Thank you for your understanding.

tags: added: latest-bios-a06 latest-lsi-firmware
removed: bios-outdated-a06
Changed in linux (Ubuntu):
importance: Low → Medium
status: Confirmed → Incomplete
tags: added: regression-update
Felix Matouschek (felix-matouschek) wrote :

Hello Christopher,

the errors still occur with kernel 4.3-rc5-unstable installed.

tags: added: kernel-bug-exists-upstream kernel-bug-exists-upstream-4.3-rc5
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch) wrote :

Felix Matouschek, the next step is to fully commit bisect from kernel 3.19.x where this problem didn't occur to 3.19.0-30 in order to identify the last good kernel commit, followed immediately by the first bad one. This will allow for a more expedited analysis of the root cause of your issue. Could you please do this following https://wiki.ubuntu.com/Kernel/KernelBisection ?

Please note, finding adjacent kernel versions is not fully commit bisecting.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

tags: added: needs-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Felix Matouschek (felix-matouschek) wrote :

Hello Christopher,

The oldest version I could find is 3.19.0-18-generic. Unfortunately, it seems I was wrong: the error is also present in this version.

What now?

Felix Matouschek (felix-matouschek) wrote :

Just tested... the error also occurs with the latest Utopic LTS kernel. :-(

penalvch (penalvch) wrote :

Felix Matouschek, how about older Utopic kernels?

Felix Matouschek (felix-matouschek) wrote :

Hi Christopher,

I tested it with 3.16.0-25-generic. The errors still occur.

I am 100% certain that it worked at one point with that kernel. Could it be zfsonlinux changes?!

penalvch (penalvch)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Felix Matouschek (felix-matouschek) wrote :

Hi Christopher,

I found an interesting mailing list entry:

http://<email address hidden>/msg14937.html

Apart from not outputting the "Device not ready" message this seems to be exactly my bug.

Is commit d3ed1731ba6bca32ac0ac96377cae8fd735e16e6 by James Bottomley included in the Ubuntu kernel sources?

It has been included in the mainline kernel since release 3.4.11.

Greetings,
Felix

Flo (flo82) wrote :

any news on this?

antst (antst) wrote :

Felix,

It is indeed the same bug, as the rollback of the commit (which was proposed in that mail thread from 2012) never happened. I just checked the kernel sources.

antst (antst) wrote :

Actually, my mistake. Nobody reverted the commit that broke it, but the commit that fixed it back in 2012 is present in the current Ubuntu kernel. Hm.

Felix Matouschek (felix-matouschek) wrote :

Hello Ant,

I'm not quite sure what you mean?

I'm currently using Debian Jessie with kernel 4.6.3, and the problem is still there. So it is not limited to Ubuntu... it probably also exists in more recent releases like 16.04...

Greetings,
Felix

Hoppel (hoppel118) wrote :

hey guys,

I have the same problem with Debian Jessie and the current Proxmox kernel 4.4.x. I reported some details about my problem here:

https://github.com/zfsonlinux/zfs/issues/4638

There is also another interesting bug report:

https://github.com/zfsonlinux/zfs/issues/3785

Greetings Hoppel

Splitice (mat999) wrote :

I can confirm that this still occurs on the Ubuntu 4.8.0-58-generic #63~16.04.1 kernel.

I have tested this on brand new drives (2TB WD Red).

Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1504909

tags: added: iso-testing
antst (antst) wrote :

I observe the same on Ubuntu 20.04 (5.4.0-29-generic). The issue appeared only after the upgrade from 18.04, which ran flawlessly on the same hardware for 2 years.
Hardware: mpt3sas (LSI3008), HGST and WD drives.

antst (antst) wrote :

While the disks are working OK, a sad side effect of this is that ZFS starts to show errors. It is not clear from the notifications whether they come from this bug or from an actual hardware problem, so time has to be wasted checking the hardware and scrubbing the pool.
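
One rough way to triage such notifications is to group the kernel's I/O errors into bursts. The Python sketch below is a heuristic only, not a diagnosis: it assumes that errors hitting several drives within a short window are more consistent with the spin-up behaviour in this report than with a single failing disk. The 60-second gap and the function names are arbitrary choices for illustration.

```python
import re
from typing import List, NamedTuple

# Matches lines such as:
# [59526.360022] blk_update_request: I/O error, dev sdc, sector 824769880
ERROR_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+blk_update_request: I/O error, dev (\w+), sector (\d+)"
)


class IoError(NamedTuple):
    time: float  # seconds since boot, from the dmesg timestamp
    dev: str
    sector: int


def parse_errors(dmesg_text: str) -> List[IoError]:
    """Extract every blk_update_request I/O error from dmesg output."""
    return [
        IoError(float(t), dev, int(sector))
        for t, dev, sector in ERROR_RE.findall(dmesg_text)
    ]


def group_bursts(errors: List[IoError], max_gap: float = 60.0) -> List[List[IoError]]:
    """Split errors into bursts separated by more than max_gap seconds."""
    bursts: List[List[IoError]] = []
    for err in sorted(errors):
        if bursts and err.time - bursts[-1][-1].time <= max_gap:
            bursts[-1].append(err)
        else:
            bursts.append([err])
    return bursts


# Four lines taken from the first log in this report.
SAMPLE = """\
[59526.360022] blk_update_request: I/O error, dev sdc, sector 824769880
[59544.111115] blk_update_request: I/O error, dev sdb, sector 824769872
[70357.782712] blk_update_request: I/O error, dev sdf, sector 2244069640
[70368.087969] blk_update_request: I/O error, dev sdb, sector 2244069632
"""

for burst in group_bursts(parse_errors(SAMPLE)):
    devices = sorted({e.dev for e in burst})
    print(f"{burst[0].time:.0f}s: {len(burst)} errors on {', '.join(devices)}")
```

A burst that names only one device, or that keeps recurring while the disks are known to be spinning, would still deserve a proper hardware check.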
