Disk report of Unhandled error code

Bug #1092883 reported by James Harris
This bug affects 2 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Won't Fix  Wishlist    Unassigned

Bug Description

Repeated messages such as:

Dec 21 10:56:03 s01 kernel: [12597.472089] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.472093] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.472098] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 00 00 02 00 00
Dec 21 10:56:03 s01 kernel: [12597.472113] end_request: I/O error, dev sde, sector 1857458688

Once this occurs, the message is reported several times per second, apparently continuously. The sector number does increase, but I'm not sure whether the disk driver is really trying to access the disk.
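The CDB line in the messages above can be decoded to check that the reported sector is the one the read was addressed to. The following is a minimal sketch, assuming the standard Read(10) layout (opcode 0x28, a big-endian LBA in bytes 2-5, a big-endian transfer length in bytes 7-8); the function name decode_read10 is made up for illustration.

```python
def decode_read10(cdb_hex):
    """Return (lba, block_count) from a Read(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


lba, blocks = decode_read10("28 00 6e b6 92 00 00 02 00 00")
print(lba, blocks)  # 1857458688 512
```

The decoded LBA equals the sector in the end_request line, so the error is being reported against the block that the read named. It does not by itself show whether the request ever reached the disk.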

To show the above log entries in context, here are a few lines before and after them.

Dec 21 10:55:03 s01 kernel: [12537.438622] ata9.00: exception Emask 0x10 SAct 0x1 SErr 0x80000 action 0xe frozen
Dec 21 10:55:03 s01 kernel: [12537.438628] ata9.00: irq_stat 0x00100010, PHY RDY changed
Dec 21 10:55:03 s01 kernel: [12537.438633] ata9: SError: { 10B8B }
Dec 21 10:55:03 s01 kernel: [12537.438638] ata9.00: failed command: READ FPDMA QUEUED
Dec 21 10:55:03 s01 kernel: [12537.438646] ata9.00: cmd 60/00:00:00:92:b6/02:00:6e:00:00/40 tag 0 ncq 262144 in
Dec 21 10:55:03 s01 kernel: [12537.438648] res f2/36:00:00:00:00/00:00:00:01:f2/00 Emask 0x12 (ATA bus error)
Dec 21 10:55:03 s01 kernel: [12537.438652] ata9.00: status: { Busy }
Dec 21 10:55:03 s01 kernel: [12537.438655] ata9.00: error: { IDNF ABRT }
Dec 21 10:55:03 s01 kernel: [12537.438662] ata9: hard resetting link
Dec 21 10:55:13 s01 kernel: [12547.444041] ata9: softreset failed (timeout)
Dec 21 10:55:13 s01 kernel: [12547.444048] ata9: hard resetting link
Dec 21 10:55:23 s01 kernel: [12557.456048] ata9: softreset failed (timeout)
Dec 21 10:55:23 s01 kernel: [12557.456055] ata9: hard resetting link
Dec 21 10:55:58 s01 kernel: [12592.460049] ata9: softreset failed (timeout)
Dec 21 10:55:58 s01 kernel: [12592.460057] ata9: limiting SATA link speed to 1.5 Gbps
Dec 21 10:55:58 s01 kernel: [12592.460061] ata9: hard resetting link
Dec 21 10:56:03 s01 kernel: [12597.472038] ata9: softreset failed (timeout)
Dec 21 10:56:03 s01 kernel: [12597.472046] ata9: reset failed, giving up
Dec 21 10:56:03 s01 kernel: [12597.472050] ata9.00: disabled
Dec 21 10:56:03 s01 kernel: [12597.472064] ata9: EH complete
Dec 21 10:56:03 s01 kernel: [12597.472089] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.472093] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.472098] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 00 00 02 00 00
Dec 21 10:56:03 s01 kernel: [12597.472113] end_request: I/O error, dev sde, sector 1857458688
Dec 21 10:56:03 s01 kernel: [12597.472250] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.472254] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.472258] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 00 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.472271] end_request: I/O error, dev sde, sector 1857458688
Dec 21 10:56:03 s01 kernel: [12597.472378] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.472381] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.472386] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 08 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.472398] end_request: I/O error, dev sde, sector 1857458696
Dec 21 10:56:03 s01 kernel: [12597.473888] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.473891] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.473896] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 10 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.473910] end_request: I/O error, dev sde, sector 1857458704
Dec 21 10:56:03 s01 kernel: [12597.473953] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.473957] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.473962] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 18 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.473976] end_request: I/O error, dev sde, sector 1857458712
Dec 21 10:56:03 s01 kernel: [12597.474014] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.474017] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.474022] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 20 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.474035] end_request: I/O error, dev sde, sector 1857458720
Dec 21 10:56:03 s01 kernel: [12597.474073] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.474087] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.474091] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 28 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.474103] end_request: I/O error, dev sde, sector 1857458728
Dec 21 10:56:03 s01 kernel: [12597.474137] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.474140] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.474144] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 30 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.474155] end_request: I/O error, dev sde, sector 1857458736
Dec 21 10:56:03 s01 kernel: [12597.474189] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.474191] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.474195] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 38 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.474207] end_request: I/O error, dev sde, sector 1857458744
Dec 21 10:56:03 s01 kernel: [12597.474239] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.474241] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.474245] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 40 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.474257] end_request: I/O error, dev sde, sector 1857458752
Dec 21 10:56:03 s01 kernel: [12597.474289] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.474292] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.474296] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 48 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.474307] end_request: I/O error, dev sde, sector 1857458760
Dec 21 10:56:03 s01 kernel: [12597.474338] sd 8:0:0:0: [sde] Unhandled error code
Dec 21 10:56:03 s01 kernel: [12597.474341] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 21 10:56:03 s01 kernel: [12597.474345] sd 8:0:0:0: [sde] CDB: Read(10): 28 00 6e b6 92 50 00 00 08 00
Dec 21 10:56:03 s01 kernel: [12597.474357] end_request: I/O error, dev sde, sector 1857458768
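The failed ATA command at the top of this log can be tied to the first SCSI read that follows it. The sketch below assumes that the "cmd" field is printed in the order command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device, and that an NCQ read carries its sector count in the feature registers; the function name decode_ncq_taskfile is hypothetical.

```python
import re


def decode_ncq_taskfile(cmd_field):
    """Decode (lba, sector_count) from a libata 'cmd' field for an NCQ read."""
    regs = [int(x, 16) for x in re.split(r"[/:]", cmd_field)]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = regs
    if command != 0x60:
        raise ValueError("expected READ FPDMA QUEUED (0x60)")
    # 48-bit LBA: the "hob" (high order byte) registers hold the upper 24 bits.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24 |
           lbah << 16 | lbam << 8 | lbal)
    # Assumption: for NCQ commands the count lives in FEATURE, not NSECT.
    sectors = hob_feature << 8 | feature
    return lba, sectors


lba, sectors = decode_ncq_taskfile("60/00:00:00:92:b6/02:00:6e:00:00/40")
print(lba, sectors, sectors * 512)  # 1857458688 512 262144
```

Under those assumptions the failed command is a 512-sector (262144-byte, matching "ncq 262144 in") read at sector 1857458688, the same address as the first Read(10) that is rejected with DID_BAD_TARGET once the port has been disabled.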

This happened while running badblocks against the disk.

3) What you expected to happen
Expected the kernel or driver to recognise the error (it looks as though it did not know what the error was or how to handle it) and respond appropriately. At the very least, an informative log message would help.

4) What happened instead
The response seems to have confused badblocks. If left alone, it looks as though it would have marked all following blocks as bad, when this could have been a different type of error, such as a communication issue.
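One way to separate the two cases when reading such a log afterwards is to treat DID_BAD_TARGET results that appear after an "ataN.NN: disabled" line as "device unreachable" rather than as evidence of bad blocks. This is only a rough heuristic sketched under that assumption: it does not match the ATA port to the sd device, and the names classify, DISABLED and RESULT are invented for the example.

```python
import re

DISABLED = re.compile(r"ata\d+\.\d+: disabled")
RESULT = re.compile(r"\[(sd[a-z]+)\] Result: hostbyte=(\w+)")


def classify(lines):
    """Yield (device, verdict) for each SCSI 'Result:' line in a kernel log."""
    port_disabled = False
    for line in lines:
        if DISABLED.search(line):
            port_disabled = True  # libata gave up on a device
            continue
        m = RESULT.search(line)
        if m:
            dev, hostbyte = m.groups()
            if port_disabled and hostbyte == "DID_BAD_TARGET":
                yield dev, "device unreachable"
            else:
                yield dev, "needs inspection"


log = [
    "kernel: [12597.472050] ata9.00: disabled",
    "kernel: [12597.472093] sd 8:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK",
]
print(list(classify(log)))
```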

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-35-generic-pae 3.2.0-35.55
ProcVersionSignature: Ubuntu 3.2.0-35.55-generic-pae 3.2.34
Uname: Linux 3.2.0-35-generic-pae i686
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu15.1
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mythtv 2841 F.... pulseaudio
                      a 12453 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfdff4000 irq 49'
   Mixer name : 'Realtek ALC882'
   Components : 'HDA:10ec0882,147b8e01,00100101'
   Controls : 53
   Simple ctrls : 25
Date: Fri Dec 21 11:54:35 2012
HibernationDevice: RESUME=UUID=d275224e-9930-4474-abca-05dbecb38f2e
IwConfig:
 lo no wireless extensions.

 eth2 no wireless extensions.

 eth1 no wireless extensions.
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.utf8
 SHELL=/bin/bash
ProcFB: 0 nouveaufb
ProcKernelCmdLine: root=LABEL=hostname-root ro
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mythtv not ours.
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-35-generic-pae N/A
 linux-backports-modules-3.2.0-35-generic-pae N/A
 linux-firmware 1.79.1
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to precise on 2012-12-20 (0 days ago)
dmi.bios.date: 09/08/2006
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: AB9/AB9RPO(Intel965+ICH8)
dmi.board.vendor: http://www.abit.com.tw/
dmi.board.version: 1.x
dmi.chassis.type: 3
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd09/08/2006:svn:pn:pvr:rvnhttp//www.abit.com.tw/:rnAB9/AB9RPO(Intel965+ICH8):rvr1.x:cvn:ct3:cvr:

Revision history for this message
James Harris (james-harris-1-h) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Fabio Marconi (fabiomarconi) wrote :

Your hard disk is in a bad state.
I am closing this report.
---
Ubuntu Bug Squad volunteer triager
http://wiki.ubuntu.com/BugSquad

tags: added: hardware-error
Changed in linux (Ubuntu):
status: Confirmed → Invalid
James Harris (james-harris-1-h) wrote :

This shouldn't have been closed as a disk issue. The bug report was not about the disk error but about Linux failing to handle the error! That handling could be as simple as reporting what was wrong, but if Linux doesn't recognise the error code it can never know whether it is handling it correctly.

Changed in linux (Ubuntu):
status: Invalid → New
Brad Figg (brad-figg) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
James Harris (james-harris-1-h) wrote :

Not sure if it is related, but the following has happened since the recent upgrade to kernel 3.2.0-35-generic-pae. Once it happens, the machine pauses for a long time between certain operations (GUI updates such as alt-tab, responses to command-line commands such as lsb_release -a, and file accesses such as directory reads), as if it waits for something and times out before responding.

Again, is it possible that disk errors are not being properly understood by the kernel or drivers?

Jan 2 16:05:26 s01 kernel: [ 3600.304094] INFO: task flush-8:32:3521 blocked for more than 120 seconds.
Jan 2 16:05:26 s01 kernel: [ 3600.304098] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 2 16:05:26 s01 kernel: [ 3600.304102] flush-8:32 D 00000000 0 3521 2 0x00000000
Jan 2 16:05:26 s01 kernel: [ 3600.304109] cf53bba8 00000046 f6b59658 00000000 c88340b8 c180b020 c1931e00 c1931e00
Jan 2 16:05:26 s01 kernel: [ 3600.304119] 989c16d8 00000327 f7bbae00 e0e88000 c180b020 00164400 00000282 00000006
Jan 2 16:05:26 s01 kernel: [ 3600.304128] 00000010 e0872200 f6b59658 cf53bb94 f6b59658 00000010 cf53bbb0 00000246
Jan 2 16:05:26 s01 kernel: [ 3600.304138] Call Trace:
Jan 2 16:05:26 s01 kernel: [ 3600.304148] [<c15a6445>] schedule+0x35/0x50
Jan 2 16:05:26 s01 kernel: [ 3600.304153] [<c15a64d8>] io_schedule+0x78/0xb0
Jan 2 16:05:26 s01 kernel: [ 3600.304158] [<c129726d>] get_request_wait+0xbd/0x190
Jan 2 16:05:26 s01 kernel: [ 3600.304165] [<c12a8343>] ? cfq_merge+0x63/0x90
Jan 2 16:05:26 s01 kernel: [ 3600.304171] [<c107a090>] ? add_wait_queue+0x50/0x50
Jan 2 16:05:26 s01 kernel: [ 3600.304175] [<c1297f82>] blk_queue_bio+0x62/0x2e0
Jan 2 16:05:26 s01 kernel: [ 3600.304190] [<c12950d9>] generic_make_request.part.47+0x59/0x90
Jan 2 16:05:26 s01 kernel: [ 3600.304194] [<c1296cc7>] generic_make_request+0x57/0x60
Jan 2 16:05:26 s01 kernel: [ 3600.304198] [<c1296d3f>] submit_bio+0x6f/0x100
Jan 2 16:05:26 s01 kernel: [ 3600.304203] [<c1174640>] ? bio_alloc_bioset+0x40/0xc0
Jan 2 16:05:26 s01 kernel: [ 3600.304208] [<c116f2a1>] submit_bh+0xd1/0x100
Jan 2 16:05:26 s01 kernel: [ 3600.304212] [<c11720bb>] __block_write_full_page+0x22b/0x390
Jan 2 16:05:26 s01 kernel: [ 3600.304216] [<c11761c0>] ? blkdev_get_blocks+0xe0/0xe0
Jan 2 16:05:26 s01 kernel: [ 3600.304220] [<c1172931>] block_write_full_page_endio+0xa1/0xe0
Jan 2 16:05:26 s01 kernel: [ 3600.304224] [<c1170310>] ? end_buffer_async_read+0x110/0x110
Jan 2 16:05:26 s01 kernel: [ 3600.304228] [<c11761c0>] ? blkdev_get_blocks+0xe0/0xe0
Jan 2 16:05:26 s01 kernel: [ 3600.304233] [<c1016000>] ? vt8237_force_hpet_resume+0x40/0x70
Jan 2 16:05:26 s01 kernel: [ 3600.304237] [<c1172987>] block_write_full_page+0x17/0x20
Jan 2 16:05:26 s01 kernel: [ 3600.304241] [<c1170310>] ? end_buffer_async_read+0x110/0x110
Jan 2 16:05:26 s01 kernel: [ 3600.304245] [<c11756c4>] blkdev_writepage+0x14/0x20
Jan 2 16:05:26 s01 kernel: [ 3600.304249] [<c10fe490>] __writepage+0x10/0x40
Jan 2 16:05:26 s01 kernel: [ 3600.304253] [<c10fec53>] write_cache_pages+0x193/0x3f0
Jan 2 16:05:26 s01 kernel: [ 3600.304258] [<c10fe480>] ? set_page_dirty_lock+0x50/0x50
Jan 2 16:05:26 s01 ke...
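The blocked task in the trace above is named flush-8:32, which is the major:minor number of the block device being written back. A small sketch of how that number maps to a device name, assuming the conventional numbering for SCSI disk major 8 (16 minors per disk, so minor 0 is sda, 16 is sdb, 32 is sdc); the helper sd_name is hypothetical.

```python
def sd_name(major, minor):
    """Map a major:minor pair on SCSI disk major 8 to its conventional name."""
    if major != 8:
        raise ValueError("only the first sd major (8) is handled")
    disk, part = divmod(minor, 16)  # 16 minors per disk: whole disk + 15 partitions
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


print(sd_name(8, 32))  # sdc
```

So the stalled writeback is for sdc under that numbering. The excerpt does not say whether sdc is the disk that produced the earlier errors, and device letters can differ between boots.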


Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.8 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc1-raring/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
James Harris (james-harris-1-h) wrote :

Will try to test the upstream kernel and then report back.

tags: added: kernel-unable-to-test-upstream
James Harris (james-harris-1-h) wrote :

Updated to the latest kernel. By the time I downloaded it, a build slightly newer than the one Joseph pointed me to was available, so I used that. The stanza it added to menu.lst was:

title Ubuntu 12.04.1 LTS, kernel 3.8.0-030800rc2-generic
root (hd0,2)
kernel /boot/vmlinuz-3.8.0-030800rc2-generic root=LABEL=s01-root ro
initrd /boot/initrd.img-3.8.0-030800rc2-generic
quiet

Unfortunately, the machine behaved oddly on the updated kernel. Even keyboard input was garbled, and I had to use the power button to reboot, so I was unable to verify whether the problem exists on the new kernel. Sorry. As directed, I have added the tag kernel-unable-to-test-upstream.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
James Harris (james-harris-1-h) wrote :

The issue occurred again. Will try to make some sense of it. Whatever is causing the initial failure of access to one of the disks, when the error occurs the kernel or driver subsystem seems to go into an infinite loop (or a number of infinite loops - see below). Even if the disk is pulled from the machine (SATA, hot swap) the kernel continues to emit the error messages and does not recognise removal of the device.

After the kernel reports:

Jan 29 22:17:15 s01 kernel: [772716.664170] ata9.00: disabled
Jan 29 22:17:15 s01 kernel: [772716.664176] ata9: EH complete

it then starts what seem to be a number of loops - possibly because there are different disk requests outstanding (but that is just a guess) - each reporting:

Jan 29 22:17:15 s01 kernel: [772716.664204] sd 8:0:0:0: [sdg] Unhandled error code
Jan 29 22:17:15 s01 kernel: [772716.664207] sd 8:0:0:0: [sdg] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK

Here are some examples of lines that were reported thousands of times (about 70 reports per second) until the machine was rebooted.

Jan 29 22:17:15 s01 kernel: [772716.664331] end_request: I/O error, dev sdg, sector 4096
Jan 29 22:17:15 s01 kernel: [772716.672247] end_request: I/O error, dev sdg, sector 3388584944

Each of the two messages above was repeated with the same sector number every time.

Jan 29 22:17:15 s01 kernel: [772716.685014] EXT2-fs (sdg2): error: read_block_bitmap: Cannot read block bitmap - block_group = 12926, block_bitmap = 423572606

This was always the same block_bitmap number.
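The repeated block_bitmap number and the repeated sector 3388584944 appear to describe the same read. The arithmetic below is a sketch under two assumptions that the report does not confirm: that the filesystem uses 4096-byte blocks, and that sdg2 starts at sector 4096 of sdg.

```python
BLOCK_SIZE = 4096   # assumption: ext2 block size in bytes
SECTOR_SIZE = 512   # sector unit used in the end_request messages
PART_START = 4096   # assumption: first sector of sdg2 on sdg

block_bitmap = 423572606  # from the EXT2-fs error line
sector = PART_START + block_bitmap * (BLOCK_SIZE // SECTOR_SIZE)
print(sector)  # 3388584944
```

With those values the result equals the logged sector, which is consistent with the filesystem retrying the same block bitmap read over and over rather than the driver scanning new sectors.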

HTH,
James

penalvch (penalvch)
tags: added: bios-outdated needs-upstream-testing regression-potential
penalvch (penalvch)
description: updated
penalvch (penalvch) wrote :

James Harris, this would be a clear cut case of HDD failure, with your log substantiating this:
end_request: I/O error, dev sde, sector 1857458688

There is no way to simplify this or change the log output: an I/O error is an I/O error.

You are welcome to use Checkbox as noted in https://help.ubuntu.com/community/Checkbox to further substantiate this, but it would just be confirming what is known already.

Please feel free to report any future bugs you may find.

Changed in linux (Ubuntu):
importance: Medium → Wishlist
status: Confirmed → Won't Fix