RAID1 array affected by non-RAID swappable IDE disk

Bug #313487 reported by Axel Müller
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Expired
Undecided
Unassigned

Bug Description

I'm running a server machine with Ubuntu 8.04 LTS installed as RAID1.
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
[...]
/dev/md1 482495704 50322380 407856944 11% /
/dev/md0 186555 60819 116104 35% /boot
[...]

This worked well until I added a hot swappable IDE drive.
AfterI installed the additional IDE drive I powered the machine up without problems.
I did not mount any partition of this drive but "fdisk /dev/sdb" showed the partitions on this drive as expected.
When I removed the swappable drive one partition of the RAID1 drives was disabled as well:

[ 2259.642911] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 2259.643041] ata1.01: cmd 25/00:08:68:59:1c/00:00:1d:00:00/f0 tag 0 dma 4096 in
[ 2259.643054] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2259.643130] ata1.01: status: { DRDY }
[ 2259.643249] ata1: soft resetting link
[ 2264.841449] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2269.700106] ata1: SRST failed (errno=-16)
[ 2269.700184] ata1: soft resetting link
[ 2299.861884] ata1.01: qc timeout (cmd 0xec)
[ 2299.861926] ata1.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 2299.861948] ata1.01: revalidation failed (errno=-5)
[ 2299.862014] ata1: failed to recover some devices, retrying in 5 secs
[ 2304.870541] ata1: soft resetting link
[ 2310.069057] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2314.927727] ata1: SRST failed (errno=-16)
[ 2314.927806] ata1: soft resetting link
[ 2345.089504] ata1.01: qc timeout (cmd 0xec)
[ 2345.089544] ata1.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 2345.089566] ata1.01: revalidation failed (errno=-5)
[ 2345.089632] ata1: failed to recover some devices, retrying in 5 secs
[ 2350.098151] ata1: soft resetting link
[ 2355.296723] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2360.155350] ata1: SRST failed (errno=-16)
[ 2360.155428] ata1: soft resetting link
[ 2390.317147] ata1.01: qc timeout (cmd 0xec)
[ 2390.317189] ata1.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 2390.317211] ata1.01: revalidation failed (errno=-5)
[ 2390.317278] ata1.01: disabled
[ 2390.838537] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[ 2390.838575] ata1.00: revalidation failed (errno=-5)
[ 2390.838640] ata1: failed to recover some devices, retrying in 5 secs
[ 2395.845722] ata1: soft resetting link
[ 2401.044150] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2405.902816] ata1: SRST failed (errno=-16)
[ 2405.902895] ata1: soft resetting link
[ 2436.124561] ata1.00: qc timeout (cmd 0xec)
[ 2436.124603] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 2436.124625] ata1.00: revalidation failed (errno=-5)
[ 2436.124691] ata1: failed to recover some devices, retrying in 5 secs
[ 2441.133225] ata1: soft resetting link
[ 2446.331733] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2451.190404] ata1: SRST failed (errno=-16)
[ 2451.190482] ata1: soft resetting link
[ 2481.412208] ata1.00: qc timeout (cmd 0xec)
[ 2481.412251] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 2481.412274] ata1.00: revalidation failed (errno=-5)
[ 2481.412481] ata1.00: disabled
[ 2482.281909] ata1: soft resetting link
[ 2487.480456] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2492.339127] ata1: SRST failed (errno=-16)
[ 2492.339204] ata1: soft resetting link
[ 2492.510749] ata1: EH complete
[ 2492.510912] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 2492.510947] end_request: I/O error, dev sdb, sector 488397160
[ 2492.510972] Buffer I/O error on device sdb, logical block 61049645
[ 2492.511092] sd 1:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 2492.511121] end_request: I/O error, dev sda, sector 973040792
[ 2492.511173] md: super_written gets error=-5, uptodate=0
[ 2492.511200] raid1: Disk failure on sda2, disabling device.
[ 2492.511209] Operation continuing on 1 devices
[ 2492.511627] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 2492.511692] end_request: I/O error, dev sdb, sector 488397160
[ 2492.511713] Buffer I/O error on device sdb, logical block 61049645
[ 2492.534440] RAID1 conf printout:
[ 2492.534476] --- wd:1 rd:2
[ 2492.534495] disk 0, wo:1, o:0, dev:sda2
[ 2492.534511] disk 1, wo:0, o:1, dev:sdc2
[ 2492.559127] RAID1 conf printout:
[ 2492.559160] --- wd:1 rd:2
[ 2492.559180] disk 1, wo:0, o:1, dev:sdc2

I placed the swappable drive again in the bay but it was never made available to the system so that I was not able to access the drive again using "fdisk /dev/sdb":

[ 3457.576385] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3457.576437] end_request: I/O error, dev sdb, sector 0
[ 3457.576459] Buffer I/O error on device sdb, logical block 0
[ 3457.576538] Buffer I/O error on device sdb, logical block 1
[ 3457.576593] Buffer I/O error on device sdb, logical block 2
[ 3457.576646] Buffer I/O error on device sdb, logical block 3
[ 3457.580557] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3457.580607] end_request: I/O error, dev sdb, sector 0
[ 3457.580629] Buffer I/O error on device sdb, logical block 0
[ 3457.583139] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3457.583189] end_request: I/O error, dev sdb, sector 488397160
[ 3457.583212] Buffer I/O error on device sdb, logical block 61049645
[ 3457.585439] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3457.585487] end_request: I/O error, dev sdb, sector 488397160
[ 3457.585510] Buffer I/O error on device sdb, logical block 61049645
[ 3457.588283] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3457.588334] end_request: I/O error, dev sdb, sector 0
[ 3457.588357] Buffer I/O error on device sdb, logical block 0
[ 3457.588436] Buffer I/O error on device sdb, logical block 1
[ 3457.588491] Buffer I/O error on device sdb, logical block 2
[ 3457.588658] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3457.588688] end_request: I/O error, dev sdb, sector 0
[ 3472.957180] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3472.957314] end_request: I/O error, dev sdb, sector 0
[ 3472.957334] printk: 2 messages suppressed.
[ 3472.957352] Buffer I/O error on device sdb, logical block 0
[ 3472.957432] Buffer I/O error on device sdb, logical block 1
[ 3472.957487] Buffer I/O error on device sdb, logical block 2
[ 3472.961077] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3472.961125] end_request: I/O error, dev sdb, sector 0
[ 3472.963320] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3472.963370] end_request: I/O error, dev sdb, sector 488397160
[ 3472.965205] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3472.965252] end_request: I/O error, dev sdb, sector 488397160
[ 3472.967523] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3472.967571] end_request: I/O error, dev sdb, sector 0
[ 3472.969468] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3472.969515] end_request: I/O error, dev sdb, sector 0
[ 3540.024037] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3540.024089] end_request: I/O error, dev sdb, sector 0
[ 3540.024108] printk: 9 messages suppressed.
[ 3540.024125] Buffer I/O error on device sdb, logical block 0
[ 3540.024203] Buffer I/O error on device sdb, logical block 1
[ 3540.024258] Buffer I/O error on device sdb, logical block 2
[ 3540.024311] Buffer I/O error on device sdb, logical block 3
[ 3540.028253] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3540.028301] end_request: I/O error, dev sdb, sector 0
[ 3540.028323] Buffer I/O error on device sdb, logical block 0
[ 3540.030813] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3540.030863] end_request: I/O error, dev sdb, sector 488397160
[ 3540.030886] Buffer I/O error on device sdb, logical block 61049645
[ 3540.033107] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3540.033156] end_request: I/O error, dev sdb, sector 488397160
[ 3540.033178] Buffer I/O error on device sdb, logical block 61049645
[ 3540.035756] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3540.035805] end_request: I/O error, dev sdb, sector 0
[ 3540.035827] Buffer I/O error on device sdb, logical block 0
[ 3540.035903] Buffer I/O error on device sdb, logical block 1
[ 3540.035958] Buffer I/O error on device sdb, logical block 2
[ 3540.039038] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3540.039086] end_request: I/O error, dev sdb, sector 0
[ 3570.888022] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3570.888074] end_request: I/O error, dev sdb, sector 0
[ 3570.888093] printk: 2 messages suppressed.
[ 3570.888110] Buffer I/O error on device sdb, logical block 0
[ 3570.888189] Buffer I/O error on device sdb, logical block 1
[ 3570.888243] Buffer I/O error on device sdb, logical block 2
[ 3570.888297] Buffer I/O error on device sdb, logical block 3
[ 3570.892716] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3570.892765] end_request: I/O error, dev sdb, sector 0
[ 3570.892787] Buffer I/O error on device sdb, logical block 0
[ 3570.895175] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3570.895224] end_request: I/O error, dev sdb, sector 488397160
[ 3570.895247] Buffer I/O error on device sdb, logical block 61049645
[ 3570.897494] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3570.897540] end_request: I/O error, dev sdb, sector 488397160
[ 3570.900673] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3570.900726] end_request: I/O error, dev sdb, sector 0
[ 3570.902700] sd 1:0:1:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
[ 3570.902748] end_request: I/O error, dev sdb, sector 0

After a reboot I detected that the second RAID partition on the drive already lost one partition was also disabled.
Basically the RAID1 was only running on one disk (sdc). I added both partitions on sda to the RAID again which triggered the rebuild on sda as expected.

Conclusion: I looks like hot swappable IDE drives cannot be used with Ubuntu 8.04 LTS if there are RAID devices used!

I'm not sure if this is related to bug 12519 but if so, there should be a fix for 8.04 LTS!

$ lsb_release -rd
Description: Ubuntu 8.04.1
Release: 8.04

$ uname -a
Linux gate 2.6.24-22-server #1 SMP Mon Nov 24 19:14:19 UTC 2008 i686 GNU/Linux

Revision history for this message
Axel Müller (axel-mueller) wrote :
Download full text (7.8 KiB)

I tried if this error is reproducible: It is. I turned off the swappable drive, but no kernel message showed up with dmesg.
Only when I tried to access the no longer available drive using "fdisk /dev/sdb" this shell was frozen for some minutes. A parallel SSH login to this machine did not succeed since no shell came up after username/password.
Fdisk did not produce any output but just terminated after some minutes. Again once RAID partition was out of sync and dmesg showed the following log:

[192595.770595] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[192595.770718] ata1.01: cmd 25/00:08:68:59:1c/00:00:1d:00:00/f0 tag 0 dma 4096 in
[192595.770730] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[192595.770804] ata1.01: status: { DRDY }
[192595.770923] ata1: soft resetting link
[192600.969062] ata1: port is slow to respond, please be patient (Status 0x80)
[192605.827700] ata1: SRST failed (errno=-16)
[192605.827771] ata1: soft resetting link
[192636.059178] ata1.01: qc timeout (cmd 0xec)
[192636.059210] ata1.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[192636.059231] ata1.01: revalidation failed (errno=-5)
[192636.059296] ata1: failed to recover some devices, retrying in 5 secs
[192641.067816] ata1: soft resetting link ...

Read more...

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Hi Axel,

This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/lucid.

If it remains an issue, could you run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 313487

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Axel Müller (axel-mueller) wrote :
Download full text (4.5 KiB)

Ok - Because I would like to see this bug fixed I restarted my server today (after 6 month uptime) in order to boot the latest official kernel:

Linux gate 2.6.24-27-server #1 SMP Mon Feb 22 19:42:25 UTC 2010 i686 GNU/Linux

Exactly as one year ago this kernel has the same problem:

[ 533.295320] ata2.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 533.295470] ata2.01: cmd 25/00:08:68:59:1c/00:00:1d:00:00/f0 tag 0 dma 4096 in
[ 533.295482] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 533.295593] ata2.01: status: { DRDY }
[ 533.295749] ata2: soft resetting link
[ 538.493339] ata2: port is slow to respond, please be patient (Status 0x80)
[ 543.351552] ata2: SRST failed (errno=-16)
[ 543.351659] ata2: soft resetting link
[ 573.500557] ata2.01: qc timeout (cmd 0xec)
[ 573.500590] ata2.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 573.500610] ata2.01: revalidation failed (errno=-5)
[ 573.500713] ata2: failed to recover some devices, retrying in 5 secs
[ 578.508701] ata2: soft resetting link
[ 583.706741] ata2: port is slow to respond, please be patient (Status 0x80)
[ 588.564967] ata2: SRST failed (errno=-16)
[ 588.565077] ata2: soft resetting link
[ 618.714024] ata2.01: qc timeout (cmd 0xec)
[ 618.714057] ata2.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 618.714078] ata2.01: revalidation failed (errno=-5)
[ 618.714182] ata2: failed to recover some devices, retrying in 5 secs
[ 623.722246] ata2: soft resetting link
[ 628.920340] ata2: port is slow to respond, please be patient (Status 0x80)
[ 633.778528] ata2: SRST failed (errno=-16)
[ 633.778637] ata2: soft resetting link
[ 663.927685] ata2.01: qc timeout (cmd 0xec)
[ 663.927717] ata2.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 663.927738] ata2.01: revalidation failed (errno=-5)
[ 663.927840] ata2.01: disabled
[ 664.449072] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[ 664.449100] ata2.00: revalidation failed (errno=-5)
[ 664.449202] ata2: failed to recover some devices, retrying in 5 secs
[ 669.455733] ata2: soft resetting link
[ 674.663796] ata2: port is slow to respond, please be patient (Status 0x80)
[ 679.462080] ata2: SRST failed (errno=-16)
[ 679.462188] ata2: soft resetting link
[ 709.671323] ata2.00: qc timeout (cmd 0xec)
[ 709.671356] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 709.671376] ata2.00: revalidation failed (errno=-5)
[ 709.671480] ata2: failed to recover some devices, retrying in 5 secs
[ 714.679578] ata2: soft resetting link
[ 719.877713] ata2: port is slow to respond, please be patient (Status 0x80)
[ 724.735932] ata2: SRST failed (errno=-16)
[ 724.736040] ata2: soft resetting link
[ 754.945280] ata2.00: qc timeout (cmd 0xec)
[ 754.945313] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 754.945333] ata2.00: revalidation failed (errno=-5)
[ 754.945436] ata2.00: disabled
[ 755.875061] ata2: soft resetting link
[ 761.073101] ata2: port is slow to respond, please be patient (Status 0x80)
[ 765.931373] ata2: SRST failed (errno=-16)
[ 765.931482] ata2: soft resetting link
[ 766.098480] ata2: EH complete
[ 766.098639] sd 3:0:1:0: [sdd] Resu...

Read more...

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu release http://www.ubuntu.com/getubuntu/download . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.