RAID1 array affected by non-RAID swappable IDE disk
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Expired
|
Undecided
|
Unassigned |
Bug Description
I'm running a server machine with Ubuntu 8.04 LTS installed as RAID1.
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
[...]
/dev/md1 482495704 50322380 407856944 11% /
/dev/md0 186555 60819 116104 35% /boot
[...]
This worked well until I added a hot swappable IDE drive.
AfterI installed the additional IDE drive I powered the machine up without problems.
I did not mount any partition of this drive but "fdisk /dev/sdb" showed the partitions on this drive as expected.
When I removed the swappable drive one partition of the RAID1 drives was disabled as well:
[ 2259.642911] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[ 2259.643041] ata1.01: cmd 25/00:08:
[ 2259.643054] res 40/00:00:
[ 2259.643130] ata1.01: status: { DRDY }
[ 2259.643249] ata1: soft resetting link
[ 2264.841449] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2269.700106] ata1: SRST failed (errno=-16)
[ 2269.700184] ata1: soft resetting link
[ 2299.861884] ata1.01: qc timeout (cmd 0xec)
[ 2299.861926] ata1.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 2299.861948] ata1.01: revalidation failed (errno=-5)
[ 2299.862014] ata1: failed to recover some devices, retrying in 5 secs
[ 2304.870541] ata1: soft resetting link
[ 2310.069057] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2314.927727] ata1: SRST failed (errno=-16)
[ 2314.927806] ata1: soft resetting link
[ 2345.089504] ata1.01: qc timeout (cmd 0xec)
[ 2345.089544] ata1.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 2345.089566] ata1.01: revalidation failed (errno=-5)
[ 2345.089632] ata1: failed to recover some devices, retrying in 5 secs
[ 2350.098151] ata1: soft resetting link
[ 2355.296723] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2360.155350] ata1: SRST failed (errno=-16)
[ 2360.155428] ata1: soft resetting link
[ 2390.317147] ata1.01: qc timeout (cmd 0xec)
[ 2390.317189] ata1.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[ 2390.317211] ata1.01: revalidation failed (errno=-5)
[ 2390.317278] ata1.01: disabled
[ 2390.838537] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[ 2390.838575] ata1.00: revalidation failed (errno=-5)
[ 2390.838640] ata1: failed to recover some devices, retrying in 5 secs
[ 2395.845722] ata1: soft resetting link
[ 2401.044150] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2405.902816] ata1: SRST failed (errno=-16)
[ 2405.902895] ata1: soft resetting link
[ 2436.124561] ata1.00: qc timeout (cmd 0xec)
[ 2436.124603] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 2436.124625] ata1.00: revalidation failed (errno=-5)
[ 2436.124691] ata1: failed to recover some devices, retrying in 5 secs
[ 2441.133225] ata1: soft resetting link
[ 2446.331733] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2451.190404] ata1: SRST failed (errno=-16)
[ 2451.190482] ata1: soft resetting link
[ 2481.412208] ata1.00: qc timeout (cmd 0xec)
[ 2481.412251] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 2481.412274] ata1.00: revalidation failed (errno=-5)
[ 2481.412481] ata1.00: disabled
[ 2482.281909] ata1: soft resetting link
[ 2487.480456] ata1: port is slow to respond, please be patient (Status 0x80)
[ 2492.339127] ata1: SRST failed (errno=-16)
[ 2492.339204] ata1: soft resetting link
[ 2492.510749] ata1: EH complete
[ 2492.510912] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 2492.510947] end_request: I/O error, dev sdb, sector 488397160
[ 2492.510972] Buffer I/O error on device sdb, logical block 61049645
[ 2492.511092] sd 1:0:0:0: [sda] Result: hostbyte=
[ 2492.511121] end_request: I/O error, dev sda, sector 973040792
[ 2492.511173] md: super_written gets error=-5, uptodate=0
[ 2492.511200] raid1: Disk failure on sda2, disabling device.
[ 2492.511209] Operation continuing on 1 devices
[ 2492.511627] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 2492.511692] end_request: I/O error, dev sdb, sector 488397160
[ 2492.511713] Buffer I/O error on device sdb, logical block 61049645
[ 2492.534440] RAID1 conf printout:
[ 2492.534476] --- wd:1 rd:2
[ 2492.534495] disk 0, wo:1, o:0, dev:sda2
[ 2492.534511] disk 1, wo:0, o:1, dev:sdc2
[ 2492.559127] RAID1 conf printout:
[ 2492.559160] --- wd:1 rd:2
[ 2492.559180] disk 1, wo:0, o:1, dev:sdc2
I placed the swappable drive again in the bay but it was never made available to the system so that I was not able to access the drive again using "fdisk /dev/sdb":
[ 3457.576385] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3457.576437] end_request: I/O error, dev sdb, sector 0
[ 3457.576459] Buffer I/O error on device sdb, logical block 0
[ 3457.576538] Buffer I/O error on device sdb, logical block 1
[ 3457.576593] Buffer I/O error on device sdb, logical block 2
[ 3457.576646] Buffer I/O error on device sdb, logical block 3
[ 3457.580557] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3457.580607] end_request: I/O error, dev sdb, sector 0
[ 3457.580629] Buffer I/O error on device sdb, logical block 0
[ 3457.583139] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3457.583189] end_request: I/O error, dev sdb, sector 488397160
[ 3457.583212] Buffer I/O error on device sdb, logical block 61049645
[ 3457.585439] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3457.585487] end_request: I/O error, dev sdb, sector 488397160
[ 3457.585510] Buffer I/O error on device sdb, logical block 61049645
[ 3457.588283] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3457.588334] end_request: I/O error, dev sdb, sector 0
[ 3457.588357] Buffer I/O error on device sdb, logical block 0
[ 3457.588436] Buffer I/O error on device sdb, logical block 1
[ 3457.588491] Buffer I/O error on device sdb, logical block 2
[ 3457.588658] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3457.588688] end_request: I/O error, dev sdb, sector 0
[ 3472.957180] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3472.957314] end_request: I/O error, dev sdb, sector 0
[ 3472.957334] printk: 2 messages suppressed.
[ 3472.957352] Buffer I/O error on device sdb, logical block 0
[ 3472.957432] Buffer I/O error on device sdb, logical block 1
[ 3472.957487] Buffer I/O error on device sdb, logical block 2
[ 3472.961077] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3472.961125] end_request: I/O error, dev sdb, sector 0
[ 3472.963320] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3472.963370] end_request: I/O error, dev sdb, sector 488397160
[ 3472.965205] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3472.965252] end_request: I/O error, dev sdb, sector 488397160
[ 3472.967523] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3472.967571] end_request: I/O error, dev sdb, sector 0
[ 3472.969468] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3472.969515] end_request: I/O error, dev sdb, sector 0
[ 3540.024037] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3540.024089] end_request: I/O error, dev sdb, sector 0
[ 3540.024108] printk: 9 messages suppressed.
[ 3540.024125] Buffer I/O error on device sdb, logical block 0
[ 3540.024203] Buffer I/O error on device sdb, logical block 1
[ 3540.024258] Buffer I/O error on device sdb, logical block 2
[ 3540.024311] Buffer I/O error on device sdb, logical block 3
[ 3540.028253] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3540.028301] end_request: I/O error, dev sdb, sector 0
[ 3540.028323] Buffer I/O error on device sdb, logical block 0
[ 3540.030813] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3540.030863] end_request: I/O error, dev sdb, sector 488397160
[ 3540.030886] Buffer I/O error on device sdb, logical block 61049645
[ 3540.033107] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3540.033156] end_request: I/O error, dev sdb, sector 488397160
[ 3540.033178] Buffer I/O error on device sdb, logical block 61049645
[ 3540.035756] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3540.035805] end_request: I/O error, dev sdb, sector 0
[ 3540.035827] Buffer I/O error on device sdb, logical block 0
[ 3540.035903] Buffer I/O error on device sdb, logical block 1
[ 3540.035958] Buffer I/O error on device sdb, logical block 2
[ 3540.039038] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3540.039086] end_request: I/O error, dev sdb, sector 0
[ 3570.888022] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3570.888074] end_request: I/O error, dev sdb, sector 0
[ 3570.888093] printk: 2 messages suppressed.
[ 3570.888110] Buffer I/O error on device sdb, logical block 0
[ 3570.888189] Buffer I/O error on device sdb, logical block 1
[ 3570.888243] Buffer I/O error on device sdb, logical block 2
[ 3570.888297] Buffer I/O error on device sdb, logical block 3
[ 3570.892716] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3570.892765] end_request: I/O error, dev sdb, sector 0
[ 3570.892787] Buffer I/O error on device sdb, logical block 0
[ 3570.895175] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3570.895224] end_request: I/O error, dev sdb, sector 488397160
[ 3570.895247] Buffer I/O error on device sdb, logical block 61049645
[ 3570.897494] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3570.897540] end_request: I/O error, dev sdb, sector 488397160
[ 3570.900673] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3570.900726] end_request: I/O error, dev sdb, sector 0
[ 3570.902700] sd 1:0:1:0: [sdb] Result: hostbyte=
[ 3570.902748] end_request: I/O error, dev sdb, sector 0
After a reboot I detected that the second RAID partition on the drive already lost one partition was also disabled.
Basically the RAID1 was only running on one disk (sdc). I added both partitions on sda to the RAID again which triggered the rebuild on sda as expected.
Conclusion: I looks like hot swappable IDE drives cannot be used with Ubuntu 8.04 LTS if there are RAID devices used!
I'm not sure if this is related to bug 12519 but if so, there should be a fix for 8.04 LTS!
$ lsb_release -rd
Description: Ubuntu 8.04.1
Release: 8.04
$ uname -a
Linux gate 2.6.24-22-server #1 SMP Mon Nov 24 19:14:19 UTC 2008 i686 GNU/Linux
I tried if this error is reproducible: It is. I turned off the swappable drive, but no kernel message showed up with dmesg.
Only when I tried to access the no longer available drive using "fdisk /dev/sdb" this shell was frozen for some minutes. A parallel SSH login to this machine did not succeed since no shell came up after username/password.
Fdisk did not produce any output but just terminated after some minutes. Again once RAID partition was out of sync and dmesg showed the following log:
[192595.770595] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen 68:59:1c/ 00:00:1d: 00:00/f0 tag 0 dma 4096 in 00:00:00/ 00:00:00: 00:00/00 Emask 0x4 (timeout)
[192595.770718] ata1.01: cmd 25/00:08:
[192595.770730] res 40/00:00:
[192595.770804] ata1.01: status: { DRDY }
[192595.770923] ata1: soft resetting link
[192600.969062] ata1: port is slow to respond, please be patient (Status 0x80)
[192605.827700] ata1: SRST failed (errno=-16)
[192605.827771] ata1: soft resetting link
[192636.059178] ata1.01: qc timeout (cmd 0xec)
[192636.059210] ata1.01: failed to IDENTIFY (I/O error, err_mask=0x5)
[192636.059231] ata1.01: revalidation failed (errno=-5)
[192636.059296] ata1: failed to recover some devices, retrying in 5 secs
[192641.067816] ata1: soft resetting link ...