Activity log for bug #77734

Date Who What changed Old value New value Message
2007-01-02 21:37:47 TJ bug added bug
2007-01-03 20:00:56 TJ description

Old value:

I appear to have stumbled upon a bug in the kernel that can, in certain circumstances, both cause the kernel boot to get stuck in an endless loop, and possibly damage the IDE drives over time (based on experience). Using Edgy Eft Desktop Live CD, preparing to install to an existing Windows system. This probably occurs during an installed system boot too, but I've not got that far as yet.

Scenario: PC with a Promise FastTrak TX2000 SoftRAID controller and 4x 60GB IDE parallel ATA drives configured as RAID 10 (Mirror + Stripe) to provide one logical 120GB drive. The PC already has Windows 2003 Server installed and booting from the RAID 10, with 2 NTFS partitions. I wanted to shrink the 2nd partition to make room to install Ubuntu 6.10 from the Live CD. See my Ubuntu forums article for a detailed explanation of my experience: http://www.ubuntuforums.org/showthread.php?p=1958918

Bug: When booting Edgy from the CD the kernel loads the Promise FastTrak controller module "pdc202xx" and then probes each of the connected IDE hard drives (for a partition table?). dmraid is not loaded at this point, so it is not handling the logical drive. Large drives use LBA addressing to overcome the CHS limitations of partition tables. If the probe finds a partition table on any drive, it then tries to seek to the starting sector of each partition (presumably to read its boot-sector system-id byte?), and also tries to seek into the last few sectors of the partition (looking for a superblock?). On a RAID 0 array where the striping causes the partition table to represent a larger logical drive, the starting and ending sector numbers of some partitions are beyond the end of the physical drive the partition table is written on. This causes the Disk Read Errors reported here.

The fix would be for the probe to compare the physical number of cylinders reported by the drive (as seen by e.g. fdisk /dev/hde or fdisk /dev/hdg) to the starting/ending sector numbers for the LBA device. If the entries in the partition table are beyond the end of the physical disk the probe should handle the situation gracefully (this could potentially be used as a cue to auto-load dmraid). Once dmraid is loaded, "fdisk /dev/mapper/raidarrayname" shows the correct total number of logical sectors.

-------- Short extract of repetitive disk errors - usually there are hundreds or thousands ------
PDC202XX: Primary channel reset.
ide2: reset: success
hde: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
end_request: I/O error, dev hde, sector 238276076
printk: 8 messages suppressed.
Buffer I/O error on device hde2, logical block 47279294
hde: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hde: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }

New value:

I appear to have stumbled upon a bug in the kernel that can, in certain circumstances, both cause the kernel boot to get stuck in an endless loop, and possibly damage the IDE drives over time (based on experience). Using Edgy Eft Desktop Live CD, preparing to install to an existing Windows system. This probably occurs during an installed system boot too, but I've not got that far as yet.

Scenario: PC with a Promise FastTrak TX2000 SoftRAID controller and 4x 60GB IDE parallel ATA drives configured as RAID 1+0 (Mirror + Stripe) to provide one logical 120GB drive. The PC already has Windows 2003 Server installed and booting from the RAID 1+0, with 2 NTFS partitions. I wanted to shrink the 2nd partition to make room to install Ubuntu 6.10 from the Live CD. See my Ubuntu forums article for a detailed explanation of my experience: http://www.ubuntuforums.org/showthread.php?p=1958918

Bug: When booting Edgy from the CD the kernel loads the Promise FastTrak controller module "pdc202xx" and then probes each of the connected IDE hard drives (for a partition table?). dmraid is not loaded at this point, so it is not handling the logical drive. The RAID 1+0 120GB logical drive consists of hde+hdf mirrored to hdg+hdh, with the partition table on hde and hdg. Large drives use LBA addressing to overcome the CHS limitations of partition tables. If the probe finds a partition table on any drive, it then tries to seek to the starting sector of each partition (presumably to read its boot-sector system-id byte?), and also tries to seek into the last few sectors of the partition (looking for a superblock?). On a RAID 0 array where the striping causes the partition table to represent a larger logical drive, the starting and ending sector numbers of some partitions are beyond the end of the physical drive the partition table is written on. This causes the Disk Read Errors reported here.

The fix would be for the probe to compare the physical number of cylinders reported by the drive (as seen by e.g. fdisk /dev/hde or fdisk /dev/hdg) to the starting/ending sector numbers for the LBA device. If the entries in the partition table are beyond the end of the physical disk the probe should handle the situation gracefully (this could potentially be used as a cue to auto-load dmraid). Once dmraid is loaded, "fdisk /dev/mapper/raidarrayname" shows the correct total number of logical sectors.

-------- Short extract of repetitive disk errors - usually there are hundreds or thousands ------
PDC202XX: Primary channel reset.
ide2: reset: success
hde: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
end_request: I/O error, dev hde, sector 238276076
printk: 8 messages suppressed.
Buffer I/O error on device hde2, logical block 47279294
hde: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: task_in_intr: error=0x04 { DriveStatusError }
ide: failed opcode was: unknown
hde: task_in_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
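The check proposed in the description (compare each partition's start and end sectors with the size of the physical drive before probing it) can be sketched roughly as follows. This is an illustrative Python model only, not the attached kernel patch: the names PartitionEntry, out_of_range and safe_to_probe are hypothetical, the partition layout in TABLE is invented, and the capacity assumes a 60GB drive with decimal gigabytes and 512-byte sectors. Only sector number 238276076 comes from the error extract above.

```python
from dataclasses import dataclass
from typing import List

SECTOR_SIZE = 512  # bytes per sector (assumed)

# Assumed capacity of one physical 60GB member drive, in sectors.
CAPACITY = 60 * 10**9 // SECTOR_SIZE


@dataclass
class PartitionEntry:
    """One partition-table entry, using LBA sector numbers."""
    name: str
    start_sector: int
    num_sectors: int

    @property
    def end_sector(self) -> int:
        return self.start_sector + self.num_sectors - 1


def out_of_range(entry: PartitionEntry, capacity_sectors: int) -> bool:
    """True if a non-empty entry starts or ends beyond the physical drive."""
    if entry.num_sectors <= 0:
        return False  # empty slot, nothing to probe
    return (entry.start_sector >= capacity_sectors
            or entry.end_sector >= capacity_sectors)


def safe_to_probe(entries: List[PartitionEntry],
                  capacity_sectors: int) -> List[PartitionEntry]:
    """Keep only the entries that lie entirely on the physical drive."""
    return [e for e in entries if not out_of_range(e, capacity_sectors)]


# Hypothetical table describing the 120GB logical drive, as it would be
# found on a single 60GB member disk such as hde.
TABLE = [
    PartitionEntry("hde1", 63, 100_000_000),
    PartitionEntry("hde2", 100_000_063, 140_000_000),
]

if __name__ == "__main__":
    for entry in TABLE:
        verdict = "skip" if out_of_range(entry, CAPACITY) else "probe"
        print(entry.name, entry.start_sector, entry.end_sector, verdict)
```

Under these assumptions the physical drive has 117187500 sectors, so a read of sector 238276076 (the one in the error extract) lands well past the end of the disk, and an entry like the hypothetical hde2 would be skipped instead of probed.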
2007-01-25 17:41:43 TJ None: statusexplanation Assigned to more appropriate package
2007-01-31 17:47:40 TJ bug assigned to linux (upstream)
2007-01-31 17:52:46 TJ title Disk Read Errors during boot-time probe of physical softRAID drives Disk Read Errors during boot-time caused by probe of invalid partitions
2007-01-31 20:24:58 TJ bug added attachment 'msdos.c.tj.patch' (Patch for fs/partitions/msdos.c)
2007-01-31 20:29:29 TJ bug assigned to linux-source-2.6.17 (Debian)
2007-01-31 21:36:37 TJ bug added attachment 'msdos.c.tj.2.patch' (Updated patch for fs/partitions/msdos.c)
2007-01-31 21:38:37 TJ linux-source-2.6.17: status Unconfirmed In Progress
2007-01-31 21:38:37 TJ linux-source-2.6.17: assignee intuitive-nipple
2007-01-31 21:38:37 TJ linux-source-2.6.17: statusexplanation Assigned to more appropriate package Updated status to "In Progress" to reflect the availability of a universal patch for testing. Needs to be tested in systems that don't have this issue to ensure it doesn't cause any regressions.
2007-01-31 23:04:13 TJ linux: status Unconfirmed In Progress
2007-02-01 02:00:13 TJ bug added attachment 'msdos.c.tj.7.patch' (Patch revision 3)
2007-03-26 16:21:10 Tormod Volden linux-source-2.6.17: statusexplanation Updated status to "In Progress" to reflect the availability of a universal patch for testing. Needs to be tested in systems that don't have this issue to ensure it doesn't cause any regressions.
2007-07-25 20:05:21 TJ linux-source-2.6.20: status In Progress Fix Released
2007-07-25 20:06:02 TJ linux: status In Progress Fix Released
2009-02-15 23:25:40 TJ linux-source-2.6.20: status Fix Released Confirmed
2009-02-15 23:25:40 TJ linux-source-2.6.20: assignee intuitivenipple
2009-02-18 19:35:01 TJ linux: status Fix Released Unknown
2009-02-18 19:35:01 TJ linux: importance Undecided Unknown
2009-02-18 19:35:01 TJ linux: statusexplanation Fix applied to Andrew Morton's -mm tree in January 2007
2009-02-18 19:36:10 Bug Watch Updater linux: status Unknown Confirmed
2009-02-18 19:37:45 TJ bug assigned to linux (Ubuntu)
2009-02-18 19:48:10 TJ linux: status New Confirmed
2009-02-18 19:48:10 TJ linux: assignee intuitivenipple
2009-02-18 19:48:10 TJ linux: statusexplanation Confirmed as still affecting Jaunty by report in bug #329880. It appears Linus Torvalds rejected my patch when it was pushed from Andrew Morton's -mm tree to mainline in May 2007:

-----------------------------
From: akpm@linux-foundation.org
To: linux@tjworld.net, mm-commits@vger.kernel.org
Subject: - filesystem-disk-errors-at-boot-time-caused-by-probe.patch removed from -mm tree
Date: Tue, 08 May 2007 19:34:23 -0700 (Wed, 03:34 BST)

The patch titled
     filesystem: Disk Errors at boot-time caused by probe of partitions
has been removed from the -mm tree. Its filename was
     filesystem-disk-errors-at-boot-time-caused-by-probe.patch

This patch was dropped because it was nacked

-----------------------------
From: Linus Torvalds <torvalds@linux-foundation.org>
To: akpm@linux-foundation.org
Cc: linux@tjworld.net, bunk@stusta.de, Jens Axboe <jens.axboe@oracle.com>
Subject: Re: [patch 012/455] filesystem: Disk Errors at boot-time caused by probe of partitions
Date: Tue, 8 May 2007 09:19:32 -0700 (PDT) (17:19 BST)

On Tue, 8 May 2007, akpm@linux-foundation.org wrote:
> > From: TJ <linux@tjworld.net>

I don't really like these kinds of addresses. Who is TJ? When I google for that name, I find a lot of hits, but all the links to tjworld.net are down.

I also think the patch is wrong. IIRC, we cannot trust the "capacity" data, because not all disks report it correctly. If we did, we'd just do the check in read_dev_sector() instead.

So I'm dropping this. I might be wrong about the capacity thing, we may have fixed it (Jens cc'd). But if the capacity is trustworthy, why not just do the trivial check in read_dev_sector to protect against invalid extended ones? And in add_partitions()?

Linus
-----------------------------
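The alternative raised in the reply above, doing the bounds check at the point where a sector is read rather than while parsing the partition table, might look something like the sketch below. It is a simplified Python model, not the kernel's read_dev_sector(): the function read_sector and its parameters are hypothetical, and it assumes the reported capacity can be trusted, which is exactly the point the reply calls into question.

```python
import io
from typing import Optional

SECTOR_SIZE = 512  # bytes per sector (assumed)


def read_sector(dev: io.BufferedIOBase, sector: int,
                capacity_sectors: int) -> Optional[bytes]:
    """Return one sector, or None if the sector lies outside the device.

    The capacity check happens before any I/O is issued, so a bogus
    partition entry never turns into a failing read on the drive.
    """
    if sector < 0 or sector >= capacity_sectors:
        return None
    dev.seek(sector * SECTOR_SIZE)
    data = dev.read(SECTOR_SIZE)
    # A short read also counts as "not available".
    return data if len(data) == SECTOR_SIZE else None


if __name__ == "__main__":
    # Stand-in for a tiny 4-sector device.
    device = io.BytesIO(bytes(4 * SECTOR_SIZE))
    print(read_sector(device, 3, 4) is not None)  # last valid sector
    print(read_sector(device, 4, 4) is None)      # one past the end
```

Placing the check here would cover every caller at once, including extended-partition parsing, but it is only as reliable as the capacity value the drive reports.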
2009-03-28 10:11:50 Chucky Ellison bug added attachment 'dmesg.txt' (dmesg.txt)
2009-03-29 00:40:35 Chucky Ellison bug added attachment 'dmesg.2.6.29.txt' (dmesg.2.6.29.txt)
2009-03-29 21:35:48 Chucky Ellison bug added attachment 'proc.partitions.2.6.29.txt' (proc.partitions.2.6.29.txt)
2009-03-29 21:37:24 Chucky Ellison bug added attachment 'fdisk-l.2.6.29.txt' (fdisk-l.2.6.29.txt)
2009-04-28 23:54:26 Leann Ogasawara linux-source-2.6.20 (Ubuntu): status Confirmed Won't Fix
2009-07-10 19:38:18 kernel-janitor tags dmraid dmraid kj-comment
2011-01-12 21:29:02 Jeremy Foshee linux (Ubuntu): assignee TJ (intuitivenipple)
2011-01-19 10:32:17 Andy Whitcroft linux-source-2.6.17 (Debian): status New Fix Released
2011-01-19 10:33:05 Andy Whitcroft linux (Ubuntu): status Confirmed Fix Released
2011-02-03 17:20:39 Bug Watch Updater linux: importance Unknown High
2011-09-16 16:40:48 Steve Conklin linux: importance High Undecided
2011-09-16 16:40:48 Steve Conklin linux: status Confirmed New
2011-09-16 16:40:48 Steve Conklin linux: remote watch Linux Kernel Bug Tracker #7912
2011-09-16 16:40:59 Steve Conklin linux: status New Fix Released