dmraid & attempt to access beyond end of device

Bug #329880 reported by Jean-Louis Dupond
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: udev

udev tries to access the partition table of one of the dmraid disks.
Because of that, the syslog fills up with:

[ 38.168279] sda: rw=0, want=1250274689, limit=625142448
[ 38.168284] attempt to access beyond end of device
[ 38.168288] sda: rw=0, want=1250274690, limit=625142448
[ 38.168300] attempt to access beyond end of device

udev should NOT create /dev/sda1, /dev/sda2, etc. if /dev/sda is part of a dmraid array.

Revision history for this message
TJ (tj) wrote :

Interesting. This is something I reported and posted a patch to mainline for back in early 2006. I thought it was eventually sorted out.

Revision history for this message
TJ (tj) wrote :

See my original bug #77734 report "Disk Read Errors during boot-time caused by probe of invalid partitions"

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

udev doesn't create them when the device is claimed (in fact, I'm pretty sure the kernel doesn't even provide partition information), so something must have regressed in the kernel.

Revision history for this message
KBios (kbios) wrote :

Same here, on an NVIDIA RAID 0 device.
This also seems to slow down system boot.

Changed in linux:
status: New → Confirmed
Revision history for this message
TJ (tj) wrote :

Please attach /var/log/dmesg

Changed in linux:
assignee: nobody → intuitivenipple
Revision history for this message
KBios (kbios) wrote :

Here it is.
However, it seems to cover only the first part of the boot process. Is that right?

Revision history for this message
TJ (tj) wrote :

Yes, that is what I wanted, thank you.

The issue is not related to udev. It lies in how the kernel scans block devices for partitions during start-up. I've confirmed that it has the same cause as the problem I reported in bug #77734.

Here are some significant log messages:

[ 6.136056] sda: sda1
[ 6.160201] sda: p1 exceeds device capacity

[ 6.160504] sdb: unknown partition table

[ 10.488869] attempt to access beyond end of device
[ 10.488874] sda: rw=0, want=1172134336, limit=586072368
[ 10.488876] Buffer I/O error on device sda1, logical block 1172134272

The story is this. Once a block device is found it is scanned for a partition table. If one is found it is reported partition by partition. A successful scan is reported thus (example from a non-dmraid disk):

[ 2.343465] sda: sda1 sda2 sda3 sda4

In reporting the partitions, the logic traverses them looking for extended partitions that contain logical partitions.

If the dmraid array is striped (RAID 0), the RAID volume starts on disk sda and ends on disk sdb. The primary partition table (MBR) in this case is at sector 0 of the logical RAID volume, which just happens to also be sector 0 of sda.
The partition table entries contain offsets into the logical RAID volume, which is larger than sda.

The logic in fs/partitions/msdos.c::msdos_partition() (tries to) scan each primary and extended partition when the block device is being built.
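To illustrate the kind of check involved, here is a minimal Python sketch (not the kernel code) that parses the four primary entries of an MBR sector and flags any that end beyond the capacity of the disk. The helper names parse_mbr, oversized and make_mbr are hypothetical, and the start sector of 63 is an assumption chosen so that the partition end matches the want= value in the log above.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446        # the partition table starts here in sector 0
ENTRY_SIZE = 16           # four 16-byte primary entries
SIGNATURE = b"\x55\xaa"   # boot signature at bytes 510-511


def parse_mbr(sector0):
    """Return (index, start_lba, num_sectors) for each used primary entry."""
    if len(sector0) < SECTOR_SIZE or sector0[510:512] != SIGNATURE:
        raise ValueError("no MBR signature found")
    entries = []
    for i in range(4):
        begin = TABLE_OFFSET + i * ENTRY_SIZE
        raw = sector0[begin:begin + ENTRY_SIZE]
        # status, CHS start, type, CHS end, LBA start, sector count
        _status, _chs0, ptype, _chs1, start, count = struct.unpack("<B3sB3sII", raw)
        if ptype != 0 and count != 0:
            entries.append((i + 1, start, count))
    return entries


def oversized(entries, capacity):
    """Indexes of partitions whose end lies beyond the disk capacity."""
    return [idx for idx, start, count in entries if start + count > capacity]


def make_mbr(start, count, ptype=0x83):
    """Build a synthetic sector 0 holding one primary partition (for the demo)."""
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into("<B3sB3sII", sector, TABLE_OFFSET,
                     0x80, b"\0\0\0", ptype, b"\0\0\0", start, count)
    sector[510:512] = SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    SDA_SECTORS = 586072368                  # "limit=" from the log
    mbr = make_mbr(63, 1172134336 - 63)      # assumed start; end equals "want="
    for idx in oversized(parse_mbr(mbr), SDA_SECTORS):
        print(f"sda: p{idx} exceeds device capacity")
```

The table itself is perfectly valid for the RAID volume; it only looks oversized when read against the capacity of sda alone.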

In trying to seek to the end of sda1 the LBA sector number it is using is logical, not physical. In striped arrays it almost certainly points to a sector that is physically on sdb.
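The logical-versus-physical point can be sketched in Python as follows. This assumes a plain two-disk RAID 0 layout with a hypothetical stripe size of 128 sectors (64 KiB) and no metadata offset; real fake-RAID formats may differ, so the function locate is an illustration only. The limit and want figures are taken from the log above.

```python
STRIPE_SECTORS = 128      # hypothetical stripe size (64 KiB); an assumption
DISKS = ("sda", "sdb")

DISK_LIMIT = 586072368    # "limit=" from the log: sectors on sda
WANT = 1172134336         # "want=" from the log: end of the attempted read


def locate(lba, disks=DISKS, stripe=STRIPE_SECTORS):
    """Map a logical RAID 0 sector to (disk, physical sector) on that disk."""
    chunk, offset = divmod(lba, stripe)
    disk = disks[chunk % len(disks)]
    physical = (chunk // len(disks)) * stripe + offset
    return disk, physical


if __name__ == "__main__":
    last_lba = WANT - 1
    print("beyond sda alone:     ", WANT > DISK_LIMIT)
    print("within 2-disk volume: ", WANT <= DISK_LIMIT * len(DISKS))
    print("maps to:              ", locate(last_lba))
```

Under these assumptions the last requested sector is far past the end of sda, yet comfortably inside a volume twice that size, and it maps to a valid physical sector on sdb. Which disk it lands on depends on the real stripe size.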

dmraid differs from md (multiple devices) RAID arrays in that md works within the limits of the physical disks (arrays are usually built from groups of partitions). The md superblock (4KB) is stored at the end of a partition (usually in the last 64KB).
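As a rough illustration of why md metadata never points outside its member device, here is a Python sketch of where the superblock is placed. It assumes the version 0.90 metadata format, in which the superblock sits on a 64 KiB-aligned boundary near the end of the device; the function name is hypothetical.

```python
MD_RESERVED_SECTORS = 128   # 64 KiB expressed in 512-byte sectors
SUPERBLOCK_SECTORS = 8      # 4 KiB superblock


def md090_superblock_sector(device_sectors):
    """Sector where a 0.90-format superblock starts (assumed layout)."""
    if device_sectors < 2 * MD_RESERVED_SECTORS:
        raise ValueError("device too small for this sketch")
    aligned = device_sectors & ~(MD_RESERVED_SECTORS - 1)
    return aligned - MD_RESERVED_SECTORS


if __name__ == "__main__":
    size = 586072368        # sda size from the log, used only as an example
    sb = md090_superblock_sector(size)
    print("superblock at sector", sb)
    print("inside device:", sb + SUPERBLOCK_SECTORS <= size)
```

Because the offset is computed from the size of the member device itself, the result always falls inside that device, unlike the RAID-volume offsets found in a dmraid partition table.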

dmraid was designed to support the 'fake' RAID meta-data signatures of Promise FasTrak and the like which were originally implemented in controller BIOS and Windows device drivers.

I've marked this bug a duplicate of bug #77734 and added the upstream bug reference to that bug, reopening it against upstream Linux.

Revision history for this message
TJ (tj) wrote :

Please post a report to the upstream kernel bug report at:

http://bugzilla.kernel.org/show_bug.cgi?id=7912

Revision history for this message
TJ (tj) wrote :

After some exchanges with Linus on the upstream bug I'm un-duplicating this from #77734. That bug can deal with the kernel versions that allowed the disk I/O to be attempted. It appears the current code was introduced by:

git describe --contains a168ee84c90b39ece357da127ab388f2f64db19c
v2.6.25-rc1~1160

where __generic_make_request() calls bio_check_eod() which will generate the warning in handle_bad_sector().
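For readers unfamiliar with that check, the following Python sketch paraphrases the idea of an end-of-device test. It is not the kernel source: the function name check_eod is borrowed only for readability, and the one-sector read size in the example is an assumption. The message format mirrors the log lines quoted earlier.

```python
def check_eod(name, sector, nr_sectors, limit, rw=0):
    """Return warning lines if the request runs past the device end, else []."""
    if nr_sectors and (limit < nr_sectors or limit - nr_sectors < sector):
        return [
            "attempt to access beyond end of device",
            f"{name}: rw={rw}, want={sector + nr_sectors}, limit={limit}",
        ]
    return []


if __name__ == "__main__":
    # A one-sector read at the last sector of the oversized sda1 (assumed size).
    for line in check_eod("sda", 1172134335, 1, 586072368):
        print(line)
    # A request that fits produces no warning.
    print(check_eod("sda", 0, 8, 586072368))
```

The request is rejected before any disk I/O happens, which is why only warnings show up in the log on these kernels.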

As this commit refactored the block code significantly I'm not sure at this point if there was similar protection in earlier code.

This bug can deal with limiting the warning messages.

Neil Brown (upstream) believes this may be fixed by commit ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac:

git describe --contains ac0d86f5809598ddcd6bfa0ea8245ccc910e9eac
v2.6.28-rc1~347

and therefore will be included in Jaunty.

Neil says:

"It is certainly a problem that I have seen before. mdadm destroy all partitions in a device that it includes in an array to try to stop other code getting confused. Presumably dmraid doesn't. But from .28 it probably shouldn't need to."

If we can show that this patch resolves the issue, it might be a candidate for an SRU back-port.

Could one of the affected users do a test with Jaunty (kernel 2.6.28) and attach the resulting /var/log/kern.log to this report for us to examine?

Changed in linux (Ubuntu):
assignee: TJ (intuitivenipple) → nobody
Revision history for this message
Jean-Louis Dupond (dupondje) wrote :

No issues with this for a long time now :)
Can be closed

Changed in linux (Ubuntu):
status: Confirmed → Fix Released