Cannot start from dmraid device anymore

Bug #141435 reported by jens7677
This bug affects 24 people.
Affects                        Status     Importance  Assigned to  Milestone
linux (Ubuntu)                 Won't Fix  Undecided   Unassigned   -
linux-source-2.6.22 (Ubuntu)   Won't Fix  Undecided   Unassigned   -

Bug Description

Since one of the last (kernel?) updates I cannot start my Gutsy anymore. The message I get is:

device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: <email address hidden>
ERROR: isw device for volume "Volume0" broken on /dev/sdb in RAID set "isw_dfgjhjfjai_Volume0"
ERROR: isw: wrong # of devices in RAID set "isw_dfgjhjfjai_Volume0" [1/2] on /dev/sdb
device-mapper: table: 254:0: linear: dm-linear: Device lookup failed
device-mapper: ioctl: error adding target to table

and later the consequence:
ALERT! /dev/dm-1 does not exist. Dropping to a shell!

(I even have to do a cold restart; otherwise my RAID setup appears broken during POST.)

I have a fakeraid (dmraid) setup as described here: http://ubuntuforums.org/showthread.php?t=464758 and the whole system is up-to-date by now.
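
What dmraid sees can be inspected from a live CD before changing anything - a minimal sketch, assuming the dmraid package is installed:

$ sudo dmraid -r    # list the raw block devices that carry RAID metadata
$ sudo dmraid -s    # summarize the discovered RAID sets and their status
$ sudo dmraid -ay   # try to activate all sets; failures here mirror the boot errors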

It seems like this one is somehow related to: https://bugs.launchpad.net/ubuntu/+source/evms/+bug/115616
But I'm not sure, since I have installed neither evms nor lvm.

The RAID itself is OK: Win Vista and XP on the same disk work fine, and I can mount the Linux partition from the Tribe 4 CD (as described in http://ubuntuforums.org/showthread.php?t=464758).

Any help is welcome; I really don't know what to do. Removing dmraid is not an option...

Regards,
Jens

Revision history for this message
jens7677 (jpeters7677) wrote :

I just updated to kernel version 2.6.22-12.36-generic (booted from a live CD and chrooted into my dmraid partition), but nothing changed during the next real startup :(

Revision history for this message
jens7677 (jpeters7677) wrote :

I just reverted the kernel to version 2.6.22-11.32. Now I can do a normal startup again. As I remember, the problems started with version 11.34, so some change between these two versions screwed it up... Can somebody point me to the exact change that could break a startup from a dmraid partition?
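
The exact changes per upload are recorded in the changelog shipped with the kernel image package - one place to look, assuming a stock Gutsy install (adjust the package name to the installed image):

$ zless /usr/share/doc/linux-image-2.6.22-12-generic/changelog.Debian.gz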

Revision history for this message
freak007 (freak-linux4freak) wrote :
Revision history for this message
jens7677 (jpeters7677) wrote :

I got the message that the isw device is broken, not unknown, so I'm not sure. But the rest looks indeed identical; I'm using an Intel ICH9 fake RAID too.

(Btw, I wrote two comments earlier that the problems started with 11.34; what I meant to say is that they started when I updated directly from 11.32 to 11.34, so the problems could have started with 11.33 as well :))

Update: I just installed linux-image-generic 2.6.22-12.39, together with libdevmapper2 1.02.20-1ubuntu4 and udev 113-0ubuntu11, but I still get the same fatal error :(

Revision history for this message
freak007 (freak-linux4freak) wrote :

Now I'm pretty sure it's not a kernel problem; I think it's a device-mapper, dmraid, dmsetup, or udev problem.
Be careful if you upgrade your Gutsy today: if you install the new dmsetup package, you can't boot with 2.6.22-11.32!

Only solution: live CD + chroot + downgrade the dmsetup package.
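
A sketch of that recovery path, assuming the root filesystem sits on the dmraid device /dev/mapper/isw_dfgjhjfjai_Volume01 named elsewhere in this report (adjust device names, and have the older dmsetup .deb at hand):

$ sudo dmraid -ay                                      # activate the RAID sets from the live CD
$ sudo mount /dev/mapper/isw_dfgjhjfjai_Volume01 /mnt  # mount the root filesystem
$ sudo mount --bind /dev /mnt/dev                      # expose devices inside the chroot
$ sudo mount --bind /proc /mnt/proc
$ sudo chroot /mnt
# dpkg -i /path/to/older/dmsetup_*.deb                 # downgrade dmsetup (placeholder path)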

Revision history for this message
jens7677 (jpeters7677) wrote :

Hmm, I'm not sure. I haven't installed dmsetup at all (and certainly won't install it now :) ). The new updates of libdevmapper and udev didn't break my 11.32 kernel (but didn't help the 12.39 kernel start up properly either). My situation hasn't changed since yesterday.

Revision history for this message
freak007 (freak-linux4freak) wrote :

After verification, it's a specific kernel bug. After upgrading, my Gutsy is now up to date, and I have removed the dmsetup package.
After 'update-initramfs -k 2.6.22-11.rt -u', the system runs fine with this kernel but not with 12.39.

There are some changes to the kernel's libata in the 11.33 update (which is when the problem appeared):

linux-source-2.6.22 (2.6.22-11.33) gutsy; urgency=low
  [Ben Collins]
  * libata: Default to hpa being overridden
  [Chuck Short]
  * [LIBATA] Add more hard drives to blacklist.
  [Matthew Garrett]
  * Add support to libata-acpi for acpi-based bay hotplug
  [Upstream Kernel Changes]
  * [pata_marvell]: Add more identifiers
 -- Ben Collins <email address hidden> Sun, 16 Sep 2007 22:13:08 -0400

And this is the current (12.39) libata-core blacklist content:

static const struct ata_blacklist_entry ata_device_blacklist [] = {
        /* Devices with DMA related problems under Linux */
        { "WDC AC11000H", NULL, ATA_HORKAGE_NODMA },
        { "WDC AC22100H", NULL, ATA_HORKAGE_NODMA },
        { "WDC AC32500H", NULL, ATA_HORKAGE_NODMA },
        { "WDC AC33100H", NULL, ATA_HORKAGE_NODMA },
        { "WDC AC31600H", NULL, ATA_HORKAGE_NODMA },
        { "WDC AC32100H", "24.09P07", ATA_HORKAGE_NODMA },
        { "WDC AC23200L", "21.10N21", ATA_HORKAGE_NODMA },
        { "Compaq CRD-8241B", NULL, ATA_HORKAGE_NODMA },
        { "CRD-8400B", NULL, ATA_HORKAGE_NODMA },
        { "CRD-8480B", NULL, ATA_HORKAGE_NODMA },
        { "CRD-8482B", NULL, ATA_HORKAGE_NODMA },
        { "CRD-84", NULL, ATA_HORKAGE_NODMA },
        { "SanDisk SDP3B", NULL, ATA_HORKAGE_NODMA },
        { "SanDisk SDP3B-64", NULL, ATA_HORKAGE_NODMA },
        { "SANYO CD-ROM CRD", NULL, ATA_HORKAGE_NODMA },
        { "HITACHI CDR-8", NULL, ATA_HORKAGE_NODMA },
        { "HITACHI CDR-8335", NULL, ATA_HORKAGE_NODMA },
        { "HITACHI CDR-8435", NULL, ATA_HORKAGE_NODMA },
        { "Toshiba CD-ROM XM-6202B", NULL, ATA_HORKAGE_NODMA },
        { "TOSHIBA CD-ROM XM-1702BC", NULL, ATA_HORKAGE_NODMA },
        { "CD-532E-A", NULL, ATA_HORKAGE_NODMA },
        { "E-IDE CD-ROM CR-840",NULL, ATA_HORKAGE_NODMA },
        { "CD-ROM Drive/F5A", NULL, ATA_HORKAGE_NODMA },
        { "WPI CDD-820", NULL, ATA_HORKAGE_NODMA },
        { "SAMSUNG CD-ROM SC-148C", NULL, ATA_HORKAGE_NODMA },
        { "SAMSUNG CD-ROM SC", NULL, ATA_HORKAGE_NODMA },
        { "ATAPI CD-ROM DRIVE 40X MAXIMUM",NULL,ATA_HORKAGE_NODMA },
        { "_NEC DV5800A", NULL, ATA_HORKAGE_NODMA },
        { "SAMSUNG CD-ROM SN-124","N001", ATA_HORKAGE_NODMA },
        { "Seagate STT20000A", NULL, ATA_HORKAGE_NODMA },
        { "IOMEGA ZIP 250 ATAPI", NULL, ATA_HORKAGE_NODMA }, /* temporary fix */

        /* Weird ATAPI devices */
        { "TORiSAN DVD-ROM DRD-N216", NULL, ATA_HORKAGE_MAX_SEC_128 },

        /* Devices we expec...

Revision history for this message
freak007 (freak-linux4freak) wrote :

I found the problem: it's the HPA (Host Protected Area).

In the changelog of Ubuntu's kernel 2.6.22-11.33:

linux-source-2.6.22 (2.6.22-11.33) gutsy; urgency=low
  [Ben Collins]
  * libata: Default to hpa being overridden

With 2.6.22-11.32, we have in dmesg:

[ 47.059365] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 47.061544] ata1.00: Host Protected Area detected:
[ 47.061545] current size: 293044655 sectors
[ 47.061545] native size: 293046768 sectors
[ 47.061549] ata1.00: ATA-7: WDC WD1500ADFD-00NLR0, 19.06P19, max UDMA/133
[ 47.061551] ata1.00: 293044655 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 47.064371] ata1.00: Host Protected Area detected:
[ 47.064372] current size: 293044655 sectors
[ 47.064373] native size: 293046768 sectors
[ 47.064376] ata1.00: configured for UDMA/133

And with kernel >= 2.6.22-11.33:

[ 60.453908] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 60.456006] ata1.00: Host Protected Area detected:
[ 60.456007] current size: 293044655 sectors
[ 60.456008] native size: 293046768 sectors
[ 60.457150] ata1.00: native size increased to 293046768 sectors
[ 60.457154] ata1.00: ATA-7: WDC WD1500ADFD-00NLR0, 19.06P19, max UDMA/133
[ 60.457156] ata1.00: 293046768 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 60.459884] ata1.00: configured for UDMA/133

We can see the difference in the number of sectors on lines 5, 6, and 7 of the dmesg output.
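
The same information can be read without digging through dmesg, e.g. with hdparm - a sketch; -N reports the current/native max sector counts, and the exact output wording varies between hdparm versions:

$ sudo hdparm -N /dev/sda
# prints something like: max sectors = 293044655/293046768, HPA is enabled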

Changed in linux-source-2.6.22:
status: New → Confirmed
Revision history for this message
jens7677 (jpeters7677) wrote :

Yes, I can confirm that:

linux-source-2.6.22 (2.6.22-11.33) gutsy; urgency=low
   [Ben Collins]
   * libata: Default to hpa being overridden

broke the kernel for us. I did a custom kernel build from the latest Ubuntu linux-source-2.6.22 (2.6.22-12.39) with the default configuration, and the problem still exists. But if I make the following change:

+++ linux-source-2.6.22-2.6.22/drivers/ata/libata-core.c
@@ -89,15 +89,15 @@
 module_param_named(fua, libata_fua, int, 0444);
 MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)");

-static int ata_ignore_hpa = 1;
+static int ata_ignore_hpa = 0;
 module_param_named(ignore_hpa, ata_ignore_hpa, int, 0644);

which is exactly the revert of the corresponding change in linux-source-2.6.22_2.6.22-11.33.diff, all problems are gone and my system starts normally.

Revision history for this message
jens7677 (jpeters7677) wrote :

Oops, the line numbers are still from the diff file and not from the real source file, so please don't apply this patch directly...

Revision history for this message
Dazzer (darren-scott) wrote :

I'm having a similar problem with an Nvidia fakeraid configuration. Under Feisty (with kernel 2.6.20-16), two devices are discovered under /dev/mapper during boot:

/dev/mapper/nvidia_blah
/dev/mapper/nvidia_blah1

However, after upgrading to Gutsy, these are missing. Normally, these are coupled to become /dev/md1.
Interestingly, I also have a Silicon Image controller with a softraid configuration, and those devices are discovered correctly with both kernels.

During boot time with the Gutsy Kernel, errors are reported as:

device-mapper: table: 253:0: mirror: Device lookup failure
device-mapper: ioctl: error adding target to table

I know there are a couple of bugs relating to EVMS - this is definitely something different, as EVMS was never installed on my system.
Could this be something specific to dmraid/dmsetup/udev relating to the Nvidia Nforce4 chipset?

Luckily, the RAID mirror I'm having problems with isn't the boot device; however, I can't access the data (obviously).
Rebooting to the old kernel kinda works, but introduces other problems with the Nvidia Linux restricted graphics driver.

The partitions look fine from fdisk, but the discovery issue would appear to be the main problem.

Any help would be appreciated ... I'd hate to have to downgrade.

Revision history for this message
deustech (deustech) wrote :

I solved this problem by hand: I manually changed the filesystem UUID of every fakeraid member (this does not desync my fakeraid RAID1 array). The issue causes a udev/device-mapper error, because udev can't work with the duplicated UUIDs on the drives of your RAID array.

Revision history for this message
deustech (deustech) wrote :

You need to check with 'blkid' whether you have duplicate UUIDs (!) on any of your fakeraid partitions; that is why you can't map your RAID device with udevd. This is not EVMS; this is purely a udev issue with duplicated UUIDs on the devices included in your RAID array.

Revision history for this message
jens7677 (jpeters7677) wrote :

Could you explain in detail what you changed and where? My only solution so far was to rebuild the kernel with static int ata_ignore_hpa = 0; as described above.

Revision history for this message
deustech (deustech) wrote :

My solution was to edit the UUIDs of every filesystem partition included in the dmraid/fakeraid devices; after that I can boot a live CD, and /dev/mapper appears once dmraid is installed.

Revision history for this message
Dazzer (darren-scott) wrote :

Could you detail step-by-step how you determined which UUID related to which device, and how you altered them?

Revision history for this message
deustech (deustech) wrote :

Try the 'blkid' command to find the UUIDs.
Use 'tune2fs' for ext3 filesystems, 'reiserfstune' for reiserfs, and 'xfs_admin' for xfs.

my sample looks like:
$ blkid
/dev/sda1: UUID="546ea841-093b-42ea-964a-3c2e5ff3a2f3" TYPE="swap"
/dev/sda2: UUID="df8eb64d-41d3-4fa6-88b2-5d8163fe4dbe" TYPE="reiserfs"
/dev/sda3: UUID="546ea841-093b-42ea-964a-3c2e5ff3a203" TYPE="xfs"
/dev/sdb1: UUID="546ea841-093b-42ea-964a-3c2e5ff3a2f3" TYPE="swap"
/dev/sdb2: UUID="df8eb64d-41d3-4fa6-88b2-5d8163fe4dbe" TYPE="reiserfs"
/dev/sdb3: UUID="546ea841-093b-42ea-964a-3c2e5ff3a203" TYPE="xfs"
/dev/hdb1: UUID="7833498a-96aa-43fe-86e9-68647bbfe359" TYPE="swap"
/dev/hdb2: UUID="66803B6F803B4539" LABEL="CONTENT" TYPE="ntfs"

You can see, e.g., that the UUIDs of /dev/sda2 and /dev/sdb2 are duplicated (my root partition's dmraid devices in RAID1).

I then used, e.g.:

$sudo xfs_admin -U 546ea841-093b-42ea-964a-000000000000 /dev/sda1
$sudo reiserfstune -u df8eb64d-41d3-4fa6-88b2-000000000000 /dev/sda2

After that my UUIDs are not duplicated anymore, and dmraid -ay works!! My /dev/mapper/nvidia_blahblah devices appear fine:

/dev/mapper/nvidia_blahblah2: UUID="df8eb64d-41d3-4fa6-88b2-000000000000" TYPE="reiserfs"
/dev/mapper/nvidia_blahblah1: UUID="546ea841-093b-42ea-964a-3c2e5ff3a2f3" TYPE="swap"
/dev/mapper/nvidia_blahblah3: UUID="546ea841-093b-42ea-964a-000000000000" TYPE="xfs"

Revision history for this message
deustech (deustech) wrote :

Oops, my typo (xfs_admin is for the xfs partition, of course):
$sudo xfs_admin -U 546ea841-093b-42ea-964a-000000000000 /dev/sda3

You don't need to touch the swap partitions.

Revision history for this message
Dazzer (darren-scott) wrote :

Thanks - I can see the duplicates now:

/dev/sda1: UUID="32507eee-9883-4143-bae3-58762e4d4ae0" SEC_TYPE="ext2" TYPE="ext3"
/dev/sda2: UUID="06e176bf-1b0a-4616-baf0-f6f9f4965639" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb1: UUID="32507eee-9883-4143-bae3-58762e4d4ae0" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb2: UUID="06e176bf-1b0a-4616-baf0-f6f9f4965639" SEC_TYPE="ext2" TYPE="ext3"
/dev/md0: UUID="06e176bf-1b0a-4616-baf0-f6f9f4965639" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdc1: UUID="5251b16e-0c72-460f-8ebe-57f6f3c60068" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdg1: TYPE="ntfs" UUID="44C044A8C044A1D2"
/dev/mapper/sil_ahagaibgcjah1: UUID="5251b16e-0c72-460f-8ebe-57f6f3c60068" SEC_TYPE="ext2" TYPE="ext3"
/dev/sda3: TYPE="swap" UUID="c803f547-dd48-4498-a389-9e2f0837e297"
/dev/sdb3: TYPE="swap" UUID="44aa3de6-5a65-420d-820e-17bd96566427"

Although /dev/sda2 and /dev/sdb2 have the same UUID, this isn't causing a problem, as softraid (i.e. mdadm) is being used there.
However, I do use fakeraid (dmraid) for /dev/sda1 and /dev/sdb1 - which, as you say, have the same (conflicting) UUIDs.

I notice though that in my case:

/dev/sda1: UUID="32507eee-9883-4143-bae3-58762e4d4ae0" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb1: UUID="32507eee-9883-4143-bae3-58762e4d4ae0" SEC_TYPE="ext2" TYPE="ext3"

... the SEC_TYPE is listed as "ext2" with the filesystem type as "ext3". Do you know which UUID rename utility to use for the "ext3" filesystem?

Thanks.

Revision history for this message
deustech (deustech) wrote :

Use tune2fs for ext3, e.g.:

$sudo tune2fs -U random /dev/sda1

Revision history for this message
Dazzer (darren-scott) wrote :

OK, I ran "tune2fs -U random /dev/sda1", and the two partitions now look like:

/dev/sda1: UUID="b34efba6-b725-4343-8808-0c4fd1b670e4" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb1: UUID="32507eee-9883-4143-bae3-58762e4d4ae0" SEC_TYPE="ext2" TYPE="ext3"

Separate UUIDs. However, on reboot the error remains. The dmesg output is:

device-mapper: table: 253:2: mirror: Device lookup failure
device-mapper: ioctl: error adding target to table

The /dev/mapper directory doesn't contain the Nvidia devices (even after running a dmraid -ay).

Is there anything else I should be doing to help the rediscovery of the RAID array?

Revision history for this message
deustech (deustech) wrote :

dmraid works fine for me when no duplicate UUIDs (except on the swap partitions) are present.

Revision history for this message
Dazzer (darren-scott) wrote :

I tried ensuring that there are no duplicate UUIDs (even on the softraid devices), but the problem remains.
Thanks for your assistance, though.

Revision history for this message
Dazzer (darren-scott) wrote :

I've just run "dmraid -tay -vvvv -dddd -f nvidia" on both kernels.

Interestingly, it looks like the devices are being discovered incorrectly in the 2.6.22 kernel as /dev/sde and /dev/sdf - instead of /dev/sda and /dev/sdb!

Revision history for this message
Dazzer (darren-scott) wrote :

Here's the file for the new kernel.

Revision history for this message
Dazzer (darren-scott) wrote :

Ahh - I think I've fixed it. I'd appreciate it if someone could check that what I've done makes sense, though!!

I noticed on reboot that the "blkid" command was showing the problem devices as TYPE="mdraid", but with the same UUIDs.
Looking at the mdadm utility, I noticed that you can use the assemble options to regenerate the UUID as part of the assembly process.

I used:

sudo mdadm /dev/md1 --assemble --update=uuid /dev/sde1 /dev/sdf1

And the missing "md1" nvidia array appeared correctly. It now comes up on reboot as well.
I'm still not sure why renaming the UUIDs has fixed it, because even after running the assemble process with the UUID update, both devices still have a duplicated UUID (albeit a new one!)

Also, unlike under the Feisty release, these devices don't appear under /dev/mapper, which makes me a bit worried that I'm going to get a conflict between mdadm and dmraid.

I guess I'm just going to have to give it a go and see what happens - at least the RAID array is now visible.
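
To double-check an assembly like this, the standard mdadm views work (device name as in the comment above):

$ sudo mdadm --detail /dev/md1    # array state, member devices, and the (new) array UUID
$ cat /proc/mdstat                # confirm both mirror members are active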

Changed in linux-source-2.6.22:
assignee: nobody → ubuntu-kernel-team
Revision history for this message
jens7677 (jpeters7677) wrote :

I found a solution for my initial problem; see http://ubuntuforums.org/showthread.php?p=4859992.

In fact, there was never a real problem for me; I just missed the module option. :)

Revision history for this message
jens7677 (jpeters7677) wrote :

There was never a real bug: the default value of a module option (libata's ignore_hpa) was changed. I just failed to add the original value (ignore_hpa=0) to modprobe.d.

(What is the correct status for this bug report? I hope Invalid is fine.)
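
For reference, a minimal sketch of that fix on a Gutsy-era system (the file name under /etc/modprobe.d is arbitrary, and the initramfs must be rebuilt so the option takes effect at boot):

# /etc/modprobe.d/libata-hpa
options libata ignore_hpa=0

$ sudo update-initramfs -u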

Changed in linux-source-2.6.22:
status: Confirmed → Invalid
Revision history for this message
Phillip Susi (psusi) wrote :

This definitely should not be Invalid. The kernel should NOT be defeating the Host Protected Area by default, and this is causing issues for several dmraid users, since the disk size recorded by the BIOS does not match the size reported after the HPA is disabled.

Changed in linux-source-2.6.22:
status: Invalid → Triaged
Revision history for this message
Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this bug to the new "linux" package. However, development has already begun for the upcoming Intrepid Ibex 8.10 release. It would be helpful if you could test the upcoming release and verify whether this is still an issue - http://www.ubuntu.com/testing . If the issue still exists, please update this report by changing the Status of the "linux" task from "Incomplete" to "New". We appreciate your patience and understanding as we make this transition. Thanks!

Revision history for this message
Phillip Susi (psusi) wrote :

Apparently it was decided to retain the broken behavior by default in libata: the old IDE driver was broken in the same way, so changing it could render systems unbootable for users upgrading from the old to the new kernel/driver.

Changed in linux:
status: Incomplete → Won't Fix
Revision history for this message
EagleDM (eagle-maximopc) wrote :

Phillip, I have a simple suggestion for this problem; could you pass it to the Intrepid Ibex developers?

This could be simply "fixed" if the live CD / installer asked about RAID support BEFORE starting the machine.

A simple query could tell Ubuntu which parameter to set ("to override or NOT to override").

A simple answer to a problem that is giving nightmares to RAID users (me included).

Revision history for this message
Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Revision history for this message
jens7677 (jpeters7677) wrote :

Some comments for the new Ubuntu 9.04 release:
Placing module options in modprobe.d is no longer supported, so the option has no effect, and once again the disk arrays appear broken after updating to 9.04. The easiest solution is to add libata.ignore_hpa=0 to your GRUB configuration.

Example:
title Ubuntu 8.10, kernel 2.6.28-11-generic
root (hd0,0)
kernel /boot/vmlinuz-2.6.28-11-generic root=/dev/mapper/isw_dfgjhjfjai_Volume01 ro libata.ignore_hpa=0
initrd /boot/initrd.img-2.6.28-11-generic
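
(On later releases that use GRUB 2, the same option goes into /etc/default/grub instead - a sketch; run update-grub afterwards:)

GRUB_CMDLINE_LINUX="libata.ignore_hpa=0"
$ sudo update-grub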
