dmraid does not start on boot for single disk RAID0

Bug #1361842 reported by Jason Gunthorpe
This bug affects 1 person
Affects: dmraid (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have a Lenovo server with an LSI controller that insists on having a RAID set in order to boot, so the BIOS is configured with a RAID0 stripe set containing a single disk:

$ dmraid -i -si
*** Group superset .ddf1_disks
--> Subset
name : ddf1_4c5349202020202080861d60000000004711471100001450
size : 974608384
stride : 128
type : stripe
status : ok
subsets: 0
devs : 1
spares : 0

Notice that 'devs' is 1.

This causes this bit of code in dm-activate to bail:

        case "$Raid_Type" in
                stripe)
                        if [ "$Raid_Nodevs" -lt 2 ]; then
                                if [ -n "$Degraded" ]; then
                                        log_error "Cannot bring up a RAID0 array in degraded mode, not all devices present."
                                fi
                                return 2
                        fi
                        ;;

Of course, the check above is bogus: a single-disk RAID0 is perfectly valid. I wonder if this should be testing 'status' instead?
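A status-based check might look something like this (a sketch only; the function and variable names here are illustrative, not dm-activate's own, which would need to parse the status field out of dmraid's output):

```shell
# Sketch: gate activation on the set's reported status rather than on the
# device count, so a healthy single-disk RAID0 is allowed through.
# Names (can_activate_stripe, raid_type, raid_status) are illustrative.
can_activate_stripe() {
    raid_type="$1"
    raid_status="$2"
    if [ "$raid_type" = "stripe" ] && [ "$raid_status" != "ok" ]; then
        echo "Cannot bring up RAID0 array: set status is '$raid_status'" >&2
        return 2
    fi
    return 0
}

can_activate_stripe stripe ok && echo "activating"    # 1-disk RAID0 passes
can_activate_stripe stripe broken || echo "refused"   # inconsistent set bails
```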

This is a problem because of GPT partitioning. If the RAID is not started, downstream tools will attempt to partition sda directly. The RAID metadata at the end of the disk collides with the backup GPT header, which ends up destroying the RAID set and making the server unbootable. The kernel hints at this condition:

[ 4.202136] GPT:Primary header thinks Alt. header is not at the end of the disk.
[ 4.202137] GPT:974608383 != 976773167
[ 4.202138] GPT:Alternate GPT header not at the end of the disk.
[ 4.202138] GPT:974608383 != 976773167

Which is 100% true: the GPT was written to the RAID volume, not the raw disk, and sector 974608383 is the end of the RAID volume.
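The overlap is easy to see from the numbers above (a sanity-check sketch; the sector counts are taken from the dmraid and kernel output in this report, and the metadata-area size is simply the difference between them):

```shell
# Sector counts taken from the output above (512-byte sectors)
RAW_DISK_SECTORS=976773168    # whole /dev/sda, where the kernel expects GPT
RAID_VOL_SECTORS=974608384    # the dmraid volume ('size' from dmraid -si)

# GPT stores its backup header in the last sector of whatever device it was
# written to; sector numbers are 0-based, so:
BACKUP_ON_RAID=$((RAID_VOL_SECTORS - 1))   # 974608383, matching the log
BACKUP_ON_RAW=$((RAW_DISK_SECTORS - 1))    # 976773167, what the kernel expects

# Everything past the end of the RAID volume belongs to the DDF metadata area,
# which is exactly what a GPT written against /dev/sda would clobber
echo "metadata area: $((RAW_DISK_SECTORS - RAID_VOL_SECTORS)) sectors"
```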

Revision history for this message
Phillip Susi (psusi) wrote :

The test in that script is wrong, but since you only have a single disk, you should be able to throw out the fake raid junk and just put the controller in AHCI mode.

Revision history for this message
Jason Gunthorpe (jgunthorpe) wrote : Re: [Bug 1361842] Re: dmraid does not start on boot for single disk RAID0

You'd think, but if this BIOS has an option to disable the LSI option ROM
and still boot in EFI mode, then it is very well hidden.


Revision history for this message
Phillip Susi (psusi) wrote :

You can just delete the RAID array in the BIOS utility, or use dmraid -E to erase it.

Revision history for this message
Jason Gunthorpe (jgunthorpe) wrote :

The LSI option rom will not present a drive to the EFI BIOS without a raid
label on the disc. If the array is deleted then EFI will not boot off the
drive at all. As far as I can tell there is no fall back option rom that
provides simple AHCI services to EFI.

This is the problem: because the boot script is wrong, the installer and so
forth have a very high probability of destroying the RAID label when they
write a GPT partition label, which renders the system unbootable. Unless you
actually know about this quirk, the only symptom is that the BIOS boot menu
has no option to boot from the hard drive.


Revision history for this message
Phillip Susi (psusi) wrote :

That is one broken firmware if it does not present plain, non-RAID disks. You should be able to work around it by booting the installer with the nodmraid option, which makes Ubuntu ignore the RAID signatures and just use the drive normally.

Revision history for this message
Jason Gunthorpe (jgunthorpe) wrote :

Ignoring the raid label is the entire problem - that is what the broken
dmraid script is already doing.

Fundamentally, if /dev/sda has a valid RAID label then it *MUST* be setup
and accessed through the dmraid device and *NEVER* via /dev/sda.

Otherwise the installer will see a disc that is too big and it will destroy
the RAID label at the end of the disc, then the system will not boot.

The workaround is to manually start dmraid before partitioning in the
installer; then the system remains bootable, the OS partition does not
overlap the RAID label, and so on. But once booted, the system still
accesses the disk through /dev/sda, and there is a risk that a partitioning
tool will again blindly destroy the RAID label.
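Concretely, the manual workaround from the installer shell looks something like this (a sketch; the set name is the one from the dmraid -si output above, and the commands mirror what the boot script would run itself):

```shell
# Run from the installer's shell *before* touching the partitioner.
dmraid -ay          # activate all discovered RAID sets
ls /dev/mapper/     # the set should appear, e.g. ddf1_4c53...1450
kpartx -a /dev/mapper/ddf1_4c5349202020202080861d60000000004711471100001450
# Now partition /dev/mapper/<set>, never /dev/sda itself.
```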

I can't see how it would ever be correct for boot scripts to ignore a RAID
label, if there is a label, the drive is part of a RAID set and it must be
activated through dmraid. A single disc RAID still has a label, and still
needs to be accessed through device mapper.

FWIW, RHEL gets this right and sets up dmraid on this disc.

Jason


Revision history for this message
Phillip Susi (psusi) wrote :

On 10/28/2014 09:55 PM, Jason Gunthorpe wrote:
> Fundamentally, if /dev/sda has a valid RAID label then it *MUST* be setup
> and accessed through the dmraid device and *NEVER* via /dev/sda.
>
> Otherwise the installer will see a disc that is too big and it will destroy
> the RAID label at the end of the disc, then the system will not boot.

For MBR partitioned disks this isn't really a problem because they never really use the last bit of the disk anyhow, but for GPT, yes... that would be a problem.

> FWIW, RHEL gets this right and sets up dmraid on this disc.

Interesting... they must have a patch that hasn't been upstreamed.


Revision history for this message
Jason Gunthorpe (jgunthorpe) wrote :

On Tue, Oct 28, 2014 at 9:06 PM, Phillip Susi <email address hidden> wrote:

> For MBR partitioned disks this isn't really a problem because they never
> really use the last bit of the disk anyhow, but for GPT, yes... that
> would be a problem.

Except "never really use" means "don't fill up the filesystem". If the FS
uses the last portion of the partition it will corrupt the RAID label and
restoring the RAID label will corrupt the FS. Which is pretty bad, but not
immediate.

> FWIW, RHEL gets this right and sets up dmraid on this disc.
>
> Interesting.. they must have a patch that hasn't been upstreamed.
>

Looks like they have a dracut-specific version; it seems much saner, with no
crazy parsing of dmraid output.

info "Scanning for dmraid devices $DM_RAIDS"
SETS=$(dmraid -c -s)

if [ "$SETS" = "no raid disks" -o "$SETS" = "no raid sets" ]; then
    return
fi

[..]

# scan and activate all DM RAIDS
for s in $SETS; do
    info "Activating $s"
    dmraid -ay -i -p --rm_partitions "$s" 2>&1 | vinfo
    [ -e "/dev/mapper/$s" ] && kpartx -a -p p "/dev/mapper/$s" 2>&1 | vinfo
done

Jason

Revision history for this message
Phillip Susi (psusi) wrote :

On 10/28/2014 11:31 PM, Jason Gunthorpe wrote:
> Except "never really use" means "don't fill up the filesystem". If
> the FS uses the last portion of the partition it will corrupt the
> RAID label and restoring the RAID label will corrupt the FS. Which
> is pretty bad, but not immediate.

No; typically partitioning tools do not assign every last sector on
the disk to the partition, so that last bit of disk will never be used
by the fs.

>> FWIW, RHEL gets this right and sets up dmraid on this disc.
>>
>> Interesting.. they must have a patch that hasn't been
>> upstreamed.
>>
>
> Looks like they have a dracut specific version, it seems much
> saner, no crazy parsing of dmraid output.
>

Oh, right... the problem is in the script only, not dmraid itself, right?
If you manually run dmraid -ay, it correctly activates the array? Maybe
I'll take a crack at finally fixing that script then, as that shouldn't be
too hard.

