Installer corrupts raid drive

Bug #191119 reported by impute on 2008-02-11
This bug affects 4 people
Affects: parted (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Boot from standard 8.04 Desktop CD.
Run installer, select default language, keyboard to get to Partitioner.
Select Manual install
  Note that raid drives from booting windows are listed separately.
  Select partition on alternate, non raid, drive to install Ubuntu on.
Edit partition and set mount point to "/"
Set login, computer name, etc. and install.
Reboot computer.
  Note that BIOS now shows first drive as not a raid member.
Run Windows.
  Note that software needs to rebuild RAID 1 mirror.
Reboot computer.
  Note that BIOS shows first drive as valid raid member again.

The problem I have is that because Ubuntu doesn't recognize the RAID drives, it corrupts one of them when it installs the default boot loader. To even figure out that the installer is doing this I need to look in the advanced settings.

It would help to have a complete list of all disk operations that will be performed (and the affected drives or partitions) clearly shown on the final Ready to Install screen.

Another solution would be for Ubuntu to better support the Intel ICH9R controller that I happen to be using, but I think that is only a partial solution because the next software raid system that Intel comes up with might not be supported (and I think that adding better support for Intel raid is already on the wishlist).

Koen Beek (koen-beek) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. This bug did not have a package associated with it, which is important for ensuring that it gets looked at by the proper developers. You can learn more about finding the right package at https://wiki.ubuntu.com/Bugs/FindRightPackage . I have classified this bug as a bug in ubiquity.

Greg (gregcouch) wrote :

I have the same problem. I don't know how I'm going to recover my partitions yet. The bug is apparently in the parted package, since it happened while using the gparted program, which supposedly calls /sbin/parted.

Greg (gregcouch) wrote :

Was assigned to ubiquity, but it looks like it's /sbin/parted that does the dirty work.

Anzenketh (anzenketh) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. We are sorry that we do not always have the capacity to look at all reported bugs in a timely manner. There have been many changes in Ubuntu since you reported the bug, and your problem may have been fixed with some of the updates. If you could test the current Ubuntu development version, this would help us a lot. If you can test it and it is still an issue, we would appreciate it if you could upload updated logs by running apport-collect <bug #>, along with any other logs that are relevant for this particular issue.

Please report back if it is fixed or not.

Anzenketh (anzenketh) on 2010-02-22
Changed in parted (Ubuntu):
status: New → Incomplete

No, it is NOT fucking fixed. I can't believe it's been OVER TWO YEARS and no one has bothered to even look at this easily reproducible issue? And because of it my (and who knows how many other people's) data is hosed. What issues DO you investigate, if you don't investigate corrupted file systems during the install? I've defended you guys ad nauseam in the past, but I can't avoid it any more: you guys are just plain lazy. I guess you just want a bullet on your resume saying that you're a contributor to Ubuntu, but don't actually want to do any work. I can't just say that you're useless, because useless people just don't do anything and are a drain on society. YOU allowed people's data to be lost because of your buggy garbage that you call code, and you didn't bother to fix it because you're lazy.

Changed in parted (Ubuntu):
status: Incomplete → Confirmed

Please see bug #568183 for steps on recreating the issue.

Timo Jyrinki (timo-jyrinki) wrote :

Name-calling is unfortunately not helping. Note that there are very few people capable of fixing code in this kind of low-level software. There is also always a limited number of people who are able to fix the problem, who have the affected hardware, and who are motivated enough to fix it in their free time. Name-calling is certainly not motivating.

From a more practical perspective, this sounds like a very important bug, but there are more bugs reported in Launchpad, and generally in all the software on the planet, than anyone will be able to fix. Data corruption bugs are very important of course, and probably something that should be on the radar of the few paid developers as well, but note that there are again thousands of reported bugs per paid developer. In this specific case, I believe the reason it remains unfixed is a combination of not finding that one skilled and motivated free-time contributor, and the bug not affecting any of the paid developers directly (who would then have the motivation to at least try to find the one person who could fix it).

Finally, please note that the bug seems to be in the parted software, which is not developed by Ubuntu developers at all. The actual project and its bug reporting tool are at http://www.gnu.org/software/parted/index.shtml

I forgot to mention in my first post that once the problem shows itself, it persists indefinitely across attempts to install Ubuntu. The way I was able to fix the problem at that point was to physically zero out the defunct file system on the disk. This should be obvious, but I'll say it anyway: if zeroing out the defunct file system fixes the problem, then it stands to reason that the cause of the problem is the presence of the defunct file system. In other words, the installer is detecting the defunct file system and screwing up as a result.
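The zeroing step described above can be sketched as follows. This is a minimal illustration on a throwaway image file, not a recipe: the real device name (`/dev/sdXN` in the comment) is a placeholder, and running `dd` against the wrong device destroys data.

```shell
# Stand-in for the affected partition: 16 MiB of "leftover" data.
dd if=/dev/urandom of=stale.img bs=1M count=16 2>/dev/null
# Zero the first few MiB, where filesystem superblocks typically live,
# so the installer no longer detects the defunct filesystem.
# (On real hardware the equivalent would be something like:
#   dd if=/dev/zero of=/dev/sdXN bs=1M count=4
# with /dev/sdXN being the defunct partition.)
dd if=/dev/zero of=stale.img bs=1M count=4 conv=notrunc 2>/dev/null
```

Some filesystems (XFS in particular) also keep backup superblocks later in the partition, so zeroing only the start may not remove every signature; the comment's approach of wiping the whole defunct filesystem is the thorough version.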

Dave Howorth (dhoworth) wrote :

Reading the original report here, I think it is a possibility that the problem was caused by grub overwriting part of the fakeRAID control area. A contributing factor may be a hardware environment that causes different drive orders to be recognized by BIOS, grub and kernel(s).

I don't know this for a fact but I offer it as a suggestion for an area to investigate. If so, this would mean it is a different problem to bug 568183

Logic 101:
There is nothing in this report that would even remotely suggest that it's a GRUB problem. By your logic, it could be anything that writes directly to the hard drive, not just GRUB. While it's true that the absence of evidence is not evidence of absence, the absence of evidence *should* tell you to concentrate your effort on the evidence you *do* have. Following your logic, we should also create a separate bug report because it's possible that invaders from Mars, who have cunningly hidden themselves from us thus far, came down and corrupted the original reporter's RAID array right when he was installing Ubuntu. There's no evidence of that (just as there's no evidence it was a GRUB problem), but it *is* technically possible.

Also, there is plenty of evidence to suggest it *isn't* GRUB. As I mentioned in my emails several times:

1. GRUB only writes in the first few kilobytes of the disk, which is safely out of the way of the data. Any other GRUB stuff is written to a file system.
2. I've replicated the problem, and simply wiping the left over file system fixes it. If it were a GRUB problem, the problem would continue.

Additionally, if this issue were a separate GRUB problem:

3. Ubuntu 8.04 used GRUB version 1, which was extremely mature and well vetted by that point.
4. It would affect ALL distributions that use that version of GRUB, and ALL installs where the user has a RAID drive, and EVERY time the user installs GRUB. This is clearly not the case.
5. The installer actually installs GRUB on ALL of your hard drives (since it doesn't know which one will be active; some other OSs change the active drive at boot), so if that were true, ALL drives would have been corrupted. The reporter clearly stated that only one RAID drive was affected, and he just had to start in degraded mode and resync.
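Point 1 can be made concrete. On a classic MBR disk the first 512-byte sector holds 446 bytes of boot code (where GRUB's first stage is written), a 64-byte partition table, and a two-byte 0x55AA signature; later GRUB pieces go to the post-MBR gap or a filesystem. A hedged sketch on an image file (file names here are made up for illustration):

```shell
# Build a fake one-sector "disk" and stamp the MBR boot signature
# (0x55 0xAA, written as octal escapes for portable printf).
dd if=/dev/urandom of=mbr.img bs=512 count=1 2>/dev/null
printf '\125\252' | dd of=mbr.img bs=1 seek=510 conv=notrunc 2>/dev/null

# Split the sector into its three regions.
dd if=mbr.img bs=1 count=446 of=bootcode.bin 2>/dev/null           # boot code area
dd if=mbr.img bs=1 skip=446 count=64 of=ptable.bin 2>/dev/null     # partition table
sig=$(dd if=mbr.img bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
```

The point being argued: GRUB's MBR footprint is confined to these first sectors, nowhere near filesystem data deep inside a partition.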

Please, stay in school. If you're not in school, go back.


I'm adding this message from the thread to provide as much information as possible for people trying to recreate this bug:

Top-posting for (my) convenience on this one...

It's nice to see someone actually trying to recreate the bug instead of
just flapping their gums trying to sound smart. If you have the time,
could you try recreating it again? I have some suggestions to make it
more like my scenario (in fact, number 3 below is required for the
problem to occur, and number 4 below is likely to be required). I know
these suggestions are long, but it would be appreciated. In for a
penny, in for a pound, eh? You'll have to do the first install again,
but you won't have to actually go through with the second install. You
can cancel after getting past the partitioning screen. I've noticed
that when things go awry there are two tell-tale signs:

1. On the partitioning screen, the original (now defunct) file system(s)
will be detected and show up.

2. Once you select "finish partitioning", the installer will show a list
of partition tables that will be modified. One or more RAID partitions
will show up on the list, regardless of the fact that you didn't
select them for anything.

If those signs are present, the RAID array will be hosed. If the signs
are not there, the install will go fine and there's no need to continue.
  Additionally, you don't have to worry about test data or even mounting
the RAID array.

When doing the second install, if you post exactly which file systems
were detected on the manual partitioning screen and which partitions
were shown on the "to be modified" list once you hit "finish
partitioning", I'd appreciate it. Now on to the suggestions:

1. It sounds like in your setup you installed for the second time after
setting up the RAID array, but before the array finished resyncing for
the first time. In my setup, the array had been around for a while and
was fully resynced. In fact, I (likely) waited for the array to be
fully resynced before even installing XFS on it. If you *did* wait for
the drive to finish resyncing before the second install, please RSVP
because your array was indeed corrupted, but since it was only one drive
the array was somehow able to resync and recover.

2. I was using the Ubuntu 10.04 beta 2 server 64-bit disk for the
(second) install when things went south. Could you try that one?

3. REQUIRED. It sounds like when doing the second install, you just
installed to an existing partition. In order for the problem to occur,
you have to remove/create a partition (even though you're leaving the
RAID partitions alone). If you recreate the partitions I used (6,
below), this will be taken care of.

4. POSSIBLY REQUIRED. When creating the RAID array with the default
options as you did in your first test, by default the array is created
in degraded mode, with a drive added later and resynced. This makes the
initial sync faster. Since I'm a neurotic perfectionist, I always
create my arrays with the much more manly and macho "--force" option to
create them "properly". It's very possible that doing the initial resync
with a degraded array will overwrite t...

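The `--force` creation style mentioned above can be sketched as follows. Illustration only: the device names are placeholders, the commands require root and real block devices, and the degraded-then-resync default being described applies to RAID5 creation in particular (per the mdadm documentation). Do not run this as-is.

```shell
# Default RAID5 creation: mdadm starts the array degraded, treating the
# last member as a spare and rebuilding onto it in the background.
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

# With --force, all members are initialized as active from the start,
# which is the "proper" creation the commenter describes.
mdadm --create --force /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
```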

Jeff Lane (bladernr) wrote :

> Top-posting for (my) convenience on this one...
>
> It's nice to see someone actually trying to recreate the bug instead of
> just flapping their gums trying to sound smart. If you have the time,
> could you try recreating it again? I have some suggestions to make it
> more like my scenario (in fact, number 3 below is required for the
> problem to occur, and number 4 below is likely to be required). I know
> these suggestions are long, but it would be appreciated. In for a
> penny, in for a pound, eh? You'll have to do the first install again,
> but you won't have to actually go through with the second install. You
> can cancel after getting past the partitioning screen. I've noticed
> that when things go awry there are two tell-tale signs:

Be warned, I have to do this in a virtualized environment as I don't
have enough physical hardware to recreate this. My only multi-disk
system is my home fileserver and I don't really fancy tearing that
apart and rebuilding it... so keep in mind that I'm doing this in VM
space... but given that this is software RAID that shouldn't matter as
the OS doesn't know it's a VM.

Rest of my reply is inline:

> 1. On the partitioning screen, the original (now defunct) file system(s)
> will be detected and show up.
>
> 2. Once you select "finish partitioning", the installer will show a list
> of partition tables that will be modified. One or more RAID partitions
> will show up on the list, regardless of the fact that you didn't
> select them for anything.
>
> If those signs are present, the RAID array will be hosed. If the signs
> are not there, the install will go fine and there's no need to continue.
> Additionally, you don't have to worry about test data or even mounting
> the RAID array.
>
> When doing the second install, if you post exactly which file systems
> were detected on the manual partitioning screen and which partitions
> were shown on the "to be modified" list once you hit "finish
> partitioning", I'd appreciate it. Now on to the suggestions:
>
> 1. It sounds like in your setup you installed for the second time after
> setting up the RAID array, but before the array finished resyncing for
> the first time. In my setup, the array had been around for a while and
> was fully resynced. In fact, I (likely) waited for the array to be
> fully resynced before even installing XFS on it. If you *did* wait for
> the drive to finish resyncing before the second install, please RSVP
> because your array was indeed corrupted, but since it was only one drive
> the array was somehow able to resync and recover.

Yes... I rebuilt the entire thing from the ground up; this time when I
added three disks (instead of two) I used --force and waited for the
entire thing to sync and become active before putting an XFS
filesystem on it. After that point, I mounted /dev/md0 and copied
about 2GB of data to the new array.

> 2. I was using the Ubuntu 10.04 beta 2 server 64-bit disk for the
> (second) install when things went south. Could you try that one?

Sadly, I cannot. I no longer have access to the beta 2 ISOs (not
even sure where they are online at this point); however, that's not
necessarily a bad ...

It probably won't work in a VM, because of the way a VM's virtual storage devices work. Sparse virtual disks discard everything that's not in use at that moment; otherwise you'd have to allocate the full size of the drive up front. Thus, you probably couldn't have any defunct file systems sitting around.

Jeff Lane (bladernr) wrote :

FWIW, I do not use sparse files on my VMs... I allocate the full size and build each drive prior to use, so in my case, a 20GB drive is actually 20GB, not 1GB with a 20GB limit... but it was worth a shot at least... I tried.

Yup, and I thank you for it. And if this bug does get fixed, I imagine those people whose data won't be hosed as a result would thank you too.

Note that in bug 560152, NickJ's problem is actually this bug.

Linux000 (michael-yoyo) wrote :

First off, I don't have the hardware to try this, but there is a chance of GRUB wiping over information in the RAID controller (like a catalog used to retrieve information from multiple disks) and rendering the data unreadable, which would only affect RAID arrays, and creating a new file system would work with the new addresses. I will try to replicate it in a VM with fixed-size disks. I'm interested to see if this happens to a RAID 5 array... and if it does (and if you can), does taking a disk out and reading it as a single disk recover the information? Sorry if any of this sounds crazy; I don't have much experience with RAID systems.

Linux000, don't bother; I have practically written a dissertation on why it isn't likely to be a GRUB problem in one of these bugs.

(copying comment from bug 568183)

Thanks for the answer, but now that I think about it, I'm not sure NEW_LABEL alone would cause wholesale RAID array corruption. Correct me if I'm wrong because I'm not an expert on how parted works, but that bug with NEW_LABEL would only overwrite the RAID superblock and not the data? In which case, the RAID array would just start up in degraded mode and have to be re-synced. In my case, and in the original bug report, and in NickJ's case, the array was not degraded, but rather the data on the array was corrupted.

Even supposing the NEW_LABEL command did cause the corruption, while it may have been an upstream bug that physically wrote to the disk and caused that problem, the root of all evil is still the installer that incorrectly told parted there was a file system present on a device that's a component of the RAID array. Are you saying that parted is also responsible for detecting which file systems are present as well, and the installer only reports it? In any case it needs to be fixed, because only bad things can happen if the installer doesn't know what file systems are actually present.

In any case, there need to be basic safeguards in place that prevent the installer from writing directly to partitions that have a RAID superblock, have a "RAID autodetect" partition type, or are part of a logical volume. I guess you could argue that this would rather be an enhancement, in the same way you could argue that a car without brakes is not defective, but rather that brakes would be a nice-to-have enhancement.
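A minimal sketch of such a safeguard, assuming the mdadm v1.2 superblock layout (magic 0xa92b4efc, stored little-endian, at an offset of 4 KiB from the start of the member device). A real implementation would rely on `mdadm --examine` or `blkid` rather than a hand-rolled check, and the function name here is made up; it is demonstrated on an image file, not a device.

```shell
# Return success if the given file/device carries an md v1.2 superblock.
has_md_superblock() {
    # read the 4 magic bytes at offset 4096 and render them as hex
    magic=$(dd if="$1" bs=1 skip=4096 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    # 0xa92b4efc little-endian on disk: fc 4e 2b a9
    [ "$magic" = "fc4e2ba9" ]
}
```

An installer could run a check like this on every target before partitioning or bootloader installation and refuse to proceed (or at least warn loudly) when a device is an array member.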

Jeff Lane (bladernr) wrote :

Could someone take a look at this? This was a pretty serious issue that seems to have just fizzled out as far as response goes.

Phillip Susi (psusi) wrote :

Correct me if I'm wrong, but it sounds like this problem begins with choosing to install Ubuntu to one of the individual disks comprising the raid array, instead of to the array as a whole. If that is the case, that is user error, not a bug.

If that is not the case, please reproduce this, and then attach the generated data from running the boot info script here:

http://sourceforge.net/projects/bootinfoscript/

Changed in parted (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for parted (Ubuntu) because there has been no activity for 60 days.]

Changed in parted (Ubuntu):
status: Incomplete → Expired

Is that an option now? The installer didn't let me install to the raid
array before and I have been reluctant to try again since it screwed things
up so badly before.

   -- Greg

On Mon, Oct 10, 2011 at 10:53 AM, Phillip Susi <email address hidden> wrote:

> Correct me if I'm wrong, but it sounds like this problem begins with
> choosing to install Ubuntu to one of the individual disks comprising the
> raid array, instead of to the array as a whole. If that is the case,
> that is user error, not a bug.
>
> If that is not the case, please reproduce this, and then attach the
> generated data from running the boot info script here:
>
> http://sourceforge.net/projects/bootinfoscript/

Phillip Susi (psusi) wrote :

Yes, it has been at least as far back as 9.04, I thought all the way back to 8.04, but I could be mistaken.

Greg (gregcouch) wrote :

Thank you, I will try it again.

   -- Greg

On Sat, Dec 10, 2011 at 9:06 AM, Phillip Susi <email address hidden> wrote:

> Yes, it has been at least as far back as 9.04, I thought all the way
> back to 8.04, but I could be mistaken.
