Activity log for bug #568183

Date Who What changed Old value New value Message
2010-04-22 00:44:23 Alexander Pirdy bug added bug
2010-04-22 01:48:23 Alvin Thompson marked as duplicate 191119
2010-04-22 11:39:51 andrew.dunn nominated for series Ubuntu Lucid
2010-04-22 17:41:04 Alexander Pirdy removed duplicate marker 191119
2010-04-22 17:50:13 Alexander Pirdy description In short, this really isn't a bug I found but one that came up on a mailing list; anyone who wants to read the original post should go through the archives of the "Ubuntu user technical support, not for general discussions" <ubuntu-users@lists.ubuntu.com> list. The poster indicated that he had no intention of filing a bug report despite having good reason to. This needs to be addressed, or at least a warning given before the Lucid release date, or there could be many angry people, especially those with servers! I am unable to fully test this, but there is a slim chance that it is a duplicate of bug #191119 or #369635, and my apologies if it is a duplicate of some other bug. Also, I have no idea whether this is present on past versions of Ubuntu (though I find it hard to believe it might be). Regardless, here is the verbatim post:

Title: DANGER!!! Problems with 10.04 installer (RAID devices *will* get corrupted)

Long story short: the only way to be safe right now is to physically remove drives with important data during the install.

I figured out the cause of my RAID problems, and it's a problem with ubuntu's installer. This will cost people their data if not fixed. Sorry about the length of this post, but the problem takes a while to explain.

The following scenario is not the only way your partitions can get hosed. I simply use it because it's a common use case, it illustrates what data is where on the hard drives, and it exposes the flaws in the installer's logic. It also doesn't matter if you don't touch a particular drive, partition, or file system during the install. The data on it can still be corrupted.

Suppose you have a hard drive with some partitions on it. On one of those partitions you have a linux file system which houses your data. We'll say for the sake of this discussion that sda2 contains an EXT4 file system with your data. So far, so good.

Because this data is too important to rely on a single drive, you decide to buy some more drives and make a RAID 5 device. You buy 3 more drives and create similar partitions on them (say, sdb2, sdc2, and sdd2). You copy the data currently on sda2 somewhere safe, then you use mdadm to create a RAID5 array with sda2, sdb2, sdc2, and sdd2. The new RAID device is md0. You create an XFS file system on md0 and move your data to it*. This is all perfectly fine, but the stage has been set for disaster with the ubuntu installer.

Later, you decide to do a clean install of ubuntu on sda1 (sda1 is *not* part of the RAID array), and you get to the partitioning stage and select manual partitioning. This is where things get really ugly really fast.

The bug is how the installer detects existing file systems. It simply reads the raw data in a partition to see if the bits it finds correspond to a known file system. In the above example, the installer detects the remnants of the original (non-RAID) file system on sda2 and thinks it's a current EXT4 file system. Even if you use fdisk to mark sda2's partition type as 'RAID autodetect' instead of 'linux' (which is no longer necessary), the installer still detects the partition as having an EXT4 file system. Once this 'ghost' file system is detected, the installer gets really confused about what goes where and will try to write to sda2 during the install, even if you told the installer to ignore sda2 and just install to sda1. This corrupts the current XFS file system on md0, and you're screwed.

The overall flaw here is in the file system detection; you can't just assume that any sequence of bits you find sitting around on a hard drive are still current. A possible solution may be to first check for a RAID superblock, and if found that trumps all file system detection. I imagine something similar will have to be done with partitions that are part of an LVM volume as well.
-Alvin

* In my case, I took a shortcut and created a degraded array (missing sda2), copied the data from sda2 to the array, added sda2 to the array, and resynched. I don't think it makes a difference.
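The fix proposed in the post (check for a RAID superblock first and let it trump all file system detection) can be sketched in Python as below. This is a hypothetical illustration, not the installer's actual detection code: the function names are made up, and the magic numbers and offsets (ext superblock at byte 1024 with s_magic 0xEF53 at +0x38, "XFSB" at byte 0, md magic 0xA92B4EFC at the 0.90 and 1.x superblock locations) are assumptions based on the usual on-disk layouts and should be checked against the kernel and mdadm documentation.

```python
from __future__ import annotations

import struct

# Hypothetical sketch of the detection order proposed in the post:
# look for an md RAID superblock first, and only if none is found
# fall back to matching file-system signatures in the raw bytes.
# All offsets and magic numbers below are assumptions to verify.

MD_MAGIC = 0xA92B4EFC            # md superblock magic (0.90 and 1.x)
EXT_MAGIC = 0xEF53               # ext2/3/4 superblock s_magic
EXT_MAGIC_OFFSET = 1024 + 0x38   # superblock at byte 1024, s_magic at +0x38
XFS_MAGIC = b"XFSB"              # XFS superblock magic at byte 0


def _u32le(buf: bytes, off: int) -> int | None:
    """Little-endian u32 at `off`, or None if it falls outside `buf`."""
    if off < 0 or off + 4 > len(buf):
        return None
    return struct.unpack_from("<I", buf, off)[0]


def md_superblock_offsets(size: int) -> list[int]:
    """Candidate byte offsets of an md superblock in a device of `size` bytes."""
    return [
        (size & ~0xFFFF) - 0x10000,        # v0.90: last 64 KiB-aligned 64 KiB block
        ((size // 512 - 16) & ~7) * 512,   # v1.0: >= 8 KiB before the end, 4 KiB aligned
        0,                                 # v1.1: start of the device
        4096,                              # v1.2: 4 KiB from the start
    ]


def has_md_superblock(buf: bytes) -> bool:
    # Reading the 0.90 magic as little-endian assumes an x86 (little-endian) host.
    return any(_u32le(buf, off) == MD_MAGIC for off in md_superblock_offsets(len(buf)))


def detect(buf: bytes) -> str:
    """Classify a raw partition image; a RAID superblock trumps everything else."""
    if has_md_superblock(buf):
        # Any file-system signature here may be a stale leftover from
        # before the partition joined the array, so do not report it.
        return "linux_raid_member"
    if buf[:4] == XFS_MAGIC:
        return "xfs"
    if len(buf) >= EXT_MAGIC_OFFSET + 2:
        if struct.unpack_from("<H", buf, EXT_MAGIC_OFFSET)[0] == EXT_MAGIC:
            return "ext"
    return "unknown"


if __name__ == "__main__":
    # Simulate sda2 from the post: stale ext4 bytes, then an md v0.90 superblock.
    img = bytearray(256 * 1024)
    struct.pack_into("<H", img, EXT_MAGIC_OFFSET, EXT_MAGIC)
    print(detect(bytes(img)))  # ext (the "ghost" file system)
    struct.pack_into("<I", img, (len(img) & ~0xFFFF) - 0x10000, MD_MAGIC)
    print(detect(bytes(img)))  # linux_raid_member
```

The order of the checks is the whole point: with the md test removed, the second image would still report the stale ext signature, which is exactly the 'ghost' file system described in the post.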
2010-04-22 20:36:06 Alvin Thompson marked as duplicate 191119