/boot destroyed during install even when formatting disabled (alternate installer)

Bug #542210 reported by Jason Tackaberry
This bug affects 4 people
Affects              Status        Importance  Assigned to   Milestone
partman-md (Ubuntu)  Fix Released  High        Colin Watson
  Lucid              Fix Released  High        Colin Watson

Bug Description

Although at first blush this seems to be bug #527401, I don't think it's related to grub2. It's also a regression from alpha3; whereas #527401 also applies to lucid alpha3, this bug does not. I am using the alternate installer.

I have an existing server running Hardy with LVM on MD, with several disks partitioned with /dev/sdX1 as 256M and /dev/sdX2 as the rest, with /dev/md0 as raid1 using /dev/sd?1, and /dev/md1 as raid5 using /dev/sd?2.

/dev/md0 is formatted with ext4 and is /boot. /dev/md1 is the only PV for volume group vg0, and vg0 has LVs swap (2G), root (50G), lucid (50G), data.
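
For illustration, the layout described above corresponds roughly to the following commands (a sketch only; the disk count, device names, and exact options are assumptions, not the commands actually run when the server was set up):

# Illustrative sketch -- not the original setup commands.
mdadm --create /dev/md0 --level=1 --raid-devices=5 /dev/sd[abcde]1
mdadm --create /dev/md1 --level=5 --raid-devices=5 /dev/sd[abcde]2
mkfs.ext3 /dev/md0                    # /boot (ext3; see the correction below)
pvcreate /dev/md1
vgcreate vg0 /dev/md1
lvcreate -L 2G  -n swap  vg0
lvcreate -L 50G -n root  vg0          # existing Hardy root
lvcreate -L 50G -n lucid vg0          # target for the fresh Lucid install
lvcreate -l 100%FREE -n data vg0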

The existing root LV contains Hardy, and my intent is to install Lucid fresh onto the new lucid LV, and share /boot between both installs, allowing me to easily reboot into Hardy if things go badly. Therefore, during the install, I specifically do not format /boot.

I'm obviously going to wait for 10.04 final before installing it on the actual server. For now I am staging it under a VM. This problem is consistently repeatable with beta1, and consistently absent with alpha3.

During the install, I select manual partitioning, and then:
- Select 'lucid' LV: use as: ext4; mount point: /
- Select 'swap' LV: use as swap
- Select RAID1 device #0: use as: ext3; format: no, keeping existing data; mount point: /boot

It is worth noting that in the partitioner, several partitions and LVs in the list show an "unusable" portion, e.g.:

RAID1 device #0 - 263.1MB Software RAID device
   #1 262.1MB K ext3 /boot
           917.5kB unusable

Lucid Alpha3 did not show these unusable lines. (Could be irrelevant, but thought I'd mention it.)

At this point, if I drop to another terminal, I am able to mount /dev/md0, so it's valid, and everything looks fine. I unmount before proceeding.

After selecting "Finish partitioning and write changes to disk" it warns me that "/dev/md0 assigned to /boot has not been marked for formatting." Alpha3 did this as well. I proceed.

It then says the partition tables of the following devices are changed:

LVM VG vg0, LV lucid
LVM VG vg0, LV swap
RAID1 device #0 <---- NOT LISTED IN ALPHA3

The following partitions are going to be formatted:
LVM VG vg0, LV lucid as ext4
LVM VG vg0, LV swap as swap

If I proceed to write changes to disk, it formats /, and then says:

"The attempt to mount a file system with type ext3 in RAID1 device #0 at /boot failed."

Now if I drop to a shell and try to mount /dev/md0, it fails:

~ # mount -t ext3 /dev/md0 /mnt
mount: mounting /dev/md0 on /mnt failed: Invalid argument.
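
One way to confirm that the ext3 superblock on /dev/md0 really has been clobbered (illustrative only; dumpe2fs may not be present in the installer's BusyBox shell, and the output shown is what one would expect rather than a capture from this system):

~ # dumpe2fs -h /dev/md0
dumpe2fs: Bad magic number in super-block while trying to open /dev/md0
Couldn't find valid filesystem superblock.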

Tags: lucid
Revision history for this message
Jason Tackaberry (tack) wrote :

Correction: /dev/md0 (/boot) is formatted as ext3, not ext4.

Revision history for this message
Ben Scholzen 'DASPRiD' (dasprid) wrote :

I have the same problem (Ubuntu Lucid Beta 1, alternate installer).

I created two RAID1 partitions on two disks, one for boot and one for LVM. Then I formatted the boot RAID partition with ext3. After I finished setting up LVM and completed the partition creation, the installer popped up the following error:

"The attempt to mount a file system with type ext3 in RAID1 device #0 at /boot failed."

Revision history for this message
Philip Muškovac (yofel) wrote :

Assigning to the alternate installer.

affects: ubuntu → debian-installer (Ubuntu)
tags: added: lucid
Revision history for this message
Ben Scholzen 'DASPRiD' (dasprid) wrote :

Confirming that the bug is still present in Beta 2.

Revision history for this message
Ben Scholzen 'DASPRiD' (dasprid) wrote :

Confirming that the bug is still present in the Release Candidate. I assume you are not going to fix this for release? It will be pretty bad if nobody is able to install Lucid in a RAID configuration.

Revision history for this message
Ben Scholzen 'DASPRiD' (dasprid) wrote :

The syslog and partman log are attached. I have also made a screencast showing exactly how I'm creating the RAID:

http://stuff.dasprids.de/videos/raid-creation.avi

Revision history for this message
Colin Watson (cjwatson) wrote :

This looks suspicious at first glance, but maybe it's irrelevant:

Apr 23 17:07:50 kernel: [ 191.385910] md: delaying resync of md1 until md0 has finished (they share one or more physical units)

Can you look in /proc/mdstat to check that the arrays really aren't overlapping?

Revision history for this message
Jason Tackaberry (tack) wrote :

I don't think that message means the arrays are overlapping, at least for useful definitions of "overlapping." It would say this if, for example, md0 and md1 had separate partitions on the same physical disks. This is the case for my configuration, where I have 5 disks each with two partitions (256M and the rest), and /dev/sd[abcde]1 is RAID-1 on md0 and /dev/sd[abcde]2 is RAID-5 on md1.

This is a perfectly cromulent (and I think common) configuration that would elicit that message from md.
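
For illustration only (example output, not captured from the machine in question), /proc/mdstat for a layout like this would look roughly like:

~ # cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid5 sda2[0] sdb2[1] sdc2[2] sdd2[3] sde2[4]
      (size and status line omitted)
md0 : active raid1 sda1[0] sdb1[1] sdc1[2] sdd1[3] sde1[4]
      (size and status line omitted)
unused devices: <none>

Each array is built from its own partition on each disk (sdX1 for md0, sdX2 for md1), so the arrays share physical disks but do not overlap on disk.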

Colin Watson (cjwatson)
affects: debian-installer (Ubuntu Lucid) → partman-md (Ubuntu Lucid)
Changed in partman-md (Ubuntu Lucid):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → High
milestone: none → ubuntu-10.04
status: New → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package partman-md - 49ubuntu1

---------------
partman-md (49ubuntu1) lucid; urgency=low

  * Only call NEW_LABEL on MD devices if they're empty, since with current
    parted this unconditionally clobbers the superblock (LP: #542210).
  * Register partman-md/confirm_nooverwrite, associated with the
    partman-md/confirm template.
 -- Colin Watson <email address hidden> Mon, 26 Apr 2010 14:51:29 +0100
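
For context, the change means partman-md only creates a fresh (empty) disk label on an MD device when the device contains no existing data. A rough, hypothetical shell sketch of that kind of guard (not the actual partman-md code; the blkid usage and the create_new_label helper are illustrative stand-ins for the real NEW_LABEL step):

# Hypothetical sketch only -- not the actual partman-md change.
# Only put a new label on an MD device that is genuinely empty, because
# with current parted NEW_LABEL unconditionally clobbers any existing
# superblock on the device.
if blkid -p "$device" >/dev/null 2>&1; then
        : # existing data detected; leave the device's contents alone
else
        create_new_label "$device"   # hypothetical helper for the NEW_LABEL step
fi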

Changed in partman-md (Ubuntu Lucid):
status: Fix Committed → Fix Released
Revision history for this message
Jason Tackaberry (tack) wrote :

Will there be another 10.04 RC to test this? Or is there some other way we can test the updated installer?

Revision history for this message
Steve Langasek (vorlon) wrote :

Candidate images including this change will be available in a few hours; you will be able to find them posted on http://iso.qa.ubuntu.com/qatracker/build/all when they're ready.

Revision history for this message
Alvin Thompson (alvint-deactivatedaccount) wrote :

This is likely a cause of bug 191119 and bug 568183 as well. However, those bugs are not duplicates; in those bugs, existing RAID partitions are corrupted even if they aren't used during the install. Looking at this report, I would guess that the RAID partition corruption happens because the installer tries to use that NEW_LABEL command on the incorrectly detected file systems on the devices comprising the RAID array. The root cause of those bugs is those incorrectly detected file systems. See those bugs for details. Until those bugs are fixed, I would very strongly recommend against installing on a production system, or even a test system for that matter.

Jason and Ben: very well done investigating this. Colin: not-so-well done.

Revision history for this message
Jason Tackaberry (tack) wrote :

Thanks Colin and Steve. I can confirm that the problem as reported is fixed (inasmuch as I am no longer able to reproduce it in my staging VM with a QA build of the alternate installer).

Revision history for this message
NickJ (nickd-jones) wrote :

I can also confirm that my experience of the problem as detailed in bug #560152 has been rectified with http://cdimage.ubuntu.com/daily/20100427.1/lucid-alternate-amd64.iso

- The 'unusable' bits of the partition no longer display
- There are no messages about partition tables changing, just that /boot and / are going to be formatted
- Grub successfully installs to the MBR of both /dev/sda and /dev/sdb

Thanks to all
