Cannot install on HP Proliant DL385 G7 - dual RAID controllers

Bug #695842 reported by Barry G on 2010-12-30
This bug affects 2 people
Affects                    Status        Importance  Assigned to   Milestone
grub-installer (Ubuntu)    Fix Released  High        Colin Watson
  Lucid                    Fix Released  High        Colin Watson  ubuntu-10.04.3
  Maverick                 Invalid       High        Colin Watson

Bug Description

Stable release update justification:

Impact: grub-installer picks the wrong device to install GRUB to when / is on a software RAID device composed of CCISS devices, and the installation therefore fails.
Development branch: Fixed in grub-installer 1.60ubuntu1 by adding a mapdevfs command to canonicalise the /dev/block/ device names printed by 'mdadm --detail'.
Patch:
 http://bazaar.launchpad.net/~cjwatson/grub-installer/lucid-proposed/revision/1173
 http://bazaar.launchpad.net/~cjwatson/grub-installer/maverick-proposed/revision/1187
TEST CASE: I'm afraid this consists of finding a system with at least two CCISS disks and doing a RAID install on it. It looks like Canonical Support have access to such a system, so I hope they can help. Use the apt-setup/proposed=true boot parameter to pull in the unverified installer update.
Regression potential: As far as I know, this can't affect anything other than RAID installs.
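
For illustration, the canonicalisation step described above can be sketched in Python. This is not the grub-installer code (which is a shell script and uses mapdevfs); the sample 'mdadm --detail' output, the function names, and the use of os.path.realpath as a stand-in for mapdevfs are all assumptions made for the sketch.

```python
import os
import re

# A member row of 'mdadm --detail' has four numeric columns
# (Number, Major, Minor, RaidDevice), a state, then the device path.
MEMBER_ROW = re.compile(r"^\s*\d+\s+\d+\s+\d+\s+\d+\s+.*?(/dev/\S+)\s*$")


def member_devices(detail_output):
    """Return the member device paths exactly as mdadm printed them."""
    devices = []
    for line in detail_output.splitlines():
        match = MEMBER_ROW.match(line)
        if match:
            devices.append(match.group(1))
    return devices


def canonical_members(detail_output, resolve=os.path.realpath):
    """Map each printed name (e.g. /dev/block/MAJ:MIN) to a canonical name.

    'resolve' is a hypothetical hook standing in for mapdevfs; the default
    simply follows symlinks.
    """
    return [resolve(dev) for dev in member_devices(detail_output)]


# Illustrative output only, not taken from the affected system.
SAMPLE = """\
/dev/md0:
    Number   Major   Minor   RaidDevice State
       0     104        1        0      active sync   /dev/block/104:1
       1     105        1        1      active sync   /dev/block/105:1
"""

# Assumed mapping for the example; a real system would resolve these itself.
FAKE_MAP = {
    "/dev/block/104:1": "/dev/cciss/c0d0p1",
    "/dev/block/105:1": "/dev/cciss/c1d0p1",
}

if __name__ == "__main__":
    print(canonical_members(SAMPLE, resolve=FAKE_MAP.get))
```

The point of the sketch is only that later logic which inspects device names (for example, to decide whether a member is a partition) needs canonical names rather than the /dev/block/ aliases.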

Original report follows:

Binary package hint: debian-installer

Installation of Lucid fails on an HP ProLiant DL385 G7 system (uncertified). Other releases have not been tested at this time.

Logs (syslog and partman) generated during install time (using just one of the two available RAID controllers) have been attached. Apport information from the affected system is forthcoming (live CD).

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: debian-installer (not installed)
ProcVersionSignature: Ubuntu 2.6.32-24.39-generic 2.6.32.15+drm33.5
Uname: Linux 2.6.32-24-generic x86_64
Architecture: amd64
Date: Thu Dec 30 22:44:53 2010
DeviceMapperTables:
 Error: command ['dmsetup', 'table'] failed with exit code 1: /dev/mapper/control: open failed: Permission denied
 Failure to communicate with kernel device-mapper driver.
 Command failed
DmraidDevices: Error: command ['dmraid', '-r'] failed with exit code 1: ERROR: you must be root
DmraidSets: Error: command ['dmraid', '-s'] failed with exit code 1: ERROR: you must be root
LiveMediaBuild: Ubuntu 10.04.1 LTS "Lucid Lynx" - Release amd64 (20100816.1)
MachineType: HP ProLiant DL385 G7
MemoryUsage:
 total used free shared buffers cached
 Mem: 33012524 1916960 31095564 0 113132 704396
 -/+ buffers/cache: 1099432 31913092
 Swap: 0 0 0
ProcCmdLine: file=/cdrom/preseed/hostname.seed boot=casper persistent iso-scan/filename=/hostname-10.04.1-desktop-amd64.iso splash
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: debian-installer
dmi.bios.date: 06/15/2010
dmi.bios.vendor: HP
dmi.bios.version: A18
dmi.chassis.type: 23
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrA18:bd06/15/2010:svnHP:pnProLiantDL385G7:pvr:cvnHP:ct23:cvr:
dmi.product.name: ProLiant DL385 G7
dmi.sys.vendor: HP

Barry G (barrygould) wrote :

The syslog file contains the following lines:

Dec 28 13:12:22 debootstrap: Selecting previously deselected package base-files.
Dec 28 13:12:22 debootstrap: (Reading database ... 0 files and directories currently installed.)
Dec 28 13:12:22 debootstrap: dpkg: regarding .../base-files_5.0.0ubuntu20.10.04.2_amd64.deb containing base-files, pre-dependency problem:
Dec 28 13:12:22 debootstrap: base-files pre-depends on awk
Dec 28 13:12:22 debootstrap: awk is not installed.
Dec 28 13:12:22 debootstrap: dpkg: warning: ignoring pre-dependency problem!
Dec 28 13:12:22 debootstrap: Unpacking base-files (from .../base-files_5.0.0ubuntu20.10.04.2_amd64.deb) ...
Dec 28 13:12:22 debootstrap: dpkg (subprocess): unable to execute new pre-installation script: Permission denied
Dec 28 13:12:22 debootstrap: dpkg: error processing /var/cache/apt/archives/base-files_5.0.0ubuntu20.10.04.2_amd64.deb (--install):
Dec 28 13:12:22 debootstrap: subprocess new pre-installation script returned error exit status 2
Dec 28 13:12:22 debootstrap: Selecting previously deselected package base-passwd.
Dec 28 13:12:22 debootstrap: Unpacking base-passwd (from .../base-passwd_3.5.22_amd64.deb) ...
Dec 28 13:12:22 kernel: [ 1058.913707] JBD: barrier-based sync failed on md0-8 - disabling barriers
Dec 28 13:12:23 debootstrap: dpkg: base-passwd: dependency problems, but configuring anyway as you requested:
Dec 28 13:12:23 debootstrap: base-passwd depends on libc6 (>= 2.8); however:
Dec 28 13:12:23 debootstrap: Package libc6 is not installed.
Dec 28 13:12:23 debootstrap: Setting up base-passwd (3.5.22) ...
Dec 28 13:12:23 debootstrap: dpkg (subprocess): unable to execute installed post-installation script: Permission denied
Dec 28 13:12:23 debootstrap: dpkg: error processing base-passwd (--install):
Dec 28 13:12:23 debootstrap: subprocess installed post-installation script returned error exit status 2
Dec 28 13:12:23 debootstrap: Errors were encountered while processing:
Dec 28 13:12:23 debootstrap: /var/cache/apt/archives/base-files_5.0.0ubuntu20.10.04.2_amd64.deb
Dec 28 13:12:23 debootstrap: base-passwd

summary: - report for Bug #695766
+ Cannot install - unable to execute installed post-installation script:
+ Permission denied
description: updated
Peter Matulis (petermatulis) wrote :

After further testing with scenarios that vary the following:

- installation to disks involving one or two controllers
- inclusion of LVM
- inclusion of MD (software RAID)

my hypothesis is that the installer is simply getting confused by the dual controllers.

Some observations:

a) When installing to disks involving one or both controllers with a combination of LVM and MD, package installation fails.

b) When installing to disks involving both controllers with neither LVM nor MD, package installation succeeds but the system does not boot (which I presume is because GRUB installation failed).

c) When installing to disk involving a single controller (2 test installs; RAID 0 and RAID 6) with neither LVM nor MD, both package and GRUB installation succeed. Still, the installer detects the controllers in the wrong order and complains.

I am working on getting the partman & syslog installer logs for the last scenario.

Other info:

The PCI IDs for the controllers:

03:00.0 RAID bus controller [0104]: Hewlett-Packard Company Smart Array G6 controllers [103c:323a] (rev 01)
        Subsystem: Hewlett-Packard Company Device [103c:3245]

0c:00.0 RAID bus controller [0104]: Hewlett-Packard Company Smart Array G6 controllers [103c:323a] (rev 01)
        Subsystem: Hewlett-Packard Company Device [103c:3243]

summary: - Cannot install - unable to execute installed post-installation script:
- Permission denied
+ Cannot install on HP Proliant DL385 G7 - dual RAID controllers
Peter Matulis (petermatulis) wrote :

Also forthcoming is a repetition of scenario (c) above but with Maverick.

Peter Matulis (petermatulis) wrote :

The syslog of the Lucid test (no LVM nor MD) shows grub-install being successful. Strange.

Peter Matulis (petermatulis) wrote :

Same for the Maverick test.

Note that CentOS 5.4 installs fine using both controllers as well as using LVM and MD.

Peter Matulis (petermatulis) wrote :

So far we can boot in these scenarios:

1. One controller and no MD/LVM; no intervention required
2. Two controllers with no MD/LVM; manual grub install required
3. Two controllers with MD (RAID0) but no LVM; manual grub install required

Attached:

- syslog (10.10_2controllers-yesMD-noLVM_syslog)
- boot sectors for c0d0 and c1d0 (c0d0_sector and c1d0_sector)

From the last install (scenario 3), based on the boot sectors, I confirmed that c0d0 had GRUB installed and c1d0 did not (but do not see any successful installation to c0d0 in the log). Yet I do see where you then installed GRUB manually to each volume. What was in c0d0 might be a remnant of a previous install. Note that the installer is trying to install GRUB to /dev/md0 (?).

Jan 21 21:42:19 grub-installer: info: Installing grub on '/dev/md0'
Jan 21 21:42:19 grub-installer: info: grub-install supports --no-floppy
Jan 21 21:42:19 grub-installer: info: Running chroot /target grub-install --no-floppy --force "/dev/md0"
Jan 21 21:42:20 grub-installer: /usr/sbin/grub-setup: warn: Attempting to install GRUB to a partition instead of the MBR. This is a BAD idea..
Jan 21 21:42:20 grub-installer: /usr/sbin/grub-setup: error: embedding is not possible, but this is required when the root device is on a RAID array or LVM volume.
Jan 21 21:42:20 grub-installer: error: Running 'grub-install --no-floppy --force "/dev/md0"' failed

Then the manual interventions:

Jan 21 21:43:46 grub-installer: info: Installing grub on '/dev/cciss/c0d0'
Jan 21 21:43:46 grub-installer: info: grub-install supports --no-floppy
Jan 21 21:43:46 grub-installer: info: Running chroot /target grub-install --no-floppy --force "/dev/cciss/c0d0"
Jan 21 21:43:47 grub-installer: Installation finished. No error reported.
Jan 21 21:43:47 grub-installer: info: grub-install ran successfully

Jan 21 21:44:14 grub-installer: info: Installing grub on '/dev/cciss/c1d0'
Jan 21 21:44:14 grub-installer: info: grub-install supports --no-floppy
Jan 21 21:44:14 grub-installer: info: Running chroot /target grub-install --no-floppy --force "/dev/cciss/c1d0"
Jan 21 21:44:16 grub-installer: Installation finished. No error reported.
Jan 21 21:44:16 grub-installer: info: grub-install ran successfully

Peter Matulis (petermatulis) wrote :
Colin Watson (cjwatson) wrote :

Installing to /dev/md0 is certainly wrong in the (typical) case where the array is constructed from partitions rather than whole disks. The installer does have code to spot this situation, which evidently isn't working for some reason.
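
To illustrate the intended behaviour (installing to the MBR of each disk that contains a member partition, rather than to /dev/md0 itself), here is a minimal Python sketch. It is not the installer's code: the member partition names such as /dev/cciss/c0d0p1 and the helper names are assumptions, and the naming rules cover only CCISS-style and sdX-style devices.

```python
import re


def parent_disk(member):
    """Return the whole-disk device that contains a RAID member.

    Assumed naming rules (hypothetical helper, for illustration only):
      /dev/cciss/c0d0p1 -> /dev/cciss/c0d0   (disk name ends in a digit,
                                              partitions use a 'p' suffix)
      /dev/sda1         -> /dev/sda
    A member that is already a whole disk is returned unchanged.
    """
    match = re.fullmatch(r"(.*\d)p\d+", member)
    if match:
        return match.group(1)
    match = re.fullmatch(r"(/dev/[a-z]+)\d+", member)
    if match:
        return match.group(1)
    return member


def boot_targets(members):
    """Return the unique containing disks, in first-seen order."""
    targets = []
    for member in members:
        disk = parent_disk(member)
        if disk not in targets:
            targets.append(disk)
    return targets


if __name__ == "__main__":
    # Members of a hypothetical /dev/md0 spanning both controllers.
    print(boot_targets(["/dev/cciss/c0d0p1", "/dev/cciss/c1d0p1"]))
```

With the two CCISS partitions above, the targets are /dev/cciss/c0d0 and /dev/cciss/c1d0, which matches the two devices that GRUB was installed to manually in the log excerpts earlier in this report.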

Could you run through an installation that reproduces this, but before you reach the GRUB installation step, switch to Alt-F2, 'nano /usr/bin/grub-installer', uncomment the 'set -x' line near the top, save and exit, and switch back to Alt-F1 and continue? This should give a more verbose syslog which may uncover the problem with situation 3.

Peter Matulis (petermatulis) wrote :

@Colin. Here is the new syslog. However I do not see more info from grub-installer.

Colin Watson (cjwatson) wrote :

Nor do I. Um, are you sure you did it right? It looks like you started a shell to edit /usr/bin/grub-installer *after* the GRUB installation step started. You *must* edit it before that or it won't work. I suggest doing it somewhere around the network configuration or partitioning stages; you should have plenty of time.

Peter Matulis (petermatulis) wrote :

@Colin. Ok, this one looks better.

Colin Watson (cjwatson) wrote :

Thanks, Peter. Could you try that again, but this time apply the attached patch on the fly as well? I think this should fix it.

affects: debian-installer (Ubuntu) → grub-installer (Ubuntu)
Changed in grub-installer (Ubuntu):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → High
status: New → Triaged
tags: added: patch
Peter Matulis (petermatulis) wrote :

@Colin. Yes, this works! I am attaching the syslog ('set -x' uncommented) in case it's valuable to you.

Also, as stated earlier, the ultimate goal is to get a successful install with LVM on top of the md device. Should I create a separate bug for that or should this also be fixed at this point?

Colin Watson (cjwatson) wrote :

Thanks! I'll go ahead and upload that, then.

On LVM: the syslog in comment 3 (LVM+RAID) fails well before grub-installer, so this does appear to be a separate problem and we should track it separately. One thing I notice from the syslog is that partman-base 139ubuntu6 is in use, and a fix landed in lucid-updates two weeks ago which was relevant to RAID installations (bug 569900). Now, my calculations indicate that the disk sizes in this case are outside the range affected by that bug, but I might have got something wrong somewhere. I'd appreciate a retest with the 10.04.2 candidate images plus this fix, and if it still fails I'll need as much detailed information as possible about the exact disk layout (syslog, partman, 'fdisk -l', /proc/mdstat, 'pvs --units b').
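
As a convenience for gathering the requested layout details in one place, something like the following Python sketch could be run as root from a live or rescue session that has Python available (the installer environment itself may not). The function names and report format are assumptions; it covers only 'fdisk -l', 'pvs --units b' and /proc/mdstat, so the syslog and partman logs still need to be attached separately.

```python
import subprocess
from pathlib import Path

# Commands and files named in the request above.
COMMANDS = [
    ["fdisk", "-l"],
    ["pvs", "--units", "b"],
]
FILES = ["/proc/mdstat"]


def run_command(argv):
    """Run one command and return its output, or a note if it cannot run."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
    except OSError as exc:
        return f"could not run {argv[0]}: {exc}\n"
    return result.stdout + result.stderr


def read_file(path):
    """Return the file's contents, or a note if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError as exc:
        return f"could not read {path}: {exc}\n"


def collect_report(run=run_command, read=read_file):
    """Build one text report with a '### name' heading per section."""
    sections = []
    for argv in COMMANDS:
        sections.append(f"### {' '.join(argv)}\n{run(argv)}")
    for path in FILES:
        sections.append(f"### {path}\n{read(path)}")
    return "\n".join(sections)


if __name__ == "__main__":
    print(collect_report())
```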

Colin Watson (cjwatson) on 2011-02-10
Changed in grub-installer (Ubuntu Lucid):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → High
milestone: none → ubuntu-10.04.3
status: New → Triaged
Changed in grub-installer (Ubuntu Maverick):
assignee: nobody → Colin Watson (cjwatson)
importance: Undecided → High
status: New → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub-installer - 1.60ubuntu1

---------------
grub-installer (1.60ubuntu1) natty; urgency=low

  * Resynchronise with Debian. Remaining changes:
    - Show the grub menu and raise the menu timeout if other operating
      systems are installed (only for GRUB Legacy right now).
    - Remove splash boot parameter unless debian-installer/framebuffer=true
      and debian-installer/splash=true.
    - If / or /boot are on a removable device, install GRUB there by
      default.
    - Only mount /target/proc if it isn't already mounted.
    - Support setting OVERRIDE_UNSUPPORTED_OS in the environment to force
      grub-installer to use its default MBR selection method despite there
      being unsupported operating systems on the disk.
    - Unless grub-installer/make_active is preseeded to false, mark the
      partition to which GRUB is being installed as bootable, or failing
      that the first available primary partition on the disk to which GRUB
      is being installed.
    - Support grub-installer/bootdev_directory preseeding to make use of the
      relative path feature of grub4dos, so that we can point grub4dos at
      part of a disk for Wubi. Setting this disables normal grub
      installation, but still generates a device.map (for GRUB Legacy only);
      it also hides the menu.
    - Handle cases where /boot is bind-mounted.
    - Add support for writing an GRUB Legacy MBR on each disk in an
      mdadm-managed RAID providing /boot. (GRUB 2 can handle this already.)
    - Properly make use of output from os-prober to configure the booting of
      other operating systems on dmraid arrays. Attempt to guess where in
      the device map the array belongs, by substituting the first drive in
      the dmraid array for the dmraid array device node itself, and removing
      any reference to other member disks of the array.
    - Go back to using update-grub -y for GRUB Legacy for now; our grub
      package is a bit old and still requires this.
    - Default to grub2 for GPT systems.
    - Allow grub/grub2 choice for ext4, though still default to grub2.
    - If /boot is on an MD device and we're using GRUB 2, install GRUB there
      rather than (hd0); GRUB 2 will interpret that as meaning that it needs
      to install to each of the RAID members.
    - If using GRUB 2 and installing to a RAID device any of whose
      components are partitions, then default to installing to the MBRs of
      each of the containing disks, since GRUB 2 will refuse to install to
      the partition devices.
    - Bind-mount /proc and /sys while running grub-install.
    - Update grub-installer/bootdev text to avoid GRUB device naming that
      changed between GRUB Legacy and GRUB 2, and to use libata-style device
      naming since that is more accurate for most people.
    - On i386/efi and amd64/efi subarchitectures, install grub-efi and purge
      grub, grub-legacy, and grub-pc; elsewhere, purge grub-efi*.
    - Don't ask for a boot device on EFI, and don't pass a boot device
      argument to grub-install.
    - Add a preseedable grub-installer/timeout template to adjust the
      initial GRUB timeout.
    - Inst...

Changed in grub-installer (Ubuntu):
status: Triaged → Fix Released
Colin Watson (cjwatson) on 2011-02-10
description: updated
Changed in grub-installer (Ubuntu Lucid):
status: Triaged → In Progress
Changed in grub-installer (Ubuntu Maverick):
status: Triaged → In Progress
Peter Matulis (petermatulis) wrote :

@Colin. Yes, the 10.04.2 ISO candidate + patch works (md0 + LVM)!

Can I safely use this solution for system rollouts without fear of something being clobbered at 10.04.2 release time or by any subsequent updates? Or should I wait for 10.04.2 to be officially released (Feb 17) and install my machines from that?

Colin Watson (cjwatson) wrote :

The out-of-memory error must have been transient, then; I guess we chalk that up as unreproducible.

I don't know how your system works, so I can't say what will or won't be clobbered. I can say that you aren't going to get a different newer version of grub-installer, though; the next version to enter lucid-proposed (after 10.04.2) will contain my patch.

Accepted grub-installer into lucid-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in grub-installer (Ubuntu Lucid):
status: In Progress → Fix Committed
tags: added: verification-needed
Martin Pitt (pitti) wrote :

Accepted grub-installer into maverick-proposed; the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in grub-installer (Ubuntu Maverick):
status: In Progress → Fix Committed

Any testers for this SRU? Barry or Peter, can you test the version in lucid-proposed or maverick-proposed? Thanks in advance.

Colin Watson (cjwatson) wrote :

Remember that you need to use apt-setup/proposed=true to pull in the unverified grub-installer update.

description: updated

SRU verification for Lucid:
I have verified grub-installer 1.49ubuntu11.2 in -proposed: I installed a server image with apt-setup/proposed=true passed as an installer boot parameter and found no regressions.

Marking as verification-done.

tags: added: verification-done-lucid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grub-installer - 1.49ubuntu11.2

---------------
grub-installer (1.49ubuntu11.2) lucid-proposed; urgency=low

  * Canonicalise device names printed by 'mdadm --detail' (LP: #695842).
 -- Colin Watson <email address hidden> Thu, 10 Feb 2011 19:51:23 +0000

Changed in grub-installer (Ubuntu Lucid):
status: Fix Committed → Fix Released
tags: added: testcase
JC Hulce (soaringsky) wrote :

This bug affects Ubuntu 10.10, Maverick Meerkat. Maverick has reached end-of-life and is no longer supported, so I am closing the bugtask for Maverick. Please upgrade to a newer version of Ubuntu.
More information here: https://lists.ubuntu.com/archives/ubuntu-announce/2012-April/000158.html

Changed in grub-installer (Ubuntu Maverick):
status: Fix Committed → Invalid