[STAGING] 2.6.32-47 kernel update on 10.04 breaks software RAID at boot

Bug #1209423 reported by A1an on 2013-08-08
This bug affects 1 person
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

After updating from 2.6.32-46 to 2.6.32-47, one of the md arrays no longer starts at boot time. The RAID 1 array md0 now causes the message "Continue to wait; or Press S to skip mounting or M for manual recovery" at boot. Booting back into 2.6.32-46, the md devices are set up correctly and the system boots normally. The array that no longer starts has the following configuration, while a second array with metadata=1.0 starts correctly:
level=raid1 metadata=1.2 num-devices=2

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-50-generic 2.6.32-50.112
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-50.112-generic 2.6.32.61+drm33.26
Uname: Linux 2.6.32-50-generic x86_64
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: alan 1991 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf6220000 irq 22'
   Mixer name : 'Realtek ALC888'
   Components : 'HDA:10ec0888,80860034,00100202'
   Controls : 28
   Simple ctrls : 16
Date: Thu Aug 8 02:01:09 2013
HibernationDevice: RESUME=UUID=b1649a13-fffd-4e98-b42b-a5e9ea98f9fc
InstallationMedia: Ubuntu 10.04.4 LTS "Lucid Lynx" - Release amd64 (20120214.2)
ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-50-generic root=UUID=e69dddfa-14a0-499f-9267-edca27fbd7a4 ro quiet splash
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34.14
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
StagingDrivers: r8192s_usb
Title: [STAGING]
dmi.bios.date: 04/19/2010
dmi.bios.vendor: Intel Corp.
dmi.bios.version: WBIBX10J.86A.0293.2010.0419.1819
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: DP55WB
dmi.board.vendor: Intel Corporation
dmi.board.version: AAE64798-207
dmi.chassis.type: 2
dmi.modalias: dmi:bvnIntelCorp.:bvrWBIBX10J.86A.0293.2010.0419.1819:bd04/19/2010:svn:pn:pvr:rvnIntelCorporation:rnDP55WB:rvrAAE64798-207:cvn:ct2:cvr:

A1an (alan-b) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
A1an (alan-b) wrote :

Answering questions from the above linked bug:

> It's possible that the patches which were linked are not related. They're not in the delta between the two versions which are listed.

I see; I did not realize that the upstream kernel versions (with a dot before the last digits) differ from the Ubuntu ones (with a hyphen).
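To make the distinction concrete: the ProcVersionSignature field in the apport data above pairs the Ubuntu version (2.6.32-50.112, a hyphenated ABI number plus an upload number) with the upstream stable version it is based on (2.6.32.61). The following is a minimal Python sketch that splits such a string, assuming the exact layout shown in this report; the regular expression and the helper name `parse_signature` are hypothetical and not part of any Ubuntu tool.

```python
import re

# Assumed layout, based on the ProcVersionSignature field in this report:
#   "Ubuntu <base>-<abi>.<upload>-<flavour> <upstream>[+<extra>]"
SIGNATURE_RE = re.compile(
    r"^Ubuntu (?P<base>\d+\.\d+\.\d+)-(?P<abi>\d+)\.(?P<upload>\d+)"
    r"-(?P<flavour>\S+) (?P<upstream>\d+(?:\.\d+)+)"
)


def parse_signature(sig: str) -> dict:
    """Split a ProcVersionSignature string into its Ubuntu and upstream parts."""
    m = SIGNATURE_RE.match(sig)
    if m is None:
        raise ValueError(f"unrecognised signature: {sig!r}")
    return {
        "ubuntu": f"{m['base']}-{m['abi']}.{m['upload']}",  # e.g. 2.6.32-50.112
        "abi": int(m["abi"]),                                 # the "-50" part
        "flavour": m["flavour"],                              # e.g. generic
        "upstream": m["upstream"],                            # e.g. 2.6.32.61
    }


if __name__ == "__main__":
    print(parse_signature("Ubuntu 2.6.32-50.112-generic 2.6.32.61+drm33.26"))
```

The "-46" and "-47" numbers discussed in this thread correspond to the `abi` field, not to the last component of the upstream version.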

> 1. You mention that this occurs on various upgrades, could you confirm that these would be various different kernel versions over time, and not ONLY on the 46 to 47 update?

Indeed, I tested each new update from 47 to 50 (the current one) and the issue is present in all of them, while booting into 46 works fine.

> 2. Have you seen this when doing any other reboots (not after system updates)?
> (such as might occur if this was a boot race - which could only show up because the first boot after an update performs additional operations)

No. Booting with 46 always brings the md0 array up, while 47 to 50 always fail to do so.

> 3. When this occurs you mention that you see the 'M' for manual recovery, do you also see the MD "degraded raid" prompt and if so how do you respond?

I did not try manual recovery (I did not want to break the array by manipulating it from a potentially "unsafe" environment).

> 4. On taking the 'M' option, LVM slices which are served from the RAID are missing; can you provide the following information for missing LVs (backed by md0):

Unlike the original reporter of the other bug, I do not have LVM on my RAID device. One of the following questions does apply, however:

  E) what is the actual state of md0 as shown in 'cat /proc/mdstat'?

md0 is not listed at all in /proc/mdstat
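One way to check this condition from a script is to look for md0 among the array lines of /proc/mdstat, which start with the device name followed by a colon. This is a minimal Python sketch; the helper `listed_arrays` is hypothetical, and the sample text is a hypothetical, trimmed example in which only md3 was assembled (its member device names are made up, not taken from this system).

```python
import re

# Array lines in /proc/mdstat start with the md device name and a colon,
# e.g. "md3 : active raid1 ...". Other lines (Personalities, block counts,
# "unused devices") do not start with "md".
ARRAY_LINE_RE = re.compile(r"^(md\w+)\s*:\s*(\S+)")

# Hypothetical, trimmed sample in which only md3 was assembled.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md3 : active raid1 sdc2[1] sda7[0]
unused devices: <none>
"""


def listed_arrays(mdstat_text: str) -> dict:
    """Map each md device listed in the text to its state word (active, inactive)."""
    arrays = {}
    for line in mdstat_text.splitlines():
        m = ARRAY_LINE_RE.match(line)
        if m:
            arrays[m.group(1)] = m.group(2)
    return arrays


if __name__ == "__main__":
    arrays = listed_arrays(SAMPLE_MDSTAT)
    print("md0 assembled" if "md0" in arrays else "md0 missing from mdstat")
```

On a live system the same helper can be fed the real file, for example `with open("/proc/mdstat") as f: print(listed_arrays(f.read()))`. An array that is listed but "inactive" is a different failure from one that is absent altogether, as reported here.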

Furthermore, I attach (mdadm-examine-broken.txt) the output of mdadm --examine for one of the RAID members (/dev/sda6). It looks quite odd to me: it shows a long series of "failed" entries where there should be only two members in the array.

A1an (alan-b) wrote :

I just rebooted into -46 and noticed that the output of mdadm --examine /dev/sda6 is exactly the same, with the same long series of "failed" entries (so it might be normal for 1.2 metadata?).

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: bios-outdated-0336
Stefan Bader (smb) wrote :

Lucid server is supported until 2015, and I would tend to treat this as a server issue. Since this can be narrowed down to an md problem, the fact that /proc/mdstat is empty leads me to assume that something failed completely during assembly. Apart from the odd output of mdadm --examine (which really sounds like a tool bug back in Lucid), does examining the other half of the mirror (presumably on sdc) produce the same or similar output?

It may also help to remove the "quiet" option from the kernel command line in GRUB before booting. If that still hides too much output, try removing "splash" as well, though I seem to remember that on some older releases (I do not recall which) this can prevent the prompt to skip or enter manual recovery from appearing on screen.
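The suggested change amounts to dropping flag-style tokens from the kernel command line recorded in ProcCmdLine above; in practice the edit is made on the GRUB entry before booting, not by a script. The minimal Python sketch below only illustrates the two resulting command lines; the helper `without_options` is hypothetical.

```python
# Kernel command line as recorded in ProcCmdLine for this report.
CMDLINE = ("BOOT_IMAGE=/boot/vmlinuz-2.6.32-50-generic "
           "root=UUID=e69dddfa-14a0-499f-9267-edca27fbd7a4 ro quiet splash")


def without_options(cmdline: str, drop: tuple) -> str:
    """Return cmdline with the given bare (flag-style) options removed."""
    return " ".join(tok for tok in cmdline.split() if tok not in drop)


if __name__ == "__main__":
    # Step 1: drop only "quiet" so that more boot messages are shown.
    print(without_options(CMDLINE, ("quiet",)))
    # Step 2: if too much is still hidden, drop "splash" as well.
    print(without_options(CMDLINE, ("quiet", "splash")))
```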

Joseph Salisbury (jsalisbury) wrote :

This could be a similar issue in Precise; see bug 1210104.

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Stefan Bader (smb) wrote :

Yes, it could be related, but that is a slightly more complicated setup, since it uses an imsm container (Intel fakeRAID with some support in the BIOS).

I tried to reproduce this in a VM, without success so far. One "interesting" observation, though, is that "mdadm --misc --scan --detail" reports a metadata version of 01.02 (instead of 1.2), which, when added to /etc/mdadm/mdadm.conf, causes mdadm to complain about an unknown format (although the array still gets assembled).

Can we get the output of "mdadm --misc --scan --detail" for the working and non-working kernel and the contents of /etc/mdadm/mdadm.conf?
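The 01.02 versus 1.2 discrepancy can be smoothed over before copying scan output into mdadm.conf by rewriting zero-padded version numbers. This is a minimal Python sketch, assuming that padded spellings such as 01.02 and 01.00 denote the same superblock versions as 1.2 and 1.0; the helpers `normalize_metadata` and `fix_array_line` are hypothetical, and non-numeric formats such as imsm are left untouched.

```python
def normalize_metadata(version: str) -> str:
    """Strip zero padding from a numeric md metadata version ("01.02" -> "1.2").

    Assumption: padded and unpadded spellings name the same superblock
    version. Non-numeric formats (e.g. "imsm", "ddf") are returned unchanged.
    """
    parts = version.split(".")
    if not all(part.isdigit() for part in parts):
        return version
    return ".".join(str(int(part)) for part in parts)


def fix_array_line(line: str) -> str:
    """Rewrite the metadata= token of an mdadm ARRAY line, keeping the rest."""
    tokens = []
    for tok in line.split():
        key, sep, value = tok.partition("=")
        if sep and key == "metadata":
            tok = "metadata=" + normalize_metadata(value)
        tokens.append(tok)
    return " ".join(tokens)


if __name__ == "__main__":
    print(fix_array_line(
        "ARRAY /dev/md0 level=raid1 num-devices=2 metadata=01.02 "
        "name=alan-desktop:0 UUID=d5119bd2:7491e0f3:df5db333:6e036bee"))
```

Only the metadata= spelling is touched; every other field of the line, including name=, is passed through as-is.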

Stefan Bader (smb) wrote :

It looks like the --scan command might be useless, as it seems to work properly only when the array is defined correctly in mdadm.conf. All my devices can be examined with --examine and do not show those odd "failed" lines. While I cannot reproduce the exact issue, removing the array definition from mdadm.conf gets me into a similar problem, and mdadm does not seem to work right at that point. However, that happens with both the old -46 and the newer -50 kernels.

A1an (alan-b) wrote :

Attached the --examine for the other component of the array (sdc1), it shows similar output.

A1an (alan-b) wrote :

@Stefan: the --scan command might be useless, but thanks anyway, since it helped me notice the issue and solve it :)

Here is the output of the scan command and the relevant content of the config file:
$ sudo mdadm --misc --scan --detail
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=01.02 name=alan-desktop:0 UUID=d5119bd2:7491e0f3:df5db333:6e036bee
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=01.00 name=pca3:3 UUID=154f648b:6076c08b:76f5610a:9fbce7c7

$ cat /etc/mdadm/mdadm.conf | grep ARRAY
ARRAY /dev/md/0 level=raid1 metadata=1.2 num-devices=2 UUID=d5119bd2:7491e0f3:df5db333:6e036bee name=pca3:1
ARRAY /dev/md/3 level=raid1 metadata=1.0 num-devices=2 UUID=154f648b:6076c08b:76f5610a:9fbce7c7 name=pca3:3

As you can see, there is a mismatch between the configured and scanned names, the latter dating from when the array was created long ago. I have no idea why they differ, but changing the name in the config file to match the scanned one made the array assemble at boot again. I just confirmed this with 2.6.32-50-generic.

So, to reproduce the issue, it is probably enough to change the array name in mdadm.conf.
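The mismatch found here can be detected mechanically by pairing the ARRAY lines from both sources by UUID and comparing their name= fields. This is a minimal Python sketch that uses the exact lines quoted above; the helpers `parse_array_lines` and `name_mismatches` are hypothetical, and the sketch only reports differences. It makes no claim about how mdadm itself matches arrays during assembly.

```python
# ARRAY lines exactly as quoted in this report.
CONF = """\
ARRAY /dev/md/0 level=raid1 metadata=1.2 num-devices=2 UUID=d5119bd2:7491e0f3:df5db333:6e036bee name=pca3:1
ARRAY /dev/md/3 level=raid1 metadata=1.0 num-devices=2 UUID=154f648b:6076c08b:76f5610a:9fbce7c7 name=pca3:3
"""
SCAN = """\
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=01.02 name=alan-desktop:0 UUID=d5119bd2:7491e0f3:df5db333:6e036bee
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=01.00 name=pca3:3 UUID=154f648b:6076c08b:76f5610a:9fbce7c7
"""


def parse_array_lines(text: str) -> dict:
    """Map UUID -> {field: value} for each ARRAY line; the device goes in 'device'."""
    arrays = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] != "ARRAY":
            continue
        fields = {"device": tokens[1]}
        for tok in tokens[2:]:
            key, sep, value = tok.partition("=")
            if sep:
                fields[key] = value
        if "UUID" in fields:
            arrays[fields["UUID"]] = fields
    return arrays


def name_mismatches(conf_text: str, scan_text: str) -> list:
    """List (config device, configured name, scanned name) where the names differ."""
    conf = parse_array_lines(conf_text)
    scan = parse_array_lines(scan_text)
    result = []
    for uuid in sorted(conf.keys() & scan.keys()):
        configured, scanned = conf[uuid].get("name"), scan[uuid].get("name")
        if configured != scanned:
            result.append((conf[uuid]["device"], configured, scanned))
    return result


if __name__ == "__main__":
    for device, configured, scanned in name_mismatches(CONF, SCAN):
        print(f"{device}: mdadm.conf has name={configured}, array reports name={scanned}")
```

Only name= is compared, because comparing metadata= as well would also flag the zero-padded 01.02/01.00 spellings for both arrays.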
A final note/recommendation: I do not understand why array information needs to be placed in a configuration file, let alone why the boot process relies on it so heavily. Such a config file belongs to a single system, while arrays may also be used in multi-boot environments or need to be read from a recovery system used to boot the machine. I think relying only on the metadata stored on the drives would be a far better approach.

Thank you for helping to solve the issue, and let me know if you need more info for further development.

A1an (alan-b) wrote :

Just curious: why is this still Incomplete? Is any info still missing?

tags: added: needs-bisect
Changed in linux (Ubuntu):
status: Incomplete → Confirmed

A1an, as per https://downloadcenter.intel.com/SearchResult.aspx?lang=eng&ProductFamily=Desktop+Boards&ProductLine=Intel%C2%AE+5+Series+Chipset+Boards&ProductProduct=Intel%C2%AE+Desktop+Board+DP55WB an update is available for your BIOS (0336). If you update to it following https://help.ubuntu.com/community/BiosUpdate , does anything change? If it does not, could you please specify what happened and provide the output of the following terminal command:
sudo dmidecode -s bios-version && sudo dmidecode -s bios-release-date

For more on BIOS updates and linux, please see https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette .

Thank you for your understanding.
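The BIOS version string reported by apport (and by `dmidecode -s bios-version`) can be compared with the 0336 release mentioned above. This is a minimal Python sketch, assuming the Intel desktop-board layout seen in this report, where the third dot-separated field (0293 here) is the build number; the helpers `intel_bios_build` and `is_outdated` are hypothetical.

```python
def intel_bios_build(version: str) -> int:
    """Return the build number from an Intel desktop-board BIOS version string.

    Assumption: the layout matches the string in this report,
    <board>.<oem>.<build>.<year>.<mmdd>.<hhmm>, with the build as field 3.
    """
    fields = version.strip().split(".")
    if len(fields) < 3 or not fields[2].isdigit():
        raise ValueError(f"unexpected BIOS version format: {version!r}")
    return int(fields[2])


def is_outdated(version: str, latest_build: int = 336) -> bool:
    """True if the installed build is older than latest_build (0336 here)."""
    return intel_bios_build(version) < latest_build


if __name__ == "__main__":
    installed = "WBIBX10J.86A.0293.2010.0419.1819"  # dmi.bios.version above
    print(intel_bios_build(installed), is_outdated(installed))
```

On the affected machine, the string passed in would be the first line printed by the dmidecode command above.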

description: updated
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired