maps file not getting copied from initramfs to real rootfs (losing state, race, misconfig)

Bug #550131 reported by ceg on 2010-03-28
This bug affects 7 people
Affects          Importance   Assigned to
mdadm (Ubuntu)   Undecided    Surbhi Palande
Lucid            Undecided    Unassigned
Maverick         Undecided    Unassigned

Bug Description

Binary package hint: mdadm

"mdadm --incremental" will save state in a map file under /var/run/mdadm/map.

But in initramfs and early boot this directory does not exist.

The state is then saved in /var/run/mdadm.map.new (the man page incorrectly says /var/run/mdadm.map) and is not carried over into the main system later on. This means hotplug devices showing up later cannot be matched correctly by --incremental; the attempt fails with a strange "device busy" error.
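
To see which of these locations actually holds a map file on a given system, a small check along the following lines may help. This is only a sketch: the candidate paths are the ones named in this report and its comments, while the helper name find_md_map and its optional root prefix argument are made up for illustration.

```shell
#!/bin/sh
# find_md_map is a hypothetical helper, not part of mdadm.
# It prints every candidate map file that exists, optionally
# looking under a root prefix given as the first argument.
find_md_map() {
    root="${1:-}"
    for candidate in \
        /var/run/mdadm/map \
        /var/run/mdadm.map \
        /var/run/mdadm.map.new \
        /dev/.mdadm/map
    do
        if [ -e "${root}${candidate}" ]; then
            printf '%s\n' "${root}${candidate}"
        fi
    done
}
```

Calling find_md_map with no argument checks the running system; passing a prefix checks the same paths under another mounted tree.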

This can be fixed as explained by Jools in the comments, and as he did in his package.

ceg (ceg) on 2010-03-28
summary: - boot: initramfs missing dir /var/run/mdadm
+ initramfs missing /var/run/mdadm dir (loosing state)

This issue can be identified from the initramfs by checking the current md devices: only one of the disks in the array is active, and incremental adding fails with the error "/dev/md/d0 device already exists".

A one-time workaround is to stop the existing array and manually assemble it without --incremental.
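
The one-time workaround could look roughly like the following. It is a sketch only: the device names in the usage note are placeholders, and the wrapper function reassemble_md with its DRY_RUN switch is invented here so the commands can be previewed before anything is actually stopped.

```shell
#!/bin/sh
# reassemble_md is a hypothetical wrapper, not an mdadm feature.
# Usage: reassemble_md <md-device> <component>...
# With DRY_RUN=1 (the default here) the commands are only printed.
reassemble_md() {
    md="$1"
    shift
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    else
        run=""
    fi
    # Stop the partially assembled array, then assemble it from all
    # components in one go instead of relying on --incremental.
    $run mdadm --stop "$md" &&
    $run mdadm --assemble "$md" "$@"
}
```

For example, "reassemble_md /dev/md0 /dev/sda1 /dev/sdb1" prints the two mdadm commands; running it with DRY_RUN=0 executes them.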

Changed in mdadm (Ubuntu):
status: New → Confirmed
ceg (ceg) wrote :

confirmed from Bug #541058 and #136252

description: updated
ceg (ceg) on 2010-03-30
summary: - initramfs missing /var/run/mdadm dir (loosing state)
+ initramfs missing /var/run/mdadm/ dir (loosing state, race, misconfig)
ceg (ceg) on 2010-04-15
description: updated

@ceg, when /var/run/mdadm is not present, mdadm creates a map in /var/run/. Please refer to man mdadm for more information (or mapfile.c::map_read()/map_write() in mdadm code). Can you please explain why this is causing a problem? Ideally it should not cause a problem.

Changed in mdadm (Ubuntu):
status: Confirmed → Incomplete
Jools Wills (jools) wrote :

Why has this been set to incomplete? The initial explanation is quite clear. /var/run doesn't exist at the initramfs stage and it will fail. If you update to a newer mdadm, it will use the /dev/.mdadm folder for the map file; this exists throughout the initramfs and is carried over once root is mounted/booted. You could also fix this easily enough in the current version, but that version is very old now.

Given the number of bugs relating to mdadm (not surprising, since the version in Ubuntu is so old and mismatched against the kernel version), and it being such an important core package, I would suggest mdadm gets a high priority and someone to look at it. It is also an example of where Ubuntu decided to do something different from Debian (using udev back when Debian wasn't) but lacked the people to maintain the modified package :/

In the meantime, those who want to can try my Lucid packages, which fix at least a few mdadm bugs on Launchpad (including this one). No guarantees, but I use them myself.

http://malus.exotica.org.uk/~buzz/mdadm/lucid/

Changed in mdadm (Ubuntu):
status: Incomplete → Confirmed
Jools Wills (jools) wrote :

I explained this slightly wrong, since my memory was vague. The old mdadm 2.7.1 in Ubuntu probably creates the map file fine, but it won't get carried across onto the rootfs. mdadm 3.1.2 wanted to place the map file in a different location and failed. My solution at the time was to change the location, and then copy the file over to the rootfs once it is mounted. An easier solution (and the default with upstream mdadm 3.1.4) is to use /dev/.mdadm for the map file. As /dev is carried over to the rootfs from the initramfs, this works better.

Surbhi Palande (csurbhi) wrote :

@Jools Wills, 2.7.1 creates the map file in /var/run/ instead of /var/run/mdadm, as /var/run/mdadm is not found in the initramfs. However, /var/run is found in the initramfs, so the map file should be created fine. This is why I have marked the bug as Invalid. Current Ubuntu is still at 2.7.1. Can you please give an example of how the absence of /var/run/mdadm leads to losing state, race, misconfig (as /var/run will have the map file)? For now I am marking the bug as Incomplete!

Changed in mdadm (Ubuntu):
status: Confirmed → Incomplete
Jools Wills (jools) wrote :

Please demonstrate to me where the map file created in initramfs is then copied to the root filesystem so it is available for mdadm at that stage. It isn't! I'm not going to play the game setting the bug to confirmed/invalid back and forth.

To fix this you need to either copy the map file to the root fs once it is mounted, or move the map file to a location such as /dev/.mdadm.

This is a bug. If it wasn't then I wouldn't need to make my own packages to fix it. (which work much better than ubuntu's).

ceg (ceg) wrote :

Surbhi, do you actually have an Ubuntu system installed on RAID where the map file is carried over?
If so, please state the exact version and details and say it works for you.
If not, please leave the bug filed as it was, documenting a bug in the Ubuntu mdadm package.

Changed in mdadm (Ubuntu):
status: Incomplete → Confirmed
description: updated
Surbhi Palande (csurbhi) wrote :

@ceg, @Jools Wills, I would love to help solve this bug. For that I need to know what the problem is. The summary says that the map file can't be created and that this leads to losing state, race, misconfig? If there is an example of it, then I can get a better understanding of what is really happening, and it will be easier to solve this.

As for marking the bug "Incomplete": that is done when more information is requested from the user!

Thanks very much for helping!

Surbhi Palande (csurbhi) wrote :

@ceg, @Jools Wills, I see what you are saying! I will try to fix this up soon and post updates here :) Thanks a ton!

Surbhi Palande (csurbhi) on 2010-09-17
summary: - initramfs missing /var/run/mdadm/ dir (loosing state, race, misconfig)
+ maps file not getting copied from initramfs to real rootfs (loosing
+ state, race, misconfig)
Changed in mdadm (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Surbhi Palande (csurbhi)
Stephan B (strushb) wrote :

Thank you for getting into this, Surbhi. I managed to hack a solution into my initramfs scripts; maybe you can extract something useful for the "good solution" : )

In /usr/share/initramfs-tools/hooks/mdadm: Create /var/run/mdadm in initramfs
> mkdir -p ${DESTDIR}/var/run/mdadm

Create /usr/share/initramfs-tools/scripts/init-bottom/mdadm: Script that saves /var/run/mdadm/map under /dev so it can be copied to the real /var/run/ later
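> #!/bin/sh
>
> # initramfs-tools calls each script with "prereqs" to learn its dependencies.
> case $1 in
> prereqs)
>     exit 0
>     ;;
> esac
>
> # Stash the map under /dev, which is carried over to the real root.
> if [ -r /var/run/mdadm/map ]; then
>     mkdir -p /dev/.initramfs/varrun/mdadm
>     cp /var/run/mdadm/map /dev/.initramfs/varrun/mdadm/
> fi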

Using 10.04 / Lucid, /etc/init/mounted-varrun.conf then copies the mapfile to the newly created /var tmpfs. HTH
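
The second half of this approach, copying the saved map into the freshly mounted /var/run, might look something like the following. It is a sketch rather than the contents of mounted-varrun.conf: the function name restore_md_map and its two arguments are hypothetical, and the default paths simply follow the script above.

```shell
#!/bin/sh
# restore_md_map is a hypothetical helper for illustration only.
# $1: directory the initramfs script saved the map into
# $2: run directory on the real root filesystem
restore_md_map() {
    saved="${1:-/dev/.initramfs/varrun/mdadm}"
    target="${2:-/var/run/mdadm}"
    # Nothing to do if the initramfs did not leave a map behind.
    [ -r "$saved/map" ] || return 0
    mkdir -p "$target" &&
    cp "$saved/map" "$target/map"
}
```

Called without arguments it uses the paths from the script above; both can be overridden, which makes it easy to try out in a scratch directory first.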

Jools Wills (jools) wrote :

Just to note: the new mdadm 3.1.4+ uses /dev/.mdadm/map as the location, which is available both at the initramfs stage and once root is mounted.

Surbhi Palande (csurbhi) wrote :

Call for testing mdadm 2.7.1 autoassembly.

For the Ubuntu releases so far, the mdadm package shall stay at 2.7.1. However, Natty will have mdadm at 3.1.4. This document is intended to test the mdadm fixes for 2.7.1. Here is the rough procedure that needs to be followed:

Testing auto-assembly of your md array when your rootfs lies on it:
1) Install the mdadm package and initramfs package kept at: https://edge.launchpad.net/~csurbhi/+archive/mdadm-autoassembly
2) Run /usr/share/mdadm/mkconf and ensure that your /etc/mdadm/mdadm.conf has the array definition.
a) Save your original initramfs in /boot itself, for example as /boot/initrd-old.img.
b) Then run update-initramfs -c -k <your-kernel-version>. Store this initramfs as /boot/initrd-new.img. We shall use this initramfs as a safety net: if you cannot boot with the auto-assembly fixes, you should not be left with an unbootable system. Through grub's edit menu, you can fall back to this safety net by editing the entry to use initrd=initrd-new.img (or, if this does not work for some reason, fall back to your older initrd=initrd-old.img). This way you can be sure that you can still boot your precious system.
c) Now comment out or remove the ARRAY definitions from your /etc/mdadm/mdadm.conf and once again run the same "update-initramfs -c -k <your-kernel-version>" to generate a brand new initramfs.
3) Run mdadm --detail --scan and note the UUIDs of the arrays. Note the hostname stored in your array. Does it not match your real hostname? Then we can fix that at the initramfs prompt that you will inevitably land at if you try auto-assembly. Also note the component devices that form the root md device. Keep these notes on paper for cross-checking when you reboot.
4) Reboot.
5) If you are at the initramfs prompt, here are the things that you should first check:
a) ls /bin/hostname /etc/hostname - are these files present?
b) Run "hostname". Does this show the hostname that your system is intended to have? Is it the same as the contents of /etc/hostname?
c) ls /var/run - is this directory there?
If you answer yes to the above three questions, then so far so good. Now run the following command:
mdadm --assemble -U uuid /dev/<md-name> <dev-components-listed here>
The mdadm --detail --scan that you ran previously should have given you the component names if you do not know them right now. Hopefully you have them listed on your paper.
E.g. in my case I ran:
mdadm --assemble -U uuid /dev/md0 /dev/sda1 /dev/sdb1
Again run:
mdadm --detail --scan <md-device> and verify that the UUIDs are indeed updated and that the hostname reflects the one stored in /etc/hostname. You can now press Ctrl+D and you should come back to the root prompt. However, you still need to test auto-assembly of your root md device. To do that, simply reboot; you should not see the initramfs prompt this time and should land at your root prompt as expected. If you do not reach the rootfs prompt this way or using this initramfs, then as mentioned earlier, please use your saved initrd images through grub. Skip the further steps in this case. Update the launchpad bugs, saying you could not get to the root pr...
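
The file and directory checks from step 5 can be bundled into a small script to run at the initramfs prompt. This is a rough sketch that assumes a POSIX shell is available there; the function name check_initramfs_env and its optional root prefix argument are invented for illustration and are not part of the test packages.

```shell
#!/bin/sh
# check_initramfs_env is a hypothetical helper, not part of the test packages.
# It reports on the files and directory named in step 5 and returns non-zero
# if any of them is missing. An optional prefix is prepended to each path.
check_initramfs_env() {
    prefix="${1:-}"
    status=0
    for f in /bin/hostname /etc/hostname; do
        if [ -e "${prefix}${f}" ]; then
            echo "ok: ${f}"
        else
            echo "missing: ${f}"
            status=1
        fi
    done
    if [ -d "${prefix}/var/run" ]; then
        echo "ok: /var/run"
    else
        echo "missing: /var/run"
        status=1
    fi
    return "$status"
}
```

Comparing the output of "hostname" with the contents of /etc/hostname (step 5b) still has to be done by hand.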


Surbhi Palande (csurbhi) wrote :

@Jools, @ceg, Thanks a lot for your insightful comments and for making Ubuntu better. I have done most of the fixes based on these comments. The patches are based on the source code in Neil Brown's git repository, plus a few initramfs fixes which are applicable to Ubuntu (which are again based on the bugs on Launchpad). Do let me know the results of testing the PPA packages. At present these are only for Maverick. I shall be uploading the ones for Lucid soon! Thanks again :-)

Surbhi Palande (csurbhi) wrote :

@Stephan B,
Yes, you are absolutely correct. I have done something similar in initramfs kept at the above mentioned ppa. Thanks a lot for the valuable suggestion. Please do let me know if the ppa works for you or not :)

ceg (ceg) wrote :

Surbhi, you're very welcome. I am glad if some of our analysis and comments turned out to be helpful for you, and that you are actually about to get Ubuntu's RAID setup into shape (https://wiki.ubuntu.com/ReliableRaid). I think that by adopting the upstream version and mechanisms you are already clearing away many of the noted obstacles to a reliable (cold and hotplug) RAID setup in Ubuntu (and in Debian when they switch to an event-based boot process).

Steve Langasek (vorlon) on 2011-06-07
Changed in mdadm (Ubuntu Lucid):
status: New → In Progress
Changed in mdadm (Ubuntu Maverick):
status: New → In Progress
Steve Langasek (vorlon) wrote :

Surbhi,

I'm afraid I don't understand the fix for this bug in https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/136252/+attachment/2139759/+files/mdadm_lp_bug_fixes.debdiff. The changelog comments say that the map will be rebuilt - but isn't this bug about needing to preserve the *state* from /var/run in the initramfs? How does rebuilding the map accomplish this?

I understand that this is a backport from mdadm 3.1.4; I just am not sure how this change addresses the issue reported here.

A test case for this particular bug would also be helpful for the SRU process, if someone can describe how to reproduce the original error (presumably in a test environment).

Surbhi Palande (csurbhi) wrote :

@Steve, Sorry for the delay in my reply.
As far as I can remember, the map is lost when we pivot to the *new* rootfs. The lost map is then reconstructed by calling the rebuild code. This map is what is needed for the proper reconstruction of the raid array.
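
The idea that a lost map can be reconstructed rests on the kernel still knowing which arrays are running after the pivot. The following is only an illustrative sketch of that idea, not mdadm's actual rebuild code: the function name list_md_members is made up, and it assumes the usual /proc/mdstat line format (for example "md0 : active raid1 sdb1[1] sda1[0]").

```shell
#!/bin/sh
# list_md_members is a hypothetical helper, not mdadm's rebuild code.
# It reads /proc/mdstat-style text on stdin and prints one line per array:
# the array name followed by its member devices.
list_md_members() {
    awk '/^md/ {
        line = $1
        for (i = 4; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]/) {
                dev = $i
                sub(/\[.*/, "", dev)   # drop the "[n]" slot number and any flags
                line = line " " dev
            }
        }
        print line
    }'
}
```

On a running system this would be used as "list_md_members < /proc/mdstat".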

Dimitri John Ledkov (xnox) wrote :

Does this bug still require an sru to Lucid/Maverick?

Steve Langasek (vorlon) wrote :

Well, not for maverick; we won't take any more SRUs to maverick.

At this point, we probably want this issue fixed for quantal, precise, and lucid.

Changed in mdadm (Ubuntu Maverick):
status: In Progress → Won't Fix
Rolf Leggewie (r0lf) wrote :

Lucid has reached the end of its life and is no longer receiving any updates. Marking the Lucid task for this ticket as "Won't Fix".

Changed in mdadm (Ubuntu Lucid):
status: In Progress → Won't Fix