Udev has a race condition with device mapper devices

Bug #631795 reported by Tom Louwrier
This bug affects 11 people
Affects          Status      Importance  Assigned to  Milestone
linux (Ubuntu)   Won't Fix   High        Unassigned
udev (Ubuntu)    Triaged     High        Unassigned

Bug Description

When using LVM and/or dmraid devices, udev often prints messages like the following during boot:

udevd-work[72]: inotify_add_watch(6, /dev/dm-6, 10) failed: No such file or directory

I believe this is caused by a race condition: udev attempts to access the newly created dm device before the dev node actually appears in /dev. Sometimes it seems to get as far as running blkid before the dev node appears, so it fails to identify the UUID of the device and never creates the /dev/disk/by-uuid symlinks.
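
A minimal way to watch for the symptom described above (a sketch only; it assumes a test LVM volume group named vg0 with a logical volume named root, which are illustrative names, and that creating and removing a snapshot exercises dm device creation in the same way):

====
# terminal 1: watch block-device events as udev processes them
udevadm monitor --kernel --udev --subsystem-match=block

# terminal 2: create and remove a snapshot to force a new dm device to appear
sudo lvcreate -L 100M -s -n racetest /dev/vg0/root
sudo lvremove -f /dev/vg0/racetest

# any inotify_add_watch complaints end up in the syslog
grep inotify_add_watch /var/log/syslog
====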

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Hi Tom,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/daily/current/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 631795

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.
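
For reference, once the mainline .deb packages have been downloaded per that wiki page, installing them is roughly as follows (a sketch; the exact file names depend on the build you fetched):

====
cd ~/Downloads
sudo dpkg -i linux-headers-*.deb linux-image-*.deb
sudo reboot
====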

Thanks in advance.

    [This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-kernel-logs
tags: added: needs-upstream-testing
tags: added: kj-triage
Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Dependencies.txt

apport information

tags: added: apport-collected
description: updated
description: updated
Revision history for this message
Tom Louwrier (tom-louwrier) wrote : AcpiTables.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : AlsaDevices.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : AplayDevices.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : ArecordDevices.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : BootDmesg.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Card0.Amixer.values.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Card0.Codecs.codec97.0.ac97.0.0.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Card0.Codecs.codec97.0.ac97.0.0.regs.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Dependencies.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Lspci.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Lsusb.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : PciMultimedia.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : ProcModules.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : UdevDb.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : UdevLog.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : WifiSyslog.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [MAVERICK] fails to boot after install on dmraid set

OK, here's what I did.
Managed to get the system running by doing a rescue from the alternate install CD. Dropped into a root session, installed as many packages from ubuntu-desktop by hand as I could (several attempts) and rebooted. At first I got the same errors, but to my surprise, when I left it standing for a while it got on with a reasonably normal boot and an X/Gnome login. Had to go back to the CD and re-add myself as a user. Could log on then, but with no sudo or other rights. Back to the CD and added myself to sudoers and the admin group.
This was all on kernel 2.6.35.19.

Yesterday I saw kernel 2.6.35.20 arrive. Updated, rebooted and found that it wouldn't boot. Lots of errors about udev and the kernel disagreeing about devices called dm-5, dm-2 etc., then busybox from initramfs. So I rebooted with 2.6.35.19, which worked.
This morning there was an update to udev, which rang a bell with this bug (I thought). Installed it, rebooted 2.6.35.19, all fine. Decided to reinstall 2.6.35.20 in order to force update-initramfs on that kernel as well. The install didn't really work out without errors, but I could boot kernel 2.6.35.20. In fact I'm running it right now.

Apart from the kernel install failing partly, update-grub starts complaining now as well about not being able to find its devices:
====
thomas@tomdesktop:~$ sudo update-grub
[sudo] password for thomas:
Generating grub.cfg ...
/usr/sbin/grub-probe: error: cannot find a GRUB drive for /dev/mapper/pdc_jcchiiab5. Check your device.map.
====
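
For what it's worth, a way to inspect and regenerate the device map grub-probe is complaining about (a sketch; it assumes the grub-pc of that era still ships grub-mkdevicemap):

====
cat /boot/grub/device.map    # which drives does grub currently know about?
sudo grub-mkdevicemap        # regenerate the map from the current device list
sudo update-grub             # then retry generating grub.cfg
====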

I cannot really reproduce this error, and I'm not sure whether it's the kernel, udev, dmraid or grub that's not working well, or all of them not playing nice. I do know, however, that booting off a dmraid set has been very broken since 10.04 (which is an LTS).

I'll be checking on this whenever I feel I see something relevant, and do my updates twice a day. I'll try the latest kernel some time next weekend and report on that.

If there's anything else, like more log files, let me know.

cheers
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

hi Jeremy,

Got the latest mainline kernels and tried installing them. Unfortunately I'm running into a bug with Grub that prevents me from updating it.
See LP 634840.
So I cannot test new kernels until I get Grub to boot off them. Sorry.

cheers
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

update:
After a couple of 'recovery - repair broken packages - reboot' sessions I have the system booting reasonably well, most of the time.
So I'm changing the bug title from 'fails to boot' to 'does not boot reliably'.

I still see errors from udevd about not being able to find dm-6.
Also, Grub-pc refusing to find my dmraid partitions blocks me from finishing installing updates to the kernel or adding any new kernel.

cheers
Tom

summary: - [MAVERICK] fails to boot after install on dmraid set
+ [MAVERICK] does not boot reliably after install on dmraid set
Revision history for this message
Christophe Van Reusel (christophevr) wrote : Re: [MAVERICK] does not boot reliably after install on dmraid set

Hello Tom

Just for info, did you try once during installation of Maverick to partition the HD with ext3 instead of ext4?

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [Bug 631795] Re: [MAVERICK] does not boot reliably after install on dmraid set

hi Christophe,

No, I've moved to ext4 since Karmic, about a year ago, as soon as Grub2
could handle booting off it. Works fine on my laptop.
Never tried ext3 on this computer (my desktop).

gr
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [MAVERICK] does not boot reliably after install on dmraid set

Little update:
Found some time and edited grub.cfg by hand to include new kernels as boot option.
Running 2.6.35-22-generic now, booting reasonably well. Still some moaning at startup from udevd about missing devices dm-3 and dm-6. Usually it just goes on and I get to use my system. If not I reboot into recovery mode and update packages from there. That usually helps.

Tried mainline kernel 2.6.36.999 dated 12 and 14 Sept 2010; kernel panic. Either something in the kernels is not right, or the installation of them went wrong. Won't know for sure until grub gets fixed and I can clean up dpkg so it does not end with error messages.

cheers
Tom

Revision history for this message
Christophe Van Reusel (christophevr) wrote :

Little update by me.

Before I started using ext4 I did a BIOS upgrade (Gigabyte EP45 MB), but I forgot to reload the BIOS optimized settings after the update, which causes the BIOS to re-poll all the chips and configure the correct settings. After I did that and re-applied the settings I need, such as AHCI SATA support, everything runs fine. No single freeze or problem with ext4 anymore from kernel 2.6.35-22 on.
So I guess it was my own stupidity to forget that step. Unfortunately a lot of multimedia applications no longer run, or run poorly, because all OSS emulation was removed in kernel 2.6.35-22. So yes, I now use my own compiled kernel 2.6.35-22, but with OSS support included. Now everything runs fine.

gr christophe

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

update again:
After grub got fixed properly last week (thanx guys!) I just installed the latest mainline kernel and headers (linux-image-2.6.36-999-generic_2.6.36-999.201009240945_i386). Installed and rebooted with no problems, but I still see udev complaining about device dm-3, 4, or 6 not being available. It's not the same device every time; it changes.
Also, every once in a while startup gets stopped with a warning that serious errors were detected on my filesystems (ignore, skip mounting or repair manually), so I suspect something goes wrong at shutdown. But I'm not at all sure what it is or what I should do about it. It might very well be another bug and not related to this one.

Let me know what you need me to check next.

cheers
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

sorry, forgot to remove the 'needs upstream testing' tag.
done now.

tags: removed: needs-upstream-testing
Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

Booting goes fine about 4 times out of 5. All other behaviour is unchanged. (still: Maverick is a fine release!)

cheers
Tom

Revision history for this message
Phillip Susi (psusi) wrote :

I'm having trouble following exactly what the problem is and what you've done. At one point during the Maverick cycle there were some warnings about /dev/dm-x, but they did not seem to cause a problem and I believe they have been fixed. If you do a clean install of 10.10 from the live CD, do you have any issues?

Changed in dmraid (Ubuntu):
status: New → Incomplete
Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [Bug 631795] Re: [MAVERICK] does not boot reliably after install on dmraid set

hi Phillip,

Sorry, it was a very messy install indeed due to all the problems I had
with setting up (dual-boot) on a fakeraid. You and Curtis helped solve
those, so I guess you remember. Thanx, btw.
At the moment I have the system running Maverick, still see the messages
at every boot but can't find them in log files. Would have posted them
here earlier if I had.
Usually it's 3 times the same message, sometimes just once. About once
every 4 boots the system drops into a busybox command prompt and for me
that means reboot and give it another go. Tomorrow I'll take a picture
of my screen and post that here. Very advanced way of communicating,
right? ;-)

I should try a fresh install, also to see if all related issues in
parted and grub are now indeed solved.
That's a bunch of work though and at the moment I'm really quite busy.
I'll try to squeeze it in sometime next week. You'll hear it if and when.

cheers
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [MAVERICK] does not boot reliably after install on dmraid set

OK, I should have done this earlier. The message is:

udevd-work[80]: inotify_add_watch(6, /dev/dm-6, 10) failed: No such file or directory

Today this was the only message I got, other days there are three, often pointing towards /dev/dm-3.
See attached screenshot. At least the picture was taken digitally :-)

cheers
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

update:
Still haven't done a complete reinstall, but had a new kernel last week.
Problem persists, system can't find some of its file systems. Every once in a while it happens to be the root fs and of course it then fails to boot and I get a busybox. Rebooting usually 'solves' it.

See attached screen shot.

cheers
Tom

Revision history for this message
Phillip Susi (psusi) wrote :

Hrm... when you get dropped to the busybox shell, give it a few seconds then type exit rather than reboot and see if it fires up then.

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

Tried that, no joy. Kernel panic.
See attached screenshot.

cheers
Tom

Revision history for this message
Phillip Susi (psusi) wrote :

What about ls /dev/mapper?

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

Everything seems to be there:

(initramfs) ls /dev/mapper
control pdc_jcchiiab3 pdc_jcchiiab1 pdc_jcchiiab5
pdc_jcchiiab pdc_jcchiiab7 pdc_jcchiiab6

However, an exit gives me a kernel panic (still / again).
See attached.
Does this make any sense to you?

cheers
Tom

Revision history for this message
Phillip Susi (psusi) wrote :

And which one is specified on the root= kernel argument in your grub.cfg?

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

set root='(hd2,msdos5)'
(see attached grub.cfg)

cheers
Tom

Revision history for this message
Phillip Susi (psusi) wrote :

root=UUID=6fbcfa16-341a-4ad0-939f-7ff9933cfe3f is actually the one I was looking for. What does sudo blkid show? It should identify one of the /dev/mapper partitions as having that UUID.

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [Bug 631795] Re: [MAVERICK] does not boot reliably after install on dmraid set

thomas@tomdesktop:~$ sudo blkid
[sudo] password for thomas:
/dev/sda: TYPE="promise_fasttrack_raid_member"
/dev/sdb: TYPE="promise_fasttrack_raid_member"
/dev/mapper/pdc_jcchiiab1: UUID="CC60D6CC60D6BC80" TYPE="ntfs"
/dev/mapper/pdc_jcchiiab3: LABEL="DATA" UUID="C020A07420A072D8" TYPE="ntfs"
/dev/mapper/pdc_jcchiiab5: UUID="6fbcfa16-341a-4ad0-939f-7ff9933cfe3f" TYPE="ext4"
/dev/mapper/pdc_jcchiiab6: UUID="dfcb0087-c029-42c1-a18f-66dd26bacf6f" TYPE="ext4"
/dev/mapper/pdc_jcchiiab7: UUID="b17395f5-231b-44d2-b86d-3c2f867ac46c" TYPE="swap"
thomas@tomdesktop:~$

So it's /dev/mapper/pdc_jcchiiab5, which in Gparted and other tools
shows up correctly as my root fs.
To be complete (and as described some time ago in another bug report) my
partitions are set up as follows:
pdc_jcchiiab1: primary partition, ntfs, WinXP
pdc_jcchiiab2: extended partition, container for 5,6,7
pdc_jcchiiab3: primary partition, ntfs, DATA (to be accessed from either OS)
pdc_jcchiiab5: logical partition, ext4, /
pdc_jcchiiab6: logical partition, ext4, /home
pdc_jcchiiab7: logical partition, swap

cheers
Tom

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote : Re: [MAVERICK] does not boot reliably after install on dmraid set

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu development release http://cdimage.ubuntu.com/daily-live/current/ . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

Sorry, but this issue still exists, in fact I had to reboot 3x this morning before getting here.
To add information: I recently got another machine which we use as a develop/test box. It also has dmraid, but differs from my own setup:
my machine: Maverick 32bit, Fasttrack100, striped set, PATA disks, dual booting XP
new machine: Lucid 32 bit, Intel chipset, mirrored set, SATA disks, no other OS

Both systems are fully updated, both show the same behaviour and error messages.
I will add detailed information from the other machine through apport. What else should I check and report here?
I'm not ready to move to Natty on my own machine, but may do so on the new box if required. It's just a test hack.

Setting status to confirmed, since I now have two systems with this issue.

Anyone?

cheers
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

Resetting status after it expired automagically. Setting Confirmed since another, non-identical, system has the same issue.

Changed in linux (Ubuntu):
status: Expired → Confirmed
Changed in dmraid (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Tom Louwrier (tom-louwrier) wrote : AlsaDevices.txt

apport information

description: updated
Revision history for this message
Tom Louwrier (tom-louwrier) wrote : AplayDevices.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : ArecordDevices.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : BootDmesg.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Card0.Amixer.values.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Card0.Codecs.codec.0.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Dependencies.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : IwConfig.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Lspci.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : PciMultimedia.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : ProcModules.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : UdevDb.txt

apport information

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : UdevLog.txt

apport information

Revision history for this message
Phillip Susi (psusi) wrote : Re: [MAVERICK] does not boot reliably after install on dmraid set

Please check the following in the busybox shell on a failed boot:

ls /dev/mapper
ls /dev/disk/by-uuid
blkid

The question is whether the disks show up ( in /dev/mapper ), whether their uuid links show up ( in /dev/disk/by-uuid ), and finally, if blkid detects the disk and its uuid.

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

hi Phillip,

I got some failed boots yesterday. Before that the usual moaning, but it pulled through in the end. My daily updates included a new kernel, and that sure made it easier to reproduce failure :-)

After rebooting into the new kernel, /home could not be mounted. I chose M for manual recovery and got the information you asked me for in #63 (pic 3493). Btw, I hope you don't mind me posting some photos again here; it's quite a lot of info to type manually and I can't copy and paste it easily.
Rebooted again, failed again. To be sure I got the same info (pic 3494).
Rebooted again, failed again. First some of the 'normal' messages which I managed to capture (pic 3495), then it stumbled on a damaged root fs (pic 3496).
Rebooted again, failed again. This time it could not mount root and I got a Busybox. Did the checks (as in pics 3497 and 3498.)
Finally I rebooted into the previous kernel and that got me a working system.

As far as I can see the results are pretty consistent. I do hope this makes more sense to you than it does to me.
At least we can now provoke the failure quite reliably...

The second machine I'm using usually gives messages like in pic 3495, but then does boot OK.
Anything more you want me to check, let me know.

cheers
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :
Revision history for this message
Tom Louwrier (tom-louwrier) wrote :
Revision history for this message
Tom Louwrier (tom-louwrier) wrote :
Revision history for this message
Tom Louwrier (tom-louwrier) wrote :
Revision history for this message
Tom Louwrier (tom-louwrier) wrote :
summary: - [MAVERICK] does not boot reliably after install on dmraid set
+ [LUCID, MAVERICK] do not boot reliably on dmraid set
Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [LUCID, MAVERICK] do not boot reliably on dmraid set

update:
Have not had a single boot go right since the above posts (18 Jan). I try every other day, go into recovery mode, load whatever updates there are and hope for the best. I've got no idea what caused this intermittent issue to become a complete show stopper, but it must have been in the updates between 13 and 18 Jan 2011.

regards
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote :

Update:

Yesterday I had to try some combo of software that would not install on Lucid together. So I got adventurous and did a two-stage upgrade of my testing system from Lucid to Maverick and then on to Natty. This is the machine with the Intel chipset as mentioned in post #45.

Of course Grub did not install on the right device twice, so I had to boot off a USB stick to rescue a broken system and reinstall it manually from a root shell. (Where can I log a bug report on that? Grub, the grub installer or Ubiquity? The server install CD does not use a GUI.)

Anyway, in all cases the error messages during booting remain, so I can report that this issue is also present in Natty.
The Intel machine is running on Natty now, giving the usual complaints during boot (similar to those in post #66).
The machine with the Promise chipset fails to mount its file systems every time, leaving it useless for now. I have resorted to XP, which runs OK, so it's not a hardware issue.

If there's anything I need to check and report back here, let me know.

cheers
Tom

summary: - [LUCID, MAVERICK] do not boot reliably on dmraid set
+ [LUCID, MAVERICK, NATTY] do not boot reliably on dmraid set
Revision history for this message
Phillip Susi (psusi) wrote : Re: [LUCID, MAVERICK, NATTY] do not boot reliably on dmraid set

If you did an upgrade instead of install, then Ubiquity was not involved. If grub did not install to the correct device on upgrade, then it probably was not configured correctly in the first place. If you manually installed it before, that would have masked the configuration problem. Normally when you install, grub is configured to remember where and reinstall there on upgrade, but this does not happen when you manually invoke grub-install.

What was the raid configuration again? Raid0? Also can you try booting with the nosplash and noquiet options and see if you get any more detailed messages? I strongly suspect that the inotify error is irrelevant since I usually get them myself and it doesn't seem to cause a problem.

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [Bug 631795] Re: [LUCID, MAVERICK, NATTY] do not boot reliably on dmraid set

hi Phillip,

This machine (the intel chipset one) has 2 discs in raid1, so a mirrored
set.
From Lucid to Maverick was an in-place upgrade through Update Manager.
That left the machine non bootable so I put a USB stick in and booted
Natty server (32-bits). Installed Natty from the stick but without
formatting existing partitions.
Same result: not booting, so again booting from USB, rescue a broken
system, drop into a root shell, grub-install on
/dev/mapper/isw_cfaeccihhh_RAID_MIKE.
That fixed the booting side of things.

I would expect Grub to know and use (or offer the option to use) the
manually installed location because such a deviation would be done to
correct a problem such as mine. It would likely be the last known
working configuration, and resetting to a previous location should not
be done without at least asking me.
As I understand you, this would mean that a defect in the automatic
installation a long time ago would still be coming back on every upgrade /
re-install?

The moaning and groaning during boot used not to be too serious,
depending on which device couldn't be found. If it was /home, swap or DATA,
then it would pull through and boot. If it was /, then no joy: busybox,
reboot, try again.
So I guess yes it is relevant, or at least a symptom.

I'll see what information I can get for you later this week (any
specific logs you would like to see?)
If I find time next weekend I'll do a full install, formatting the fs
and all, of both Maverick and Natty, see what happens and report back.

cheers
Tom

PS: The other machine, which is the one that I started this bug report
with, has a Promise Fasttrak chipset, 2 discs in a striped set, raid0,
Maverick 32bits. It's been unusable since Jan 18th, but I can get it
into a running state, so if you want logs or other info on that one,
please ask.
If need be I can do a full re-install on this machine as well, since I
do have /home and DATA on separate partitions. Thing is that on Maverick
Gparted is still 0.6x and the problem with dmraid was fixed in Gparted
0.7 so formatting the partitions won't work and needs to be done from a
separate session booting off either a Gparted or a Natty CD.
I realise it's a bit messy to report on 2 separate machines in one
thread, but they share the same behaviour and they are both running dmraid.

Revision history for this message
Phillip Susi (psusi) wrote : Re: [Bug 631795] Re: [LUCID, MAVERICK, NATTY] do not boot reliably on dmraid set

On 3/8/2011 11:24 AM, Tom Louwrier wrote:
> That left the machine non bootable so I put a USB stick in and booted
> Natty server (32-bits). Installed Natty from the stick but without
> formatting existing partitions.
> Same result: not booting, so again booting from USB, rescue a broken
> system, drop into a root shell, grub-install on
> /dev/mapper/isw_cfaeccihhh_RAID_MIKE.
> That fixed the booting side of things.

IIRC, there is another bug about the alternate installer not giving the
choice of where to install grub to. Ubiquity gives a drop down list
that defaults to /dev/sda, which you need to change to the raid array.

> As I understand you this would mean that a defect in the automatic
> installation long time ago would still be coming back on every upgrade /
> re-install?

Yes, the grub package remembers where it was installed to and reinstalls
there again on upgrade. You can configure that location by running sudo
dpkg-reconfigure grub-pc.
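
For example (a sketch; debconf-show is the generic debconf tool for listing a package's stored answers):

====
sudo dpkg-reconfigure grub-pc   # interactively re-select the device(s) grub installs to
sudo debconf-show grub-pc       # show the install device currently recorded by the package
====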

> I'll see what information I can get for you later this week (any
> specific logs you would like to see?)

What I need is to figure out the state of affairs when you get dumped to
the initramfs. Specifically whether the array has shown up in
/dev/mapper and if not, why. Booting with the nosplash and noquiet
options might be helpful explaining that, followed by a cursory ls
/dev/mapper/ once at the initramfs shell.

Revision history for this message
Phillip Susi (psusi) wrote : Re: [LUCID, MAVERICK, NATTY] do not boot reliably on dmraid set

That last screenshot confirms it. The UUID link is not being created. That, combined with the errors about /dev/dm-x, makes me think there is a race condition causing udev to try to access the dev node before the kernel has actually created it. Often this just results in the inotify messages, but my guess is that in your case udev gets to the point where it runs blkid before the device node is ready, so it fails to identify the UUID and create the link.

I'm going to try to dig more into this.
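
A quick way to check this theory after a boot where the link is missing (a sketch; the re-trigger is only a diagnostic, not a fix):

====
ls -l /dev/disk/by-uuid/                    # is the link for the root fs UUID missing?
sudo blkid /dev/mapper/pdc_jcchiiab5        # does blkid see the UUID when asked directly?
sudo udevadm trigger --subsystem-match=block --action=change   # replay the block events
sudo udevadm settle                         # wait for the udev queue to drain
ls -l /dev/disk/by-uuid/                    # the link should appear now if it was a race
====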

affects: dmraid (Ubuntu) → udev (Ubuntu)
summary: - [LUCID, MAVERICK, NATTY] do not boot reliably on dmraid set
+ [LUCID, MAVERICK, NATTY] udev has a race condition with device mapper
+ devices
Phillip Susi (psusi)
description: updated
Changed in linux (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → Medium
Changed in udev (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Medium → High
Changed in udev (Ubuntu):
importance: Medium → High
Revision history for this message
Phillip Susi (psusi) wrote : Re: [LUCID, MAVERICK, NATTY] udev has a race condition with device mapper devices

Can you add "udev.log-priority=debug" to your kernel command line and try to catch a boot where /home fails to mount, and attach your /var/log/syslog when that happens?

Revision history for this message
Phillip Susi (psusi) wrote :

Tom, do you have the kpartx package installed? If so can you try removing it?

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [Bug 631795] Re: [LUCID, MAVERICK, NATTY] udev has a race condition with device mapper devices

Nope, no kpartx installed.

Tried booting with the options you gave me, saw hundreds of udev
messages flash by for about 10 seconds, and then it froze.
Will try again tomorrow and see if I can get past that point.

Attaching the most recent log files I found. The extension 'knip' means
'snip' because they contained logs dating back to January and are pretty
large. I removed everything before March 11, hence the 'snip'.

cheers
Tom

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [Bug 631795] Re: [LUCID, MAVERICK, NATTY] do not boot reliably on dmraid set

hi Phillip,

Found some time to test the second machine as well (intel chipset, 2
discs in mirrored set).
Booted off a Natty daily usb (looking nice!) in order to have Gparted
0.7 which handles dmraid correctly.
Shrunk / fs and formatted to ext4.
Created an ext4 primary partition to be used as /home.
Created an ext4 primary partition to be used as DATA.
Created a swap space as an extended partition.
No separate /boot, no Windows dual boot.

Shutdown, then booted Maverick Server 32bit from usb-stick.
Entire install went OK; selected manual partitioning and told it the
correct fs and mount points. Didn't format, because that was already
done and Maverick comes with Gparted 0.6x.
Selected minimal system and no applications yet.
There were no glitches and Grub did install itself correctly on the
raid, not one of the individual disks. Exactly like you said it should.
Rebooted, got the familiar udev messages and..... failed to mount /home.
Chose Skip and could log in ok.
Rebooted, same story.

Will try the same boot options with udev logging like you asked for the
other machine and report back.

cheers
Tom

papukaija (papukaija)
tags: added: lucid maverick natty ubuntu-boot
removed: boot dmraid fakeraid linux
summary: - [LUCID, MAVERICK, NATTY] udev has a race condition with device mapper
- devices
+ Udev has a race condition with device mapper devices
Revision history for this message
Arie Skliarouk (skliarie) wrote :

I have a weird issue that involves udev and device-mapper:
https://bugs.launchpad.net/ubuntu/+source/udev/+bug/797226

The bugs might be related.

Revision history for this message
Tais P. Hansen (taisph) wrote :

This sounds similar to what I experience.

On every cold boot, dmraid fails to create device nodes for a secondary software raid 1+0 device, which means that the "failed to mount, S to skip mounting or M for manual recovery" message is shown. I have to hit M, type dmraid -an, then dmraid -ay, and exit the shell again to have the system boot normally. (Just typing dmraid -ay doesn't work; I have to deactivate the raid sets first.)
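
Roughly the sequence from the recovery shell described above, for anyone hitting the same prompt:

====
# at the "S to skip mounting / M for manual recovery" prompt, press M, then:
dmraid -an    # deactivate all raid sets
dmraid -ay    # re-activate them, recreating the missing device nodes
exit          # leave the recovery shell and let the boot continue
====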

Revision history for this message
Tom Louwrier (tom-louwrier) wrote : Re: [Bug 631795] Re: Udev has a race condition with device mapper devices

Good to know, I'll try that.
Still looking to make some time so I can get those debugging logs as
promised, but it's quite busy over here.

cheers
Tom

Revision history for this message
markusj (markusj) wrote :

I am affected by this issue in a slightly different way, and it first appeared after one of the recent package updates (maybe one or two weeks ago; a new udev package was among them).

Setup: Ubuntu "Oneiric" 11.10 64-bit running LVM in a LUKS/cryptsetup container.

Issue 1: Randomly appearing warnings while booting which seem not to have any real impact. udev complains about a symlink(?) which could not be moved (boot.log just got overwritten, so I cannot provide the exact message). I had the impression udev tried to move some temporary file to its final destination. (I think it was some /dev/mapper/ or /dev/$volume-group-name/ symlink.) (This might be related to Bug #864185.)

More critical, Issue 2: Snapshotting a logical volume randomly leads to working snapshots with missing links in /dev/mapper/ and /dev/disk/by-id/ (but a dead symlink from /dev/$volume-group-name/ to /dev/mapper/ does get created ...).
Somehow udev fails to create the symlinks. I monitored the udev events passing by (using udevadm monitor --property). The DEVLINKS property appears to contain valid data, to be more precise the symlinks I would expect udev to create, but those symlinks sometimes simply never get created.
LVM creates several devices when it creates a snapshot, but I only experienced problems with the symlink representing the snapshot device itself; the "$name-real" and "$name-cow" devices get linked the way they should (at least I did not experience problems there, but since I never used those links, maybe I just never observed this behaviour there).

And the syslog shows the same message as reported above: udev complains about
> inotify_add_watch(6, /dev/dm-23, 10) failed: No such file or directory
which is the same message as reported here, BUT: udev issues this message when the snapshot gets removed(!); dm-23 is the snapshot volume that has just gone away.
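
A sketch of how to capture the relevant udev properties while reproducing this (vg0/root and the snapshot name are only example names):

====
# terminal 1: watch udev events and their properties (DEVLINKS in particular)
udevadm monitor --udev --property --subsystem-match=block

# terminal 2: create a snapshot, query what udev recorded for it, then remove it
sudo lvcreate -L 1G -s -n snaptest /dev/vg0/root
udevadm info --query=property --name=/dev/vg0/snaptest | grep DEVLINKS
ls -l /dev/mapper/ /dev/disk/by-id/ | grep snaptest
sudo lvremove -f /dev/vg0/snaptest
====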

Revision history for this message
Sergey (ru-lids) wrote :

ubuntu server 12.04 x64

Like markusj, I am affected by the issue when removing a snapshot of a logical volume.

sudo /sbin/lvcreate -l 5895 -s -n tsbackup /dev/vg1/vdata
sudo /sbin/lvremove -f /dev/vg1/tsbackup
syslog: (twice)
udevd[14958]: inotify_add_watch(6, /dev/dm-1, 10) failed: No such file or directory
udevd[14958]: inotify_add_watch(6, /dev/dm-1, 10) failed: No such file or directory

The snapshot is deleted successfully; only the udev messages appear.
On an older 10.04 x64 server this error is not present. The configuration of both servers is identical.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Unsupported series, setting status to "Won't Fix".

This bug was filed against a series that is no longer supported and so is being marked as Won't Fix. If this issue still exists in a supported series, please file a new bug.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: Triaged → Won't Fix
Revision history for this message
wondra (wondra) wrote :

Is my bug https://bugs.launchpad.net/ubuntu/+source/udev/+bug/1647067 in series 14.04 a duplicate of this?

Revision history for this message
Phillip Susi (psusi) wrote :

No.

Also, I have not seen the inotify_add_watch error in quite some time, so I think this may have been fixed over the years.
