Wrong type in blkid cache causes fsck on boot to fail

Bug #316322 reported by Loïc Minier
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
e2fsprogs            Invalid       Undecided   Unassigned
util-linux (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

Hi,

I have had this bug since the end of intrepid or the beginning of jaunty, I think. I have two similar jaunty setups; one boots fine, the other keeps failing the fsck on boot:
bee# fsck -C -R -a /boot
fsck 1.41.3 (12-Oct-2008)
fsck.ext3: Device or resource busy while trying to open /dev/sda2
Filesystem mounted or opened exclusively by another program?

(actually what runs on boot is fsck -C -R -a -A which fails with the same error message)

The "device or resource busy" is due to /boot being on /dev/md1 which is made of /dev/sda2 and /dev/sdb2.
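
A minimal sketch of how one might list which partitions belong to which md array, which is the relationship described above. It assumes the usual /proc/mdstat layout ("md1 : active raid1 sdb2[1] sda2[0]"); the function name md_members and the sample text are illustrative and not taken from this report.

```shell
#!/bin/sh
# Print one "array member" pair per line, reading /proc/mdstat-style
# text on stdin. md_members is a hypothetical helper name.
md_members() {
    awk '$2 == ":" && $1 ~ /^md/ {
        # Member fields look like sda2[0]; status and level fields do not.
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\[[0-9]+\]/) {
                dev = $i
                sub(/\[.*/, "", dev)   # drop the [N] suffix and any flags
                print $1, dev
            }
        }
    }'
}

# Made-up sample in the usual mdstat layout, for illustration only.
sample='Personalities : [raid1]
md1 : active raid1 sdb2[1] sda2[0]
      104320 blocks [2/2] [UU]
'
printf '%s' "$sample" | md_members
```

On a live system the same function could be fed with "md_members < /proc/mdstat". Any partition it lists is held open by the md driver, which is why opening it directly fails with "Device or resource busy".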

This behavior is racy; I ran blkid:
bee# blkid
/dev/mapper/bee--raid1-refuge: UUID="cfa891cf-6b4b-4c7d-90df-98f9be721cbe" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/bee--raid1-home: UUID="2fd30976-f849-458a-94d7-c00e9fdcbe4d" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/bee--raid1-ubuntu--root: UUID="6ec6b99f-415b-45fc-8efe-c6a016984b5b" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/bee--raid1-swap: TYPE="swap" UUID="e8a8ba4e-a0cb-43c4-b160-363949e4600f"
/dev/sda1: UUID="db18614f-1cbd-9225-4561-bc10b69bf963" TYPE="mdraid"
/dev/sda2: UUID="b1a8d7aa-5aab-41ab-a326-e0350c018764" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb1: UUID="659550dd-fb28-4f7d-9904-9cdcb4fa228f" SEC_TYPE="ext2" TYPE="ext3"
/dev/md0: UUID="659550dd-fb28-4f7d-9904-9cdcb4fa228f" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/bee--raid1-debian--root: UUID="2d99500c-fa17-4587-a059-23b48aa7e08d" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/bee--sata-var: UUID="9260b9f1-d6f4-48bd-80ce-97758cc4c1f7" SEC_TYPE="ext2" TYPE="ext3"
/dev/mapper/bee--sata-home: UUID="7950ce8f-615a-4392-bb7a-fda5baaeb93e" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb2: UUID="40a955ce-5f05-b4fc-7477-72897565b642" TYPE="mdraid"
/dev/sdb3: UUID="60b75582-1b75-c2bc-286c-ebc612c886d3" TYPE="mdraid"
/dev/sda3: UUID="60b75582-1b75-c2bc-286c-ebc612c886d3" TYPE="mdraid"
/dev/md2: UUID="gy0khl-tC2g-N50w-0Dyk-bwZP-DnpD-yQqSYn" TYPE="lvm2pv"
/dev/md1: UUID="b1a8d7aa-5aab-41ab-a326-e0350c018764" TYPE="ext3"

and now suddenly fsck knows that /boot is /dev/md1 and not /dev/sda2!
bee# fsck -C -R -A -a
fsck 1.41.3 (12-Oct-2008)
/dev/md1 is mounted. e2fsck: Cannot continue, aborting.

/dev/mapper/bee--raid1-home is mounted. e2fsck: Cannot continue, aborting.

/dev/mapper/bee--raid1-refuge is mounted. e2fsck: Cannot continue, aborting.

I think this is due to /etc/blkid.tab, a cache of device information.
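
A rough sketch for spotting the symptom in the blkid listing above: a raw partition and an md device reporting the same filesystem UUID. The heuristic (a UUID shared by entries whose TYPE is not "mdraid" is suspicious) is an assumption made for illustration; other setups can share UUIDs legitimately. The function name shared_fs_uuids is hypothetical.

```shell
#!/bin/sh
# Read blkid-style lines on stdin and print every UUID that appears on
# more than one device whose TYPE is not "mdraid".
shared_fs_uuids() {
    awk '
    {
        dev = $1; sub(/:$/, "", dev)
        uuid = ""; type = ""
        # The leading space keeps SEC_TYPE= from matching as TYPE=.
        if (match($0, / UUID="[^"]*"/)) uuid = substr($0, RSTART + 7, RLENGTH - 8)
        if (match($0, / TYPE="[^"]*"/)) type = substr($0, RSTART + 7, RLENGTH - 8)
        # RAID members legitimately share the array UUID, so skip them.
        if (uuid == "" || type == "mdraid") next
        if (!(uuid in devs)) order[n++] = uuid   # remember first-seen order
        devs[uuid] = devs[uuid] " " dev
        count[uuid]++
    }
    END {
        for (i = 0; i < n; i++)
            if (count[order[i]] > 1) print order[i] ":" devs[order[i]]
    }'
}

# A few lines copied from the blkid output in this report.
shared_fs_uuids <<'EOF'
/dev/sda1: UUID="db18614f-1cbd-9225-4561-bc10b69bf963" TYPE="mdraid"
/dev/sda2: UUID="b1a8d7aa-5aab-41ab-a326-e0350c018764" SEC_TYPE="ext2" TYPE="ext3"
/dev/sdb2: UUID="40a955ce-5f05-b4fc-7477-72897565b642" TYPE="mdraid"
/dev/md1: UUID="b1a8d7aa-5aab-41ab-a326-e0350c018764" TYPE="ext3"
EOF
```

With these four lines the sketch flags /dev/sda2 and /dev/md1 as sharing one ext3 UUID, which matches the situation where fsck picks /dev/sda2 for /boot.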

Bye

Loïc Minier (lool)
description: updated
Loïc Minier (lool) wrote :

So I discovered that "blkid -c /dev/null" reports the correct results, but /etc/blkid.tab is always wrong; even "blkid -c /dev/null -w /etc/blkid.tab" doesn't fix the file (sda2 still appears as ext3/ext2 instead of mdraid).
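
A small sketch of the comparison described here: capture the cached listing ("blkid") and the uncached one ("blkid -c /dev/null") into two files and report devices whose TYPE differs. The function name type_mismatches is hypothetical, and the "fresh" sample line for /dev/sda2 is an assumption about what a correct probe would report, not output taken from this report.

```shell
#!/bin/sh
# type_mismatches CACHED_FILE FRESH_FILE
# Print devices whose TYPE differs between two saved blkid listings.
# Assumes CACHED_FILE is not empty.
type_mismatches() {
    awk '
    function dev_type(line) {
        # The leading space keeps SEC_TYPE= from matching as TYPE=.
        if (match(line, / TYPE="[^"]*"/))
            return substr(line, RSTART + 7, RLENGTH - 8)
        return "?"
    }
    { dev = $1; sub(/:$/, "", dev) }
    NR == FNR { cached[dev] = dev_type($0); next }   # first file: remember
    (dev in cached) && cached[dev] != dev_type($0) {
        print dev ": cache says " cached[dev] ", probe says " dev_type($0)
    }' "$1" "$2"
}

# Demo on sample files. On a real system the two files would come from
#   blkid > cached.txt    and    blkid -c /dev/null > fresh.txt
tmp=$(mktemp -d)
cat > "$tmp/cached.txt" <<'EOF'
/dev/sda2: UUID="b1a8d7aa-5aab-41ab-a326-e0350c018764" SEC_TYPE="ext2" TYPE="ext3"
/dev/md1: UUID="b1a8d7aa-5aab-41ab-a326-e0350c018764" TYPE="ext3"
EOF
cat > "$tmp/fresh.txt" <<'EOF'
/dev/sda2: UUID="40a955ce-5f05-b4fc-7477-72897565b642" TYPE="mdraid"
/dev/md1: UUID="b1a8d7aa-5aab-41ab-a326-e0350c018764" TYPE="ext3"
EOF
type_mismatches "$tmp/cached.txt" "$tmp/fresh.txt"
rm -rf "$tmp"
```

Only TYPE is compared, so differences in attribute order or in SEC_TYPE between the two listings are ignored.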

The only way to fix my blkid cache was to move it out of the way; then "blkid" generated a good new cache.
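
The workaround can be sketched as a small helper that renames the cache rather than deleting it, so the stale copy stays available for debugging. The helper name set_aside_cache and the ".stale" suffix are illustrative choices; the default path is the /etc/blkid.tab mentioned in this report.

```shell
#!/bin/sh
# set_aside_cache [PATH]
# Rename a blkid cache file to PATH.stale and print the new name.
set_aside_cache() {
    cache=${1:-/etc/blkid.tab}
    [ -e "$cache" ] || { echo "no cache at $cache" >&2; return 1; }
    mv -- "$cache" "$cache.stale" && echo "$cache.stale"
}

# Demo on a throwaway file so nothing under /etc is touched.
tmp=$(mktemp -d)
printf 'stale entry\n' > "$tmp/blkid.tab"
set_aside_cache "$tmp/blkid.tab"
rm -rf "$tmp"
```

On the affected machine this would be run as root without arguments, followed by a plain "blkid" to regenerate the cache.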

Loïc Minier (lool) wrote :

bee# strace -e trace=file -f blkid -c /dev/null -w /etc/blkid.tab 2>&1 | grep blkid.tab
execve("/sbin/blkid", ["blkid", "-c", "/dev/null", "-w", "/etc/blkid.tab"], [/* 18 vars */]) = 0

=> doesn't seem to update the cache at all, *sigh*

Loïc Minier (lool) wrote :

Sure enough the "write" var in misc/blkid.c is set but isn't used...

Loïc Minier (lool) wrote :

So we actually have two bugs here:
- For some reason my blkid cache broke and was never fixed afterwards. blkid detects the proper device type when I disable the cache, but it seems to trust an existing cache entry for the device type; here that is clearly incorrect and breaks fsck. I don't know what the cache is meant for, so I guess we need to bring the e2fsprogs upstream people into the loop here.
- blkid's -w flag is unimplemented or broken. I don't really care; I just spotted it while debugging.

Theodore Ts'o (tytso) wrote :

Normally the way I flush the blkid cache is to just remove it and then re-run blkid. It's much simpler than using -c /dev/null -w /etc/blkid.tab. At one point -w worked, but yeah, it looks like the code to implement it got dropped somewhere along the way.

What's going on here is that /dev/sda2 can be either an MD device or an ext3 filesystem. That's because MD puts its superblock information at the end of the disk, so in fact it's ambiguous whether something that has both an ext2/3 superblock and an MD superblock is part of an MD device or a standalone filesystem. Blkid treats the MD superblock with higher priority, since mke2fs will wipe out the end of the filesystem to try to resolve this ambiguity. HOWEVER, if the cache currently states that a device holds a particular filesystem type, there is an optimization which only does a filesystem-specific probe to confirm that nothing has changed, and that is causing the problem here. Was this filesystem ever originally a stand-alone filesystem, and then you later converted it to be an MD device? That can cause this problem.
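
The interaction between the MD priority rule and the cache optimization can be illustrated with a toy model. This is not blkid's code: the function probe_type and its argument convention are invented here purely to show why a stale cache entry wins over the MD superblock.

```shell
#!/bin/sh
# probe_type CACHED_TYPE SIGNATURE...
# Toy model only. CACHED_TYPE is "" when the device has no cache entry;
# the remaining arguments are the superblock types found on the device.
probe_type() {
    cached=$1; shift
    # Optimization: if the cached type is still present, look no further.
    if [ -n "$cached" ]; then
        for sig in "$@"; do
            if [ "$sig" = "$cached" ]; then echo "$cached"; return 0; fi
        done
    fi
    # Full probe: an MD superblock takes priority over anything else.
    for sig in "$@"; do
        if [ "$sig" = "mdraid" ]; then echo mdraid; return 0; fi
    done
    # Otherwise report the first signature found, if any.
    if [ $# -gt 0 ]; then echo "$1"; else echo unknown; fi
}

probe_type ""     ext3 mdraid   # no cache entry: full probe runs
probe_type "ext3" ext3 mdraid   # stale cache entry short-circuits the probe
```

In this model the first call reports mdraid and the second reports ext3, even though the device carries the same two superblocks in both cases. Dropping the fast path, or skipping it whenever an MD superblock is present, makes both calls agree.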

In any case, yeah, this is a bug, since blkid should be able to address this. Thanks for reporting it. I'll get it fixed in the next maintenance release of e2fsprogs.

Loïc Minier (lool) wrote :

I know I had to resync my RAID a couple of times a while ago; I think it was some form of automatic md process kicking in; this is not happening anymore, I don't know what the exact issue was (didn't understand it in full back then, but the system auto-repaired the array).

I suspect I probably booted in degraded mode with the array building a couple of times, which might have created the broken blkid.tab.

I think the optimization you mention should die or be special cased for MD.

Thanks for looking into this!

Theodore Ts'o (tytso) wrote :

>I think the optimization you mention should die or be special cased for MD.

Agreed, that's what I'm currently trying to decide between....

Julien Plissonneau Duquene (julien-plissonneau-duquene) wrote :

Marking as confirmed in Ubuntu:
- implement or remove the write option -w (upstream)
- fix issue with MD devices (upstream)

Changed in e2fsprogs (Ubuntu):
status: New → Confirmed
peter b (b1pete) wrote :

Would you gents, much more qualified than me, take a look at #344406?

It refers to blkid, but in connection with USB devices. I've been struggling with this issue for the last day with no end in sight.

thank you,
peter b

Scott James Remnant (Canonical) (canonical-scott) wrote :

Since moving to util-linux, we set the blkid cache to a path under /dev, so it's wiped on each reboot.

Changed in e2fsprogs (Ubuntu):
status: Confirmed → Fix Released
Peng Tao (bergwolf) wrote :

The attached patch proposes a fix that adds support for the -w option to the blkid command.

Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 316322] Re: Wrong type in blkid cache causes fsck on boot to fail

On Mon, 2009-09-28 at 09:59 +0000, Peng Tao wrote:

> The attached patch proposes a fix to add support for the -w option to
> blkid command.
>
Please submit that upstream.

Scott
--
Scott James Remnant
<email address hidden>

Peng Tao (bergwolf) wrote : Re: [Bug 316322] Re: Wrong type in blkid cache causes fsck on boot to fail

On Thu, Oct 1, 2009 at 3:51 AM, Scott James Remnant <email address hidden> wrote:
> On Mon, 2009-09-28 at 09:59 +0000, Peng Tao wrote:
>
>> The attached patch proposes a fix to add support for the -w option to
>> blkid command.
>>
> Please submit that upstream.
I already submitted the patch. If someone can help review/test the
patch, that would be great.

Thanks,
Bergwolf

>
> Scott
> --
> Scott James Remnant
> <email address hidden>
>

--
Cheers,
Peng T...


Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 316322] Re: Wrong type in blkid cache causes fsck on boot to fail

On Thu, 2009-10-01 at 04:33 +0000, Peng Tao wrote:

> On Thu, Oct 1, 2009 at 3:51 AM, Scott James Remnant <email address hidden> wrote:
> > On Mon, 2009-09-28 at 09:59 +0000, Peng Tao wrote:
> >
> >> The attached patch proposes a fix to add support for the -w option to
> >> blkid command.
> >>
> > Please submit that upstream.
> I already submitted the patch. If someone can help review/test the
> patch, that would be great.
>
Really?

I can't see any mails from you to <email address hidden>, which
is the upstream mailing list.

Scott
--
Scott James Remnant
<email address hidden>

Peng Tao (bergwolf) wrote : Re: [Bug 316322] Re: Wrong type in blkid cache causes fsck on boot to fail

On Sat, Oct 3, 2009 at 2:53 PM, Scott James Remnant <email address hidden> wrote:
> Really?
>
> I can't see any mails from you to <email address hidden>, which
> is the upstream mailing list.
The patch is for e2fsprogs and can be found here:
http://marc.info/?l=linux-ext4&m=125413525331572&w=2
The upstream mailing list for e2fsprogs is <email address hidden>

Thanks,
Bergwolf

Peng Tao (bergwolf) wrote :

On Sat, Oct 3, 2009 at 4:20 PM, Peng Tao <email address hidden> wrote:
> On Sat, Oct 3, 2009 at 2:53 PM, Scott James Remnant <email address hidden> wrote:
>> Really?
>>
>> I can't see any mails from you to <email address hidden>, which
>> is the upstream mailing list.
Err, it seems I should port the patch to util-linux-ng. Will send an
updated patch there soon.

Thanks,
Bergwolf

Scott James Remnant (Canonical) (canonical-scott) wrote : Re: [Bug 316322] Re: Wrong type in blkid cache causes fsck on boot to fail

On Sat, 2009-10-03 at 08:20 +0000, Peng Tao wrote:

> On Sat, Oct 3, 2009 at 2:53 PM, Scott James Remnant <email address hidden> wrote:
> > Really?
> >
> > I can't see any mails from you to <email address hidden>, which
> > is the upstream mailing list.
> The patch is for e2fsprogs and can be found here:
> http://marc.info/?l=linux-ext4&m=125413525331572&w=2
> The upstream mailing list for e2fsprogs is <email address hidden>
>
Ah, I see the confusion.

The blkid library in e2fsprogs is no longer used. Distributions now use
the utility and library of the same name from util-linux-ng, which has
been greatly rewritten.

(This explains why your patch didn't fit :p)

If you could refresh your patch based on the util-linux-ng blkid, and
submit it to that upstream, that would be greatly appreciated.

It may even be possible that this works just fine in util-linux-ng's
version (have you tested with karmic?)

Scott
--
Scott James Remnant
<email address hidden>

Peng Tao (bergwolf) wrote :

FYI.
This patch is based on the master branch of util-linux-ng, and I have submitted it to the upstream mailing list.

Scott James Remnant (Canonical) (canonical-scott) wrote :

On Sun, 2009-10-04 at 04:05 +0000, Peng Tao wrote:

> FYI.
> This patch is based on master branch of util-linux-ng and I have submitted it to the upstream mailing list.
>
Thanks, in 9.10 we're shipping the stable/v2.16 branch rather than
master (which has had something of a rewrite of blkid); I'll be sure to
pick this up for 10.04

Scott
--
Scott James Remnant
<email address hidden>

Changed in e2fsprogs (Ubuntu):
status: Fix Released → Triaged
importance: Undecided → Medium
affects: e2fsprogs (Ubuntu) → util-linux (Ubuntu)
Changed in e2fsprogs:
status: New → Invalid
ceg (ceg) wrote :

Scott wrote:
>Since moving to util-linux, we set the blkid cache to a path under /dev so it's wiped after each reboot

Has that been reverted?
On my 9.10 system, /etc/blkid.tab was present, outdated (Nov 09) and wrong.

ceg (ceg) wrote :

Ah, I have found /dev/.blkid.tab recent but still wrong (#531240).

Scott James Remnant (Canonical) (canonical-scott) wrote :

If you have a wrong entry in /dev/.blkid.tab, then that's a result of the blkid probe being wrong, not "old data" (which is what this bug is about).

Changed in util-linux (Ubuntu):
status: Triaged → Invalid
status: Invalid → Fix Released