blkid fails to identify old LUKS partition volumes

Bug #362315 reported by Julien Plissonneau Duquene on 2009-04-16
This bug report is a duplicate of: Bug #428435: luks encrypted partition not detected.
This bug affects 2 people
Affects                     Status    Importance   Assigned to   Milestone
Release Notes for Ubuntu              Low          Unassigned
cryptsetup                  Unknown   Unknown
cryptsetup (Ubuntu)                   Wishlist     Unassigned
    Declined for Jaunty by Scott James Remnant (Canonical)
    Nominated for Karmic by JanG
udev (Ubuntu)                         Undecided    Unassigned
    Declined for Jaunty by Scott James Remnant (Canonical)
    Nominated for Karmic by JanG

Bug Description

Binary package hint: udev

Description: Ubuntu 9.04
Release: 9.04
udev:
  Installed: 141-1

Symptoms:
My laptop fails to boot since I upgraded from Intrepid to Jaunty. Before Jaunty, the system asked for cryptsetup LUKS passphrases twice: a first time for the root partition (sda3 on my system) and a second time for the swap partition (sda4). Since Jaunty it asks for the root cryptsetup passphrase, unlocks the slot, waits a long time, then drops to a shell, saying that the device /dev/disk/by-uuid/.... (id of sda4 partition) was not found. I have to manually create the link in /dev/disk/by-uuid every time for the system to boot.

What's actually wrong:
Running udevadm info --export-db does not report any UUID for sda4, while sda3 is correct. See attached file.

More wrongness:
# vol_id /dev/sda4
unknown or non-unique volume type (--probe-all lists possibly conflicting types)
# vol_id --probe-all /dev/sda4
swap
crypto_LUKS
## I could not find a way to make vol_id spit out the uuid of the luks header of the partition.

For information, blkid has a different kind of wrongness. Out of the box it reports the wrong UUID (that is, the UUID that was used in Intrepid and before), probably because it parses the LUKS data as a swap area. It is possible, however, to edit /etc/blkid.tab and say that sda4 is of type crypt_LUKS; blkid then finds the right UUID and remembers it.

Your block device has multiple metadata on it; this isn't supported.

Changed in udev (Ubuntu):
status: New → Invalid

Hey, this was set up using the standard Ubuntu tools some releases ago. It used to work until Jaunty. Now it doesn't work. THAT'S A DEFECT, period. Reopening. Someone influential said in a report on some other package that the libvolid folks are not willing to fix bugs; unfortunately, it looks like this is true.

There is no multiple metadata on this block device; it's supposed to be only LUKS. If tools detect other metadata, that's a bug.

And "multiple metadata is not supported" does not make this bug invalid. It makes the design choice not to support multiple metadata invalid. Change that. Thanks.

Changed in udev (Ubuntu):
status: Invalid → New

By the way, about "multiple metadata is not supported", there are already multiple links for the same block device in /dev/disk/by-id, e.g.
ata-HTS721060G9AT00_MPC3B2Y3GK0S3E-part1 -> sda1
scsi-SATA_HTS721060G9AT00_MPC3B2Y3GK0S3E-part1 -> sda1

Why not just do that in /dev/disk/by-uuid when you are not sure about the type and UUID of a block device?

Please don't simply re-open bugs without consulting with the maintainer.

In your case, my understanding of cryptsetup is that it creates a devmapper device exposing the *unencrypted* data on the partition. It is quite correct that udev ignores /dev/sda4 - it is an encrypted swap, and udev can do nothing with it.

Instead you should use the UUID of the actual unencrypted devmapper device. What does ls -l /dev/mapper show for you?

Changed in udev (Ubuntu):
status: New → Invalid

Reopening again. Please don't mark bugs as invalid without first making sure that you understand the problem, and that the bug report is actually invalid. Because in this case it is not. Thanks.

Yes, devmapper exposes the unencrypted partition ... once the partition is unlocked. At boot time it is not unlocked. In order to unlock it, the init scripts wait for sda4 (that is, a LUKS block device) to appear with a UUID in /dev/disk/by-uuid. This happened in the past (with a wrong UUID, but at least it appeared to work), and does not happen anymore.

It is not possible to use the UUID of the unencrypted device before it is unencrypted. Init scripts need the UUID of the LUKS device in order to unlock it.

Also note that this already works correctly for sda3, which is the root device. It fails for sda4, which is the swap/hibernate partition.

Changed in udev (Ubuntu):
status: Invalid → Confirmed

dd if=/dev/zero of=/dev/sda4

And set up LUKS on that device again.

Feel free to change the bug task from udev to whatever tool you used to create the device. I've opened a task on cryptsetup speculatively.

Changed in udev (Ubuntu):
status: Confirmed → Won't Fix

Out of interest, if you use the merged blkid/vol_id library available from my PPA (util-linux, e2fsprogs, libblkid, etc.) what result does blkid -p give you on that device?

"Won't fix" is less bad than plain "invalid" but still not appropriate IMO. This is the cowboy way of "fixing" problems: shoot them first, if they survive shoot them again, then eventually think about them.

With your packages:
$ sudo ./blkid -p /dev/sda3
/dev/sda3: UUID="4d3ba41a-1fb4-474b-a59e-4285e4768bdd" VERSION="256" TYPE="crypto_LUKS" USAGE="crypto"
$ sudo ./blkid -p /dev/sda4
/dev/sda4: ambivalent result (probably more filesystems on the device)

Correct detection should stop as soon as a LUKS header is detected, and report the LUKS UUID. Why? Because if this was an actual swap partition, the LUKS header would have been deleted by mkswap (see zap_bootbits).

What's actually on /dev/sda4:
00000000 4c 55 4b 53 ba be 00 01 61 65 73 00 00 00 00 00 |LUKS....aes.....|
(...)
00000400 01 00 00 00 25 72 07 00 00 00 00 00 e6 78 04 bd |....%r.......x..|
(...)
00000ff0 00 00 00 00 00 00 53 57 41 50 53 50 41 43 45 32 |......SWAPSPACE2|

So yes, there is a stale swap signature, probably coming from a previous distro before I switched to Ubuntu. As far as I know it's not cryptsetup's job to erase what's there. It is recommended to fill the disk with random bits before running cryptsetup; the Ubuntu tool at that time did not do this (nor did it erase the swap signature).
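
For readers who want to check an image for the same condition, here is a minimal Python sketch of what the dump above shows. It only looks at the two places visible in the dump (the LUKS magic at byte 0 and SWAPSPACE2 in the last 10 bytes of the first 4096-byte page). The function name find_signatures and the fixed 4 KiB page size are assumptions made for illustration; this is not how vol_id or blkid are implemented.

```python
# Sketch: list the signatures found in the first 4 KiB of a device image.
# Offsets are taken from the hex dump above; find_signatures is a
# hypothetical helper, not part of vol_id or blkid.

LUKS_MAGIC = b"LUKS\xba\xbe"   # bytes 0..5 in the dump
SWAP_MAGIC = b"SWAPSPACE2"     # ends the first page in the dump (0xff6..0xfff)
PAGE_SIZE = 4096               # assumption: 4 KiB pages, as in the dump


def find_signatures(first_page: bytes) -> list:
    """Return every signature type present, in the spirit of --probe-all."""
    found = []
    if first_page[:len(LUKS_MAGIC)] == LUKS_MAGIC:
        found.append("crypto_LUKS")
    swap_at = PAGE_SIZE - len(SWAP_MAGIC)
    if first_page[swap_at:PAGE_SIZE] == SWAP_MAGIC:
        found.append("swap")
    return found


# Rebuild a page that matches the three dump rows shown above.
page = bytearray(PAGE_SIZE)
page[0x000:0x010] = bytes.fromhex("4c554b53babe00016165730000000000")
page[0x400:0x410] = bytes.fromhex("010000002572070000000000e67804bd")
page[0xff0:0x1000] = bytes.fromhex("00000000000053574150535041434532")

print(find_signatures(bytes(page)))  # both types are reported for this page
```

To try it on a real partition, the first 4096 bytes would have to be read from the device (as root) and passed in as bytes.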

If you leave this as "Won't fix", it actually means that migrating from previous Ubuntu releases with encrypted swap partitions is not supported. It also probably means that installing Ubuntu Jaunty with a LUKS swap over existing partitions is not supported, and is going to fail. Please document this in release notes. Thanks.

Nevertheless, not reporting any UUID is the worst possible behaviour and breaks things, as demonstrated.

What is so hard about marking this bug as "confirmed" so 1. others have a better chance to know about it, and 2. it gets a better chance to be fixed someday?

A better way to remove a stale swap signature than zeroing the whole thing:
dd if=/dev/zero of=/dev/sdXXX bs=1024 seek=1 count=3
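
With bs=1024 seek=1 count=3, that dd line zeroes bytes 1024 through 4095 of the device. Below is a hedged Python sketch of the same operation, run on a scratch image file so the effect can be checked before touching a real device. wipe_stale_swap is a hypothetical helper, and the image contents (including the "KEYS" marker standing in for whatever follows the first page) are made up for the demonstration. Whether that byte range is unused by a given LUKS header is an assumption to verify first.

```python
import os
import tempfile


def wipe_stale_swap(path: str) -> None:
    """Zero bytes 1024..4095, like dd bs=1024 seek=1 count=3."""
    with open(path, "r+b") as dev:   # r+b overwrites in place, no truncation
        dev.seek(1024)
        dev.write(bytes(3 * 1024))
        dev.flush()
        os.fsync(dev.fileno())


# Build a scratch image: LUKS magic at 0, stale swap signature at 4086,
# and a marker just past the first page that must survive the wipe.
with tempfile.NamedTemporaryFile(delete=False) as img:
    image = bytearray(8192)
    image[0:6] = b"LUKS\xba\xbe"
    image[4086:4096] = b"SWAPSPACE2"
    image[4096:4100] = b"KEYS"
    img.write(image)
    image_path = img.name

wipe_stale_swap(image_path)

with open(image_path, "rb") as f:
    after = f.read()
os.unlink(image_path)

print("LUKS magic intact:", after[:6] == b"LUKS\xba\xbe")
print("swap signature gone:", b"SWAPSPACE2" not in after)
```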

On Thu, 2009-04-16 at 19:43 +0000, Julien Plissonneau Duquene wrote:

> "Won't fix" is less bad than plain "invalid" but still not appropriate
> IMO. This is the cowboy way of "fixing" problems: shoot them first, if
> they survive shoot them again, then eventually think about them.
>
I disagree.

You self-identified an apparent defect in a particular software package,
when in fact that package was working normally and as designed.

It's often better to *not* jump directly to the source as you did; had
you filed a more generic bug, our QA team would have been able to triage
it far more effectively and identify the real problem.

> With your packages:
> $ sudo ./blkid -p /dev/sda3
> /dev/sda3: UUID="4d3ba41a-1fb4-474b-a59e-4285e4768bdd" VERSION="256" TYPE="crypto_LUKS" USAGE="crypto"
> $ sudo ./blkid -p /dev/sda4
> /dev/sda4: ambivalent result (probably more filesystems on the device)
>
Right, so blkid behaves the same.

> Correct detection should stop as soon as a LUKS header is detected, and
> report the LUKS UUID. Why? Because if this was an actual swap partition,
> the LUKS header would have been deleted by mkswap (see zap_bootbits).
>
BZZT.

For every person that authoritatively asserts, as you just did, that X is
always preferred over Y - I can find you somebody who asserts the exact
opposite.

Years of trying to deal with this sanely led upstream to the simple
conclusion that there is never a preference of X over Y or Y over X.

The better solution is to ensure that there is never a case of
conflicting meta-data on a block device.

> So yes there is a stale swap signature, probably coming from a previous
> distro before I switched to ubuntu. As far as I know it's not
> cryptsetup's job to erase what's there.
>
It is recommended, see util-linux etc.

> If you leave this as "Won't fix", it actually means that migrating from
> previous Ubuntu releases with encrypted swap partitions is not
> supported. Also probably means that installing Ubuntu Jaunty with a LUKS
> swap over existing partitions is not supported, and is going to fail.
> Please document this in release notes. Thanks.
>
No.

I've left the udev task of this as Won't Fix - if previous versions of
cryptsetup were broken, and did not correctly erase conflicting
metadata, then cryptsetup can be fixed and the migration dealt with
there.

(Our bug tracker uses multiple tasks, so this bug is still open, it's
just rejected as a udev problem - another reason you should have filed a
more generic bug before diving in yourself.)

> What is so hard about marking this bug as "confirmed" so 1. others have
> a better chance to know about it, and 2. it gets a better chance to be
> fixed someday?
>
Because it isn't a bug. The software is working exactly as designed.
If you'd like to complain about the design of the software, upstream is
a far better place to do that - we're just a distro.

(Note that vol_id is deprecated in udev, so your complaints should be
directed at the new libblkid in util-linux-ng)

Scott
--
Scott James Remnant
<email address hidden>

> You self-identified an apparent defect in a particular software package,
> when in fact that package was working normally and as designed.

Fact: it "worked" before jaunty, with jaunty it's broken.

> It's often better to *not* jump directly to the source as you did; had
> you filed a more generic bug, our QA team would have been able to triage
> it far more effectively and identify the real problem.

You are just denying what the real problem is.

> Right, so blkid behaves the same.

THIS version of blkid behaves the same broken way as vol_id. Previous versions of blkid reported a UUID.

> For every person that authoritatively asserts, as you just did, that X is
> always preferred over Y - I can find you somebody who asserts the exact
> opposite.

Yeah, I've read that one before. The perfect excuse for never fixing anything.

BULLSHIT.

I _demonstrated_ that if you have both a LUKS header and a swap header on a partition, it means that cryptsetup was run after mkswap. So that partition is _supposed_ to be a LUKS partition. And in fact it is way more likely to be a LUKS partition than a swap partition. If it were a swap partition with a stale LUKS signature it would mean that the owner of the system took unusual steps (e.g. running cryptsetup, reverting to regular swap but not running mkswap again).

Now find me someone that will assert and demonstrate the reverse.

> The better solution is to ensure that there is never a case of
> conflicting meta-data on a block device.

That's correct, but that's another problem. Yes you can use migration scripts to remove the stale signature. It's going to be more complicated to implement and test than just FIXING udev to report ANY UUID (as blkid used to do before).

> Because it isn't a bug.

Yes it is.

> The software is working exactly as designed.

Probably not. Even then, if there is a design goal that states "when there are two possible UUID do not report ANY" that design goal is plain wrong and must go.

> If you'd like to complain about the design of the software, upstream is
> a far better place to do that - we're just a distro.

I am going to do that too. But in the interim, or if upstream fails to understand that, that can, and should, be patched at the distro level.

---

Now some words about the way YOU, Mr. Scott James Remnant, handled the issue:
- failed to realize that this is a migration problem that is going to impact actual users
- marked bug as "invalid" two times before thinking about it, suggesting an alternative (cryptsetup), and marked as "won't fix" even before evaluating the effort in the alternative vs. the effort in udev
- still fails to realize that the easiest short-term solution for the distro is to fix udev.

And this is how a sensible bug supervisor (like most supervisors actually are) would have done it:
- ask for details (mark bug "new", "incomplete" or "triaged") ; are there actually two signatures on the partition? how did that happen?
- add "cryptsetup" to the bug so they can think about erasing some known signatures (esp. swap) when initializing a LUKS partition
- mark the bug as "confirmed" because not reporting any UUID in this case is just nonsense.
- eventually...

On Fri, 2009-04-17 at 14:09 +0000, Julien Plissonneau Duquene wrote:

> And this is how a sensible bug supervisor (like most supervisors
> actually are) would have done it:
>
I'm not a bug supervisor. I'm a developer.

> - ask for details (mark bug "new", "incomplete" or "triaged") ; are
> there actually two signatures on the partition? how did that happen?
>
No need, your bug report contained all the details required - it was
clear that there were two signatures on the block device and that it was
ignored for precisely this reason.

> - add "cryptsetup" to the bug so they can think about erasing some
> known signatures (esp. swap) when initializing a LUKS partition
>
Err, I did this.

> - mark the bug as "confirmed" because not reporting any UUID in this
> case is just nonsense.
>
Didn't need to be Confirmed - it can go straight to Invalid/Won't Fix

> - eventually report the problem upstream (or ask the reporter to do
> so), and wait for upstream reaction before taking a decision
> (implement upstream fix, won't fix, patch in distro).
>
I know the upstream opinion on this kind of bug very well, we are in
constant communication.

I'm also actively involved in the current effort to merge vol_id and
libblkid, so I know the upstream opinions of util-linux-ng as well in
this matter.

Scott
--
Scott James Remnant
<email address hidden>

On Fri, 2009-04-17 at 14:09 +0000, Julien Plissonneau Duquene wrote:

> > The software is working exactly as designed.
>
> Probably not.
>
From the /usr/share/doc/udev/NEWS.gz file:

Libvolume_id now always probes for all known filesystems, and does not
stop at the first match. Some filesystems are marked as "exclusive probe",
and if any other filesystem type matches at the same time, libvolume_id
will, by default, not return any probing result. This is intended to prevent
mis-detection with conflicting left-over signatures found from earlier
file system formats. That way, we no longer depend on the probe-order
in case of multiple competing signatures. In some setups the kernel allows
to mount a volume with just the old filesystem signature still in place.
This may damage the new filesystem and cause data-loss, just by mounting
it. Because volume_id can not decide which one the correct signature is,
the wrong signatures need to be removed manually from the volume, or the
volume needs to be reformatted, to enable filesystem detection and possible
auto-mounting.
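
The behaviour described in that NEWS entry can be sketched roughly as follows. This is not libvolume_id's code: the two probers, the EXCLUSIVE set and the function names are assumptions made for illustration, and the NEWS text does not say which types are actually marked as "exclusive probe".

```python
from typing import Optional

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def is_luks(buf: bytes) -> bool:
    return buf[:6] == b"LUKS\xba\xbe"


def is_swap(buf: bytes) -> bool:
    return buf[PAGE_SIZE - 10:PAGE_SIZE] == b"SWAPSPACE2"


# Hypothetical prober table; the real library knows many more formats.
PROBERS = {"crypto_LUKS": is_luks, "swap": is_swap}
# Assumption for this sketch: both types are treated as exclusive.
EXCLUSIVE = {"crypto_LUKS", "swap"}


def probe(buf: bytes) -> Optional[str]:
    """Run every prober; refuse to answer when an exclusive type conflicts."""
    matches = [name for name, check in PROBERS.items() if check(buf)]
    if len(matches) > 1 and any(name in EXCLUSIVE for name in matches):
        return None  # conflicting signatures: no probing result at all
    return matches[0] if matches else None


# A page carrying both signatures yields no result, independent of probe order.
both = bytearray(PAGE_SIZE)
both[0:6] = b"LUKS\xba\xbe"
both[PAGE_SIZE - 10:PAGE_SIZE] = b"SWAPSPACE2"
print(probe(bytes(both)))
```

The point of the design, as the NEWS text puts it, is that the result no longer depends on the order in which the probers run.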

> Even then, if there is a design goal that states "when
> there are two possible UUID do not report ANY" that design goal is plain
> wrong and must go.
>
Feel free to discuss this with Upstream...

 subscribe kay-sievers

Scott
--
Scott James Remnant
<email address hidden>

There is not much to add to the explanation. Volume autodetection has to refuse to return any results for conflicting signatures. It will get even more picky in the future, we expect.

There is no way to resolve such conflicts automatically. We would risk serious data loss. Be happy that the system did not recognize one of your data partitions as swap and corrupt it.

It could be discussed whether a tool other than the scary dd should be provided to resolve such problems, but that is outside the scope of this bug.

Thanks,
Kay

> There is no way to resolve such conflicts automatically.

In some cases (such as the case above), there is.

> We would risk serious data loss.

The very worst that could possibly happen here is that the swap is mounted unencrypted. The only possible data loss is that LUKS key slots would be overwritten. That is, if the partition is exposed as "swap". But that's not what I suggest.

I suggest that when there is a valid LUKS header, the partition is always exposed as LUKS. There is no possible data loss here: if there is no valid key, the partition cannot be used at all.
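
A minimal sketch of this suggestion, assuming the LUKS1 on-disk layout (6-byte magic, big-endian 16-bit version, UUID stored as a NUL-padded ASCII string at byte 168). It illustrates the proposal only and is not how vol_id or blkid behave; luks_first is a hypothetical name, and the sample UUID is borrowed from the sda3 output earlier in the thread.

```python
import struct
from typing import Optional, Tuple

LUKS_MAGIC = b"LUKS\xba\xbe"
UUID_OFFSET = 168   # assumption: LUKS1 header layout
UUID_LENGTH = 40
PAGE_SIZE = 4096    # assumption: 4 KiB pages


def luks_first(buf: bytes) -> Optional[Tuple[str, Optional[str]]]:
    """Return (type, uuid); a valid LUKS1 header always wins over swap."""
    if (len(buf) >= UUID_OFFSET + UUID_LENGTH
            and buf[:6] == LUKS_MAGIC
            and struct.unpack(">H", buf[6:8])[0] == 1):
        raw = buf[UUID_OFFSET:UUID_OFFSET + UUID_LENGTH]
        return "crypto_LUKS", raw.split(b"\0", 1)[0].decode("ascii")
    if buf[PAGE_SIZE - 10:PAGE_SIZE] == b"SWAPSPACE2":
        return "swap", None   # swap UUID parsing is left out of this sketch
    return None


# A page with a LUKS header and a stale swap signature is exposed as LUKS.
sample_uuid = b"4d3ba41a-1fb4-474b-a59e-4285e4768bdd"
buf = bytearray(PAGE_SIZE)
buf[0:8] = LUKS_MAGIC + b"\x00\x01"
buf[UUID_OFFSET:UUID_OFFSET + len(sample_uuid)] = sample_uuid
buf[PAGE_SIZE - 10:PAGE_SIZE] = b"SWAPSPACE2"

print(luks_first(bytes(buf)))
```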

> Be happy that the system did not recognize one of your data partitions as swap and corrupt it.

It's not a data partition that has a swap signature, it's an actual swap partition (but encrypted with LUKS), with type 0x82 "swap". And it worked that way for about two years (the UUID that was exposed in /dev/disk/by-uuid was the one of the swap header, not the LUKS header). And nothing bad happened, until Jaunty, where I could not boot at all.

Proven fact: your position, on this particular issue, breaks things.
Still not proven: my suggestion would break things.

To do:
- prove that my suggestion to expose the LUKS UUID could actually cause bad things
OR
- recognize that this is an acceptable way to deal with that particular issue (way better IMO than to actually write things on block devices without knowing too well if you should do that, think of the users).

Kay Sievers (kaysievers) wrote :

Sorry, nothing to prove here. You totally miss the point. Fix your broken metadata.

> Sorry, nothing to prove here. You totally miss the point. Fix your broken metadata.

That's already done, mind you.

Now PROVE that my suggestion to expose the LUKS UUID could actually cause bad things. Go.

Daniel Hahler (blueyed) wrote :

Julien, I think you have a good point here (using the LUKS UUID always, especially when the other is swap).

Please take this upstream (via their mailing list, see http://userweb.kernel.org/~kzak/util-linux-ng/), as Scott suggested, and hopefully they'll accept it.
Thanks.

This is a serious bug that has hosed my system.

Upgraded Ibex to Jaunty (Kubuntu) using the upgrade tool. I have precisely one encrypted container - no multiple type situation here. Unencrypted /boot, and encrypted sdaX_crypt. That is all.

Now, I do not get a password prompt to enter my luks password to unlock the encrypted LVM.

It drops me on to a BusyBox shell with an (initramfs) prompt. Error message:

Check cryptopts=source= bootarg cat /proc/cmdline
or missing modules, devices: cat /proc/modules ls /dev
-r ALERT! /dev/disk/by-uuid/<blah-blah> does not exist.
Dropping to a shell!

What is going on? Is the latest kernel bereft of the needed crypto modules? How do I fix this?

JanG (jan-ge) on 2009-10-05
summary: - udev fails to identify crypt_LUKS swap partition by uuid
+ udev fails to identify crypt_LUKS partition by uuid

For experienced users, Karel Zak has a fix available at http://article.gmane.org/gmane.linux.utilities.util-linux-ng/2568 (remember to back up your data!).

Another workaround could be modifying your /etc/crypttab from /dev/disk/by-uuid/## to /dev/sdXX.

If you're stuck at boot with "Gave up waiting for root device. . . . Alert! /dev/disk/by-uuid/## does not exist. Dropping to a shell", you can simply type this:
$ ln -s /dev/sdXX /dev/disk/by-uuid/##
$ exit

As there will be no automatic fix available until karmic gets released and it may affect users upgrading from jaunty, it should be mentioned in the "Known issues" list!

JanG (jan-ge) on 2009-10-05
tags: added: regression-potential
tags: added: ubuntu-boot
JanG (jan-ge) on 2009-10-05
summary: - udev fails to identify crypt_LUKS partition by uuid
+ blkid fails to identify old LUKS partition volumes
JanG (jan-ge) on 2009-10-05
Changed in cryptsetup (Ubuntu):
status: New → Confirmed
JanG (jan-ge) wrote :

I added this to the Ubuntu release notes project, as there won't be an automatic fix and people may encounter this when upgrading to karmic.
Manual fix information is given in the upstream bug report.

tags: removed: ubuntu-boot
Steve Langasek (vorlon) wrote :

Generally speaking, users will not encounter this bug when upgrading to karmic because the bug was already present in jaunty. The exception is Kubuntu users, who are encouraged to upgrade directly from hardy to karmic.

We should put this in both the karmic *and* the jaunty release notes.

Changed in ubuntu-release-notes:
importance: Undecided → Low
status: New → Triaged
Changed in cryptsetup (Ubuntu):
importance: Undecided → Low
importance: Low → Wishlist
Antti Kaijanmäki (kaijanmaki) wrote :

I think this can be marked as duplicate of bug #428435. I know this bug dates back a lot more and it's not originally util-linux-ng related, but on karmic this will manifest through blkid.

However, the underlying cause of these bug reports is the fact that older versions of cryptsetup (<1.0.7) did not clear the partition superblock and thus left bogus signatures floating around. It was a bug in cryptsetup, and that bug has now been fixed upstream; from karmic onward people should not be affected with new partitions.

It's unfortunate that the bug is causing a lot of trouble to some users, I included, but let's face it; there's no way we can sanely come up with patches to vol_id or blkid which would automatically guess which signature is the right one. Therefore this is a WONTFIX for every package affected. It's true that old vol_id did claim a partition as LUKS encrypted even though there were other signatures also. This was "wrong" behaviour, but it just happened to work.

I will try to create an encrypted partition with both LUKS and swap signatures under jaunty and then try to add support to my script in bug #428435 (comment 24) to remove the swap signature so that there's at least some way for people to migrate from jaunty to karmic without having to reformat their LUKS partitions.

Antti Kaijanmäki (kaijanmaki) wrote :

Marking this as a duplicate of bug #428435
We need to come up with a solution which spans to every release from 8.04 LTS server to 10.04 LTS, so let's continue the discussion in one place.
