Obsolete backup ext2/3/4 superblocks can confuse e2fsck on an encrypted LUKS partition

Bug #1713175 reported by Devang on 2017-08-26
This bug affects 1 person
Affects              Importance  Assigned to
cryptsetup (Ubuntu)  Undecided   Unassigned
e2fsprogs (Ubuntu)   Undecided   Unassigned
util-linux (Ubuntu)  Undecided   Unassigned

Bug Description

fsck.ext4 runs on a LUKS partition and starts to correct inode entries, leaving the partition corrupted and useless. It seems like it should defensively check whether it is a LUKS partition, using "cryptsetup isLuks /dev/sda1", before continuing to modify it.

I hope such a defensive check can be added.
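
A minimal sketch of the kind of check being requested, written in Python rather than in e2fsck's own C. It assumes a LUKS header begins with the six magic bytes "LUKS\xba\xbe" at offset 0 of the device. The function name looks_like_luks is made up for illustration; a real check would more likely go through "cryptsetup isLuks" or libblkid.

```python
import os
import tempfile

LUKS_MAGIC = b"LUKS\xba\xbe"  # first six bytes of a LUKS header


def looks_like_luks(path):
    """Return True if the device or image at `path` starts with the LUKS magic."""
    with open(path, "rb") as dev:
        return dev.read(len(LUKS_MAGIC)) == LUKS_MAGIC


if __name__ == "__main__":
    # Demonstrate on a scratch image file instead of a real block device.
    with tempfile.NamedTemporaryFile(delete=False) as img:
        img.write(LUKS_MAGIC + b"\x00\x01" + b"\x00" * 1024)
    try:
        print(looks_like_luks(img.name))
    finally:
        os.unlink(img.name)
```

This only looks at the magic bytes; it does not validate the rest of the header.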

Theodore Ts'o (tytso) wrote :

So what happened is the following. There was a previous ext4 file system on the disk. You ran "cryptsetup luksFormat /dev/sda1" which wiped out the primary superblock.

You then manually ran fsck.ext4 on the device. It noticed the primary superblock was non-existent, and then asked permission to modify the file system. So it would have required multiple sysadmin errors in order to get to this point.

If you want a process which is fool-proof against sysadmin errors, you could run the following before you run cryptsetup:

dd if=/dev/zero of=/dev/callcc/scratch seek=32766 count=10 bs=4k

If you do this, then e2fsck won't find the backup superblock, and it will print:

e2fsck 1.43.5 (04-Aug-2017)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/callcc/scratch

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

/dev/callcc/scratch contains a crypto_LUKS file system
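
The arithmetic behind the dd command above can be checked with a short sketch. It assumes the default ext2/3/4 layout, where a block group holds 8 * block_size blocks and the first backup superblock sits at the start of group 1; the helper names are hypothetical.

```python
BLOCK_SIZE = 4096  # bs=4k in the dd command


def dd_zeroed_blocks(seek, count):
    """Blocks written by `dd seek=<seek> count=<count>`, as a half-open range."""
    return range(seek, seek + count)


def first_backup_superblock(block_size):
    """Block number of the first backup superblock for a given block size.

    Assumes the default layout: a block group holds 8 * block_size blocks
    (one bitmap block's worth of bits), and 1 KiB-block file systems start
    their data at block 1 instead of block 0.
    """
    blocks_per_group = 8 * block_size
    first_data_block = 1 if block_size == 1024 else 0
    return blocks_per_group + first_data_block


zeroed = dd_zeroed_blocks(seek=32766, count=10)
print(zeroed.start, zeroed.stop - 1)            # 32766 32775
print(first_backup_superblock(4096) in zeroed)  # True
print(first_backup_superblock(1024))            # 8193
```

The two results match the block numbers in the e2fsck hint (8193 and 32768). Under these assumptions the command as written covers the first backup for a 4 KiB block size only; a file system made with 1 KiB blocks would keep its first backup at block 8193 (counted in 1 KiB units), which lies outside the zeroed range.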

So we could try to do the "is there another file system or LUKS" check before falling back to the secondary superblock. I'd call this a feature request.
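
A rough sketch of that ordering, in Python for illustration only; it is not how e2fsck is implemented. It assumes the ext2/3/4 magic 0xEF53 is stored little-endian at byte 56 of a superblock, that the primary superblock starts 1024 bytes into the device, and that the LUKS magic sits at offset 0. The function names and the returned strings are made up.

```python
import struct

EXT_SB_OFFSET = 1024     # primary superblock starts 1 KiB into the device
EXT_MAGIC_OFFSET = 56    # s_magic field within the superblock
EXT_MAGIC = 0xEF53
LUKS_MAGIC = b"LUKS\xba\xbe"


def has_ext_magic(dev, sb_offset):
    """True if an ext2/3/4 magic number is found in a superblock at sb_offset."""
    dev.seek(sb_offset + EXT_MAGIC_OFFSET)
    raw = dev.read(2)
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT_MAGIC


def choose_action(path, backup_block=32768, block_size=4096):
    """Decide what to do with `path`, checking for LUKS before any backup."""
    with open(path, "rb") as dev:
        if has_ext_magic(dev, EXT_SB_OFFSET):
            return "use-primary"
        dev.seek(0)
        if dev.read(len(LUKS_MAGIC)) == LUKS_MAGIC:
            # Something else owns this device; do not fall back silently.
            return "refuse-foreign-signature"
        if has_ext_magic(dev, backup_block * block_size):
            return "try-backup"
        return "no-filesystem"
```

The point of the ordering is that a stale backup superblock is only consulted once the device has been found not to carry another recognisable signature. A real implementation would presumably ask libblkid rather than hard-code a single magic string.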

The wipefs program should also have wiped enough to get rid of the backup superblock (it's considered best practice to run wipefs before running cryptsetup).

Furthermore, cryptsetup should have run wipefs, or done its own wiping. Mke2fs will wipe sectors at the beginning and the end of the device (otherwise it's possible for the device to get misidentified as part of a RAID array).
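
As an illustration of what wiping at format time can look like, here is a small Python sketch that zeroes the head and tail of a device or image file. It is not what mke2fs or cryptsetup actually do; the function name wipe_edges and the 64 KiB default are assumptions chosen for the example.

```python
import os


def wipe_edges(path, nbytes=64 * 1024):
    """Zero the first and last `nbytes` of the file or device at `path`.

    Returns the size of the target in bytes. The size is never changed.
    """
    with open(path, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)
        n = min(nbytes, size)
        dev.seek(0)
        dev.write(b"\x00" * n)   # head: primary signatures live here
        dev.seek(size - n)
        dev.write(b"\x00" * n)   # tail: some metadata formats sit at the end
        dev.flush()
        os.fsync(dev.fileno())
    return size
```

Note that a wipe limited to the two ends would not reach an ext2/3/4 backup superblock stored further into the device, which is the situation this bug is about.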

Fundamentally, trying to use in-band signalling is fraught with peril. All tools need to do a better job of preventing this kind of mis-identification, especially at file system or LUKS creation/format time.

Theodore Ts'o (tytso) wrote :

Note: I think this should be a low-priority/wishlist/feature request. But I can't edit the importance. It is a valid feature request though, so I plan to treat it at that priority.

If someone else thinks it's higher priority, patches are welcome. :-)

Changed in cryptsetup (Ubuntu):
status: New → Confirmed

asi (gmazyland) wrote :

One day cryptsetup will link to libblkid, adding two "user friendly" things in luksFormat:
- warn if it detects something on disk before destroying it
- run wipefs to really destroy all the magic strings there (as Ted suggested)

So valid feature request also on the cryptsetup side, with the same low-priority though :-)

Theodore Ts'o (tytso) on 2017-08-26
summary: - fsck should check before running on an encrypted LUKS partition
+ Obsolete backup ext2/3/4 superblocks can confuse e2fsck on an encrypted
+ LUKS partition

Devang (devangm) wrote :

You are correct, not running at least wipefs before using luksFormat on the block device was a mistake on my part, but I barely remember making it. I half expected fsck to come up with some errors, so I let it make changes. It is not a volume that is always connected at startup, hence the manual check. It was just the wrong block device. An isLuks check would have returned yes and prevented fsck from running. Maybe libblkid could detect LUKS partitions?

This happened on a 2-drive RAID 1 array, which I have dd backups of, in case the array can somehow be recovered. Hard lesson learned though. Thank you for the explanation.

asi (gmazyland) wrote :

FYI libblkid/blkid has detected LUKS for years already.

Devang (devangm) wrote :

Ah, well, the whole thing is more of a stupid lesson learned.

Phillip Susi (psusi) wrote :

I have to disagree; the only tool that uses the backup superblock is e2fsck, and if you accidentally run wipefs on the volume, you should be able to recover it with e2fsck.

If you don't want to attempt to recover a filesystem there, then don't run e2fsck.

Changed in util-linux (Ubuntu):
status: New → Invalid

Devang (devangm) wrote :

My intention was to attempt to recover a filesystem. It just happened to be the wrong block device on which there was a functioning LUKS partition.

Being defensive to prevent such a circumstance would call for two things: cryptsetup forcing a user to write zeros so such a situation isn't created in the future, and e2fsck checking for a LUKS partition and re-verifying that the user really wants to continue despite there being a LUKS header and backup superblocks on the partition. If such checks can be added without additional dependencies, I think they just might prevent similar mishaps in the future. Maybe this mistake doesn't commonly happen, but this configuration of layering ext4 on LUKS is the default for most Linux distributions, even for USB and external drives.

In fact, e2fsck could've figured out through lsblk what I was really trying to do and corrected me. I had a relatively simple configuration. It wouldn't have been that hard to figure out what I was doing was wrong.
