fsck.ext4 -n wrote to & destroyed filesystem

Bug #537483 reported by Bela Lubkin
This bug affects 1 person
Affects: e2fsprogs (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: e2fsprogs

System is running Ubuntu 10.04-current (was in the middle of upgrading last night's new packages -- had installed almost all, but had not yet rebooted into the new 2.6.32-16 kernel -- was still on 2.6.32-15). As the system is completely crashed, I cannot report on the exact e2fsprogs / fsck releases. However, it was the newest version available in any of the Ubuntu 10.04 Lucid Lynx repositories (including -backports and -proposed, if anything is in those yet). I had run an `apt-get dist-upgrade` less than 2 hours before the crash; e2fsprogs would be whichever version was last issued before about 2010-03-11 1530 GMT.

WHAT HAPPENED:

Out of curiosity -- and somewhat bothered at how slow and noisy disk operations were during the day's round of upgrades -- I decided to run fsck with its "-E fragcheck" ("show me details about filesystem fragmentation") option.

Below (after all text) is a cut-and-paste from the ssh session I ran the command from.

The exact command I ran was:

   time fsck.ext4 -n -v -t -t -D -E fragcheck /dev/sda5

in which flags are:

   -n DO NOT WRITE TO THE FILESYSTEM
   -v verbose
   -t timing information; twice for extra details
   -D optimize directories
   -E fragcheck "print a detailed report of any discontiguous blocks"

The documentation for -D comments that it "will detect directory entries with duplicate names in a single directory, which e2fsck normally does not enforce". It was for this enhanced detection that I added this flag. I realize that it is a flag which directs fsck to write, but I believed that it -- as with all(*) other writing flags -- would be rendered inoperable by "-n". That is, I believed that the combination "-n -D" would cause additional checks (for directories needing optimization & for duplicate directory entries) without causing any writes.

(*) I realize this isn't fully true: the three bad-block-related flags -[clL] are effective even under -n. This is clearly documented; the clarity of _that_ documentation lends support to the supposition that no _other_ flags will override -n.
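
The flag interaction described above can be screened for before fsck is ever invoked. The following is a minimal sketch, not part of e2fsprogs: `check_fsck_flags` is a hypothetical helper, and its option string is an assumption covering only a subset of the options listed in the e2fsck man page. It rejects the same combinations that the 1.41.11 changelog entry later in this report says are no longer supported ("-n -c", "-n -l", "-n -L", "-n -D").

```shell
#!/bin/sh
# Hypothetical pre-flight check (not an e2fsprogs tool).
# Returns 0 if the argument list looks safe, 1 if -n is combined with a
# flag that e2fsck 1.41.10 allowed to write despite -n.
check_fsck_flags() {
    OPTIND=1
    cff_readonly=no
    cff_writers=""
    # Assumed subset of e2fsck options; a trailing ":" marks options that
    # take an argument (for example "-E fragcheck").
    while getopts ":pacnyrdfkvtDFVb:B:l:L:C:j:E:" cff_opt "$@"; do
        case "$cff_opt" in
            n) cff_readonly=yes ;;
            c|l|L|D) cff_writers="$cff_writers -$cff_opt" ;;
        esac
    done
    if [ "$cff_readonly" = yes ] && [ -n "$cff_writers" ]; then
        echo "refusing: -n combined with write-capable flag(s):$cff_writers" >&2
        return 1
    fi
    return 0
}

# Example use (the fsck call itself is left commented out on purpose):
# check_fsck_flags -n -v -E fragcheck /dev/sda5 && echo "flags look read-only"
```

With the command line from this report (`-n -v -t -t -D -E fragcheck /dev/sda5`) the helper returns non-zero because of `-D`. It only inspects the arguments; it says nothing about whether the device is mounted.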

In any case, I do not know if it was -D, the combination of -D -E fragcheck, or some other random issue which caused the problem. For all I know, `fsck -n` is fundamentally broken on ext4. I do not wish to conduct further experiments after this unwitting one, which will leave me reconstructing a system.

As the transcript shows, fsck responded with:

   /dev/sda5 is mounted.

   WARNING!!! Running e2fsck on a mounted filesystem may cause
   SEVERE filesystem damage.

   Do you really want to continue (y/n)?

Perhaps foolishly, I assumed that this message is issued in all cases -- whether or not fsck will actually be writing.

[Aside: the message should be enhanced as follows: if, due to -n, fsck _UNDERSTANDS_ that it is not going to be doing any writes, the message should read something like: "WARNING... may cause SEVERE filesystem damage. The current run is DISABLED by the `-n' flag to write to the filesystem, so no actual damage will occur." (of course this message should only be added if we're sure that it's true!). On the other hand, if -n was _NOT_ present, it should additionally comment "The current run is ENABLED to write to the filesystem. Do you really want to continue..." My point here is that this message should unambiguously inform the user whether it's just a sham warning, issued as a matter of form even though this is a dry run; or a REAL warning that damage is about to occur.]

In any case, I did answer "yes" in the belief that it wasn't actually going to write.
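
A script can make the same "is it mounted?" decision up front instead of relying on the interactive prompt. This is a rough sketch under stated assumptions: `is_mounted` is a hypothetical helper, and it compares the device name literally against the first field of `/proc/mounts` (the kernel's own view on Linux). A root filesystem listed under another name, such as a UUID path, would not match.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Hypothetical helper: returns 0 if DEVICE appears as a mount source in
# MOUNTS_FILE (default /proc/mounts), 1 otherwise.
is_mounted() {
    im_dev=$1
    im_mounts=${2:-/proc/mounts}
    # The "|| [ -n ... ]" keeps a final line without a trailing newline.
    while read -r im_source im_rest || [ -n "$im_source" ]; do
        if [ "$im_source" = "$im_dev" ]; then
            return 0
        fi
    done < "$im_mounts"
    return 1
}

# Example use:
# if is_mounted /dev/sda5; then
#     echo "/dev/sda5 is mounted; not running fsck" >&2
# fi
```

Taking the mounts file as an optional second argument keeps the function easy to exercise against a sample file rather than the live system.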

As the transcript shows, it displayed that it was recovering the journal, and then that there was a bad magic number.

After that I ran `fdisk -l`, which failed with an I/O error (I assume due to the binary or shared objects not being accessible); and then `df`, which succeeded but showed the root filesystem (/dev/sda5) in bad shape.

At that point I was sure the system was destroyed. Just in case, I switched power off without doing any software shutdown actions; but this did not help. Upon reboot I see:

   error: unknown filesystem.
   grub rescue> _

I may attempt some sort of rescue with `mkfs -S`, but I don't have much hope of recovery since I don't know the necessary parameters. :-(

POSSIBLE CAUSE: system was in-place upgraded from Ubuntu 9.10 Karmic Koala. Root filesystem was ext3, not ext4, before the upgrade. I don't believe I did anything to explicitly upgrade it to ext4. I probably should not have invoked fsck as `fsck.ext4` but rather just `e2fsck` or `fsck`, allowing the system to draw its own conclusion about filesystem type.

I had earlier run some exploratory commands like `tune2fs -l`, `dumpe2fs -l`; the output included something, I cannot say what at this point, which made me believe the current FS format was ext4.

Even if I was wrong to explicitly call for ext4, even if the actual on-disk format was ext3, I do not believe this command should have destroyed the filesystem! At the very least it should have called more specific attention to the problem: "On-disk filesystem format has been detected as ext3. Checking this with ext4 algorithms will probably damage the filesystem. Are you still sure you want to continue?"
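
The on-disk format can be guessed from the feature list that `tune2fs -l` prints on its "Filesystem features:" line. The sketch below is only a heuristic, and `classify_ext_features` is a hypothetical helper: it assumes that features such as `extent`, `flex_bg`, `huge_file` or `64bit` indicate ext4, and that `has_journal` without any of them indicates ext3.

```shell
#!/bin/sh
# classify_ext_features: reads `tune2fs -l`-style output on stdin and prints
# ext2, ext3 or ext4 based on the "Filesystem features:" line.
# Heuristic only; prints "unknown" and returns 1 if no such line is found.
classify_ext_features() {
    cef_features=$(sed -n 's/^Filesystem features:[[:space:]]*//p')
    if [ -z "$cef_features" ]; then
        echo unknown
        return 1
    fi
    # Pad with spaces so each feature name matches as a whole word.
    case " $cef_features " in
        *" extent "*|*" flex_bg "*|*" huge_file "*|*" 64bit "*) echo ext4 ;;
        *" has_journal "*) echo ext3 ;;
        *) echo ext2 ;;
    esac
}

# Example use (reads the superblock only):
# tune2fs -l /dev/sda5 | classify_ext_features
```

A wrapper could compare this guess with the name it was invoked under and stop on a mismatch, instead of going ahead on the user's assumption.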

Below is the actual cut-and-paste, completely unedited transcript from the fatal ssh session.

>Bela<

root@adelie:~# time fsck.ext4 -n -v -t -t -D -E fragcheck /dev/sda5
e2fsck 1.41.10 (10-Feb-2009)
/dev/sda5 is mounted.

WARNING!!! Running e2fsck on a mounted filesystem may cause
SEVERE filesystem damage.

Do you really want to continue (y/n)? yes

/dev/sda5: recovering journal

fsck.ext4: Bad magic number in super-block while trying to re-open /dev/sda5
e2fsck: io manager magic bad!

real 0m11.921s
user 0m0.180s
sys 0m0.304s
root@adelie:~#
root@adelie:~# fdisk -l
bash: /sbin/fdisk: Input/output error
root@adelie:~# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda5 73786976294836933504 73786976294768062164 68871340 100% /
none 767392 276 767116 1% /dev
none 771608 48 771560 1% /dev/shm
none 771608 220 771388 1% /var/run
none 771608 0 771608 0% /var/lock
none 771608 0 771608 0% /lib/init/rw
none 73786976294836933504 73786976294768062164 68871340 100% /var/lib/ureadahead/debugfs

Bela Lubkin (filbo) wrote :

The gist, in case it's lost in all the detail above:

I ran `fsck -n` on a filesystem, with some other flags which were not documented as overriding -n.

It hosed my filesystem.

Theodore Ts'o (tytso) wrote : Re: [Bug 537483] [NEW] fsck.ext4 -n wrote to & destroyed filesystem

On Thu, Mar 11, 2010 at 04:55:37PM -0000, Bela Lubkin wrote:
> The documentation for -D comments that it "will detect directory entries
> with duplicate names in a single directory, which e2fsck normally does
> not enforce". It was for this enhanced detection that I added this
> flag. I realize that it is a flag which directs fsck to write, but I
> believe that it -- as with all(*) other writing flags -- would be
> rendered inoperable by "-n". That is, I believed that the combination
> "-n -D" would cause additional checks (for directories needing
> optimization & for duplicate directory entries) without causing any
> writes. (*)I realize this isn't fully true, that the three bad-block-
> related flags -[clL] are effective even under -n. This is clearly
> documented; the clarity of _that_ documentation lends support to the
> supposition that no _other_ flags will override -n.

Yeah, sorry. The -D option was added later, and I forgot to update
the man page for the -n option. The -D option does indeed allow the
file system to be opened read/write, and will rewrite the directories,
which is dangerous when the file system is mounted.

Your assumption that -n would make the "-D will modify the filesystem"
part go away was a bad one.

> In any case, I do not know if it was -D, the combination of -D -E
> fragcheck, or some other random issue which caused the problem. For all
> I know, `fsck -n` is fundamentally broken on ext4. I do not wish to
> conduct further experiments after this unwitting one, which will leave
> me reconstructing a system.

There is a bug with -D and small directories in e2fsprogs 1.41.10,
which I've since fixed, which may have affected you, but
fundamentally, it is dangerous to run e2fsck -D while the filesystem
is mounted.

> As the transcript shows, fsck responded with:
>
> /dev/sda5 is mounted.
>
> WARNING!!! Running e2fsck on a mounted filesystem may cause
> SEVERE filesystem damage.
>
> Do you really want to continue (y/n)?
>
> Perhaps foolishly, I assumed that this message is issued in all cases --
> whether or not fsck will actually be writing.

Nope, e2fsck is smart. It only issues this warning when the
filesystem is opened read/write. If you try to run "e2fsck -n
/dev/XXX" on a mounted filesystem, it won't ask that question.

So yeah, you made two bad assumptions, and that's what led to your
file system getting badly screwed up.

I'll change things so the message is made more explicit. In case
you're curious, the reason it was originally worded the way it
was is that if the /etc/mtab hasn't been cleared by the init scripts
when a user booted into single user mode, it was possible for e2fsck
to think the filesystem is mounted, when it really wasn't mounted.
But I'd much rather someone get scared off from running e2fsck if
their /etc/mtab hasn't been cleared after a system crash, if it avoids
the user who thinks, "surely this message doesn't apply to *me*".

> After that I ran `fdisk -l`, which failed with an I/O error (I assume
> due to the binary or shared objects not being accessible); and then
> `df`, which succeeded but showed the root filesystem (/dev/sda5) in bad
> shape.
>
>...

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package e2fsprogs - 1.41.11-1ubuntu1

---------------
e2fsprogs (1.41.11-1ubuntu1) lucid; urgency=low

  * Merge from Debian unstable, remaining changes:
    - Do not build-depend on dietlibc-dev, which is universe.
    - Do now allow pkg-create-dbgsym to operate on this package.
    - Always use external libblkid and libuuid from util-linux, rather than
      building our own.
    - Includes debian/control in the source package to force the above.
    - Build with -O2 on powerpc to avoid a suspected toolchain bug
      (LP: #450214).
    - Do not include /etc/e2fsck.conf and remove on upgrade.
    (Fixes LP: #521648, #537483, #530071)

e2fsprogs (1.41.11-1) unstable; urgency=medium

  * New upstream release
  * Add Heimdal function com_right_r() to libcom_err (Closes: #558910)
  * Allow e2fsck to run even if the physical device has more than 2**32 blocks
  * Debugfs's "logdump -b <blk>" now properly shows the allocation status
    of the block <blk>. (Closes: #564084)
  * E2fsck's "the filesystem is mounted" message is now more scary
    to hopefully dissuade users from thinking, "surely that message
    doesn't apply to *me*" :-(
  * e2fsck -n will now always open the file system read-only. We now
    disallow certain combinations of options which previously were manual
    exceptions; this is bad because it causes users to think they are
    smarter than they really are. So "-n -c", "-n -l", "-n -L", and
    "-n -D" are no longer supported.
  * If the partition is badly aligned, have mke2fs just print a warning
    message and continue. Previously mke2fs would ask to confirm, and
    this broke distro installation scripts.
  * Fix a bug in libext2fs which caused the creation of very large journals
    for ext4 to be _very_ slow.
  * E2fsck now understands the EOFBLOCKS_FL flag which will be used in
    2.6.34 kernels to make e2fsck not complain about blocks deliberately
    fallocated() beyond an inode's i_size.
  * Fix a bug in e2fsck which could cause e2fsck -D to corrupt
    non-indexed directories. (Closes: #572453)
  * debian/rules: can be compiled statically with stack protector now.
    (Closes: #573923)
  * Update debian policy compliance to 3.8.4
 -- Scott James Remnant <email address hidden> Mon, 22 Mar 2010 17:48:20 +0000

Changed in e2fsprogs (Ubuntu):
status: New → Fix Released