Command-line recovery required when fsck reports an unexpected inconsistency

Bug #58430 reported by saads on 2006-09-01
This bug affects 2 people
Affects: Fedora | Status: Won't Fix | Importance: Medium
Affects: util-linux (Ubuntu) | Status: Invalid | Importance: Low | Assigned to: Unassigned
Nominated for Intrepid by Aryeh Gregor

Bug Description

When there is an ext3 filesystem error and fsck fails to correct it, the system becomes unusable. It asks the user to perform fsck manually in a very cryptic way. The average user can in no way understand what has happened to their system. In most cases a simple fsck /dev/hdaX (whatever partition required the check) will end up solving the problem. This should be done automatically.

Here is the output of what happened to a friend of mine who rebooted and had to use Windows because they couldn't start Ubuntu:

----------------------------------------------------------------------------
The following appears when i try to boot ubuntu:

dev/hda2 contains a file system with errors-check forced

unexpected inconsistency RUN fsck manually without -a or -p options

an automatic file system check of the root file system failed

a manual fsck must be performed then system rebooted
-----------------------------------------------------------------------------

Matt Zimmerman (mdz) wrote :

The reason the system requires that fsck be run manually in this case is that there is the possibility of data loss/corruption during the repair. Where errors can be automatically corrected without this risk, fsck simply corrects them without prompting the user.
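
To make the distinction concrete, here is a minimal sketch of how a boot script could interpret the exit status of fsck. The bit values are the ones documented in the fsck(8) manual page; the function names are hypothetical, and the assumption that the boot-time message quoted above corresponds to the "errors left uncorrected" bit is mine, not something confirmed in this report.

```python
# Exit-status bits as documented in the fsck(8) manual page.
FSCK_STATUS_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status):
    """Return a list with the meaning of each bit set in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_STATUS_BITS.items() if status & bit]


def needs_manual_repair(status):
    """True when fsck left errors uncorrected (bit 4 is set)."""
    return bool(status & 4)
```

A status of 1 means the automatic pass repaired everything it found, while any status with bit 4 set is the case this bug is about.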

Sivan Greenberg (sivan) wrote :

Matt, shouldn't we allow people to do that and confirm they understand the risk on the same prompt? That is, when such an error is detected, we offer the user the option to run the "manual" fsck and attend to the critical error reported by fsck, which may involve data loss.

I think the reasoning is wrong here.

So we are in a situation where fsck recovery could trigger data loss, which makes it dangerous.

The question is: what is the alternative for the user?
Who is ever going to repair filesystem corruption in any way other than fsck? Basically there is only one choice.
But OK, I guess we should at least ask the user instead of just dumping her into a cryptic shell.

"Serious data corruption occurred, what do you want to do?"

<F5> auto repair (might cause data loss)
<F6> manual repair (expert mode) -> which dumps the user to a shell, as happens today

Right now the user is dumped into a shell, and most users will not know what to do and will simply reinstall Ubuntu, overwriting ALL of their data! So what did we protect the user from here?

Martin (martin615) wrote :

Why even ask? Seriously... who would answer anything but yes to fsck's questions?

Martin (martin615) wrote :

I also think the Importance should be bumped because, as Emmanuel said, it renders the system unusable for many users.

Neal McBurnett (nealmcb) wrote :

Please don't just fsck automatically for folks.

This just happened to me, and I ended up with a broken system by following the directions and running fsck and having to hold down the "enter" key for a LONG time while ominous messages flew by offering to remove illegal blocks, duplicate blocks, etc.

If I hadn't been wanting to get to bed, or if the data on the system had been more critical, I would have wanted to first make a copy of the disk with a rescue disk, dd and ssh so I could

 1) investigate just what had happened to the system in the first place, to hopefully keep this from happening to me or someone else

 2) be able to try alternate repair strategies.
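
As an illustration of the "rescue disk, dd and ssh" idea, the sketch below only builds the command line for copying a partition to another machine before any repair is attempted; it does not run anything. The device, remote host and image file name are placeholders, and the block size is an arbitrary choice.

```python
import shlex


def build_image_command(device, remote, image_path):
    """Build a shell pipeline that copies a block device to a remote file over ssh.

    conv=noerror,sync tells dd to continue past read errors and pad short
    reads, so offsets in the image stay aligned with the source device.
    """
    read_side = f"dd if={shlex.quote(device)} bs=64K conv=noerror,sync"
    write_side = f"dd of={shlex.quote(image_path)}"
    return f"{read_side} | ssh {shlex.quote(remote)} {shlex.quote(write_side)}"


# Placeholder values; substitute the real partition and backup host.
print(build_image_command("/dev/hda2", "user@backuphost", "hda2.img"))
```

The pipeline would be run from the rescue environment with the damaged filesystem unmounted, so the image reflects the state of the disk before fsck changes anything.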

It may well be that the default option for the majority of inexperienced users should be to offer to do an automatic fsck. But given the likelihood of severe data loss, I think we really need to also offer them an option to stop, make a copy, consult an expert, or somehow deal with the situation manually as best they can.

And in either case, I think we should do whatever we can to maintain a log of the results of the fsck. That will probably be hard in many circumstances, but we could at least tell folks to stop, think about how they got there, and take notes for future reference. Some will be able to run the fsck via a serial port or something. It would also be nice if fsck offered some statistics at the end - max and min block number affected, # repairs, # repairs refused, or whatever to characterize what happened for future forensics or bug investigations.

Just remember, some folks will REALLY CARE about the data.

Martin (martin615) wrote :

fsck puts all data that can be salvaged in lost+found, right? Other than that, is it really useful to dump the disk image for any other purpose but debugging?

Martin (martin615) wrote :

And if there's some tool out there that is better at recovering data than fsck, why not investigate running that tool instead (or maybe in addition to fsck).

Anyways, asking and making it CRYSTAL CLEAR and DEAD SIMPLE what needs to be done to get back to a working state is better than what's there now.

Neal McBurnett (nealmcb) wrote :

lost+found is useful, and people should be reminded that it is there. But there is a lot of info that fsck spews out, like directory names and inodes, that is currently not seen again as far as I know. Sounds like a separate bug on fsck (or user education :-) might be in order, but nevertheless I think the automatic fsck option and the manual "consult an expert" option both make sense.

gcordoba (glgcg) wrote :

Hi,
What about running fsck with the disk unmounted? I have found that many people advise that in other forums.
Additionally, in the thread
  Ubuntu Forums > The Ubuntu Forum Community > Main Support Categories > General Help
Manual Fsck

a "safer" procedure is advised:
 sudo touch /forcefsck && sudo reboot

Is that really safe? If so, it would be easy to implement in an automatic procedure, or at least to advise the user to do it manually.
In my case (up to now, fortunately), when I have been asked to run fsck, I just press Ctrl-D and a reboot is performed. Then a "normal" check passes successfully. However, at the next check (after 30 boots) the error is detected again. I am afraid for my system and data.
With best regards,
Gustavo

The best route for practically all users is probably fsck -y. Yes, there's a tiny chance that a) fsck will break something AND b) that something happens to be important to the user AND c) the user could have done better manually BUT d) they didn't bother backing up AND e) they didn't know that the default was fsck -y and change it manually. The confluence of all of these is far less likely than one being missing, and if one is missing then fsck -y is better than plain fsck, or fsck -p. Yes, the consequences of fsck -y failure are more serious than the consequences of fsck -p failure, but the latter shouldn't be neglected and will crop up *much* more commonly, especially as Ubuntu becomes more mainstream.

You could add a scary prompt by default, but this will terrify ordinary users and not be very useful to most power-users either. I think running fsck -y instead of fsck -p by default on Ubuntu Desktop is the best way to fix this issue, with a well-documented way for power-users to alter the setting to the more conventional fsck -p.
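
For readers unfamiliar with the flags being compared, here is a rough sketch of what a configurable default could look like. The flag meanings follow the e2fsck(8) manual page; the mode names and the function are hypothetical, and nothing here reflects how Ubuntu's boot scripts are actually written.

```python
# Flag meanings as described in the e2fsck(8) manual page.
FSCK_MODE_FLAGS = {
    "preen": "-p",       # fix only problems that are safe to fix unattended
    "yes": "-y",         # answer "yes" to every question
    "check-only": "-n",  # open read-only and answer "no" to every question
}


def build_fsck_argv(device, mode="preen"):
    """Return the argument list for running fsck on a device in the given mode."""
    if mode not in FSCK_MODE_FLAGS:
        raise ValueError(f"unknown fsck mode: {mode!r}")
    return ["fsck", FSCK_MODE_FLAGS[mode], device]
```

Under this sketch, the proposal amounts to changing the default from "preen" to "yes" on desktop installs, with power users setting it back.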

I'm nominating this proposed fix for Intrepid for the drivers to consider whether they like the fix (changing -p to -y). I assume it's not necessary to allocate any resources to figure out how to implement it, i.e., I assume it's a one-line change somewhere.

When fsck on boot fails, it dumps the user into a shell.. and unless he is a Unix sysadmin, he will be lost.

Solution: Replace the current message with something like:

Repairing your file system may cause errors!!! Do you want to: [F]orce Repair, get to a [S]hell or [R]eboot:
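
A prompt like this is simple to express. The following is only a sketch of the decision logic, with hypothetical action names; what each action would actually run is left out, and a real implementation would live in the boot scripts rather than in Python.

```python
# Hypothetical action names for the three choices in the proposed prompt.
ACTIONS = {"f": "force-repair", "s": "shell", "r": "reboot"}

PROMPT = ("Repairing your file system may cause errors!!! Do you want to: "
          "[F]orce Repair, get to a [S]hell or [R]eboot: ")


def parse_choice(answer):
    """Map the user's reply to an action, or None if it is not recognised."""
    key = answer.strip().lower()[:1]
    return ACTIONS.get(key)


def ask(prompt=PROMPT, read=input):
    """Keep asking until the user picks one of the three actions."""
    while True:
        action = parse_choice(read(prompt))
        if action is not None:
            return action
```

Repeating the question on unrecognised input, instead of picking a default, keeps a stray keypress from starting a repair the user did not ask for.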

1) Just rebooting when fsck fails is ... going to fail again. That's not useful.
2) If it fails to the point when you get a shell, forcing a fsck won't work - it dies because -y doesn't work.

Which leaves dropping to a shell, which is what we do.

Still, this is a moment of utter panic and confusion for anyone inexperienced with fsck. That's a lot of desktop Linux users. (Won't anyone think of the grandmothers!) The present interface simply abandons the user; it needs to be more guiding. If a fsck -y fails, then it should explain why it failed, what files might be gone, or if all hope is lost...

1) Rebooting can be useful if you want to reboot onto something else (a usb key or a rescue network boot or whatever and for some reason you can't directly access the power key).. alright, not that useful..

1.1) I agree we need to keep the current "drop me to a shell" for the few people who understand debugfs and other exciting tools.

2) It currently doesn't try -y, it tries -a ... which will only do "safe repairs", -y can often help. (unless -y is in /fsckoptions..)

3) Even if -y fails, there should really be more help.

For the case I had today (using F9), the user got it working just by running "fsck /dev/...." and answering "y" a few times, so -y would definitely have worked.

Whenever I've tried to "repair" a filesystem using fsck -y the filesystem has been destroyed and would no longer even get to /bin/bash the next time round. In my experience if the system is asking you this question you are already in trouble and at a bare minimum you have to understand what /lost+found means.

This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 9 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

I'm still not sure what sort of useful guiding information could be here that doesn't require 2 screens of data and a sysadmin background. Closing for now.

Changed in fedora:
status: In Progress → Won't Fix
Gustavo Carneiro (gjc) wrote :

This happened to a friend of mine today. She had to phone me, and I had to explain to her about the 'fsck -y'. It wasn't even clear to me what was wrong, she just said "it says it can't mount a filesystem", on the phone.

So, yes, making this process easier for non-expert people, a simple yes/no question with a menu, is the way to go. Ubuntu is being more difficult to use than it needs to be...

Lemmenes (john-lemmenes) wrote :

Bad Karma Koala
I migrated from Windows to 9.04 and had 2 months of blissfully stable computing. I even got my can't-do-without legacy Windows programs going in VirtualBox. I upgraded to 9.10 successfully on one machine without difficulty, but on the machine with my accounting software and a lot of data I get:

One or more of the mounts listed in /etc/fstab/
/ : waiting for /dev/disk/b4-uuid/.......
/tmp : waiting for null
/swap : waiting for uuid

I find myself floundering around in the recovery shell trying to use a command line with which I have no familiarity. I did mount -o remount,rw /
and then a
sudo dpkg --configure -a

Yesterday I dropped the fsck -y bomb to no discernible effect. Someone please tell me if it's time to just reinstall everything and start over. It's a sure thing that I won't be reinstalling 9.10 anytime soon on either computer, as I have read that this can occur periodically even on an apparently unaffected computer. How is it that this got out of beta testing?

Once again I'm dealing with OS issues and not computing. This is not a hobby for me. I just want it to work so I can.

Lemmenes (john-lemmenes) wrote :

As a postscript to the acerbic comment to the effect that those who failed to back up somehow deserve all of this: I do have all my data intact; the problem is the cost in time and aggravation of redoing all of this. It doesn't have to be made automatic; just make the choices clearer as to what to do and what the potential consequences are. By all means don't release a version with this kind of issue. I'm not an early adopter; it's a tool, not an avocation.

What is the prevalence of this anyway?

srinivas (cnus01) wrote :

Hi guys, please help me.

I have a Fedora 11 server and it's not booting up.

How can I repair the OS without data loss?

Please help me, guys. My mail ID is <email address hidden>.

thanks
srinivas

affects: ubuntu-meta (Ubuntu) → util-linux (Ubuntu)
Phillip Susi (psusi) wrote :

This is not a bug: it is working as intended since the repair may cause data loss, it really does need manual user intervention, likely with a backup being made first.

Changed in util-linux (Ubuntu):
status: Confirmed → Invalid
Changed in fedora:
importance: Unknown → Medium