Ubuntu

Command-line recovery required when fsck reports an unexpected inconsistency

Reported by saads on 2006-09-01
This bug affects 1 person
Affects: Fedora | Status: Won't Fix | Importance: Unknown
Affects: ubuntu-meta (Ubuntu) | Importance: Low | Assigned to: Unassigned
Nominated for Intrepid by Aryeh Gregor

Bug Description

When there is an ext3 filesystem error and fsck fails to correct it, the system becomes unusable. It asks the user to perform fsck manually in a very cryptic way. The average user can in no way understand what has happened to their system. In most cases a simple fsck /dev/hdaX (whatever partition required the check) will end up solving the problem. This should be done automatically.

Here is the output a friend of mine saw when they rebooted; they had to use Windows because they couldn't start Ubuntu:

----------------------------------------------------------------------------
The following appears when i try to boot ubuntu:

dev/hda2 contains a file system with errors-check forced

unexpected inconsistency RUN fsck manually without -a or -p options

an automatic file system check of the root file system failed

a manual fsck must be performed then system rebooted
-----------------------------------------------------------------------------

Matt Zimmerman (mdz) wrote :

The reason the system requires that fsck be run manually in this case is that there is the possibility of data loss/corruption during the repair. Where errors can be automatically corrected without this risk, fsck simply corrects them without prompting the user.
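To make the distinction above concrete, here is a minimal sketch that decodes the exit status fsck returns. The bit values are the ones listed in the fsck(8) manual page; the function name `describe_fsck_status` is hypothetical and only for illustration. A status with bit 4 set is the "errors left uncorrected" case that leads to the manual-repair prompt.

```shell
#!/bin/sh
# Decode an fsck exit status. The status is a bitmask (see fsck(8)),
# so several conditions can be reported at once.
# describe_fsck_status is a hypothetical helper, not part of any package.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1)) -ne 0 ] && echo "errors corrected"
    [ $((status & 2)) -ne 0 ] && echo "system should be rebooted"
    [ $((status & 4)) -ne 0 ] && echo "errors left uncorrected"
    [ $((status & 8)) -ne 0 ] && echo "operational error"
    [ $((status & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ] && echo "canceled by user request"
    [ $((status & 128)) -ne 0 ] && echo "shared library error"
    return 0
}

# Example use (needs root and an unmounted filesystem; /dev/hda2 is the
# device from the report above):
#   fsck -p /dev/hda2; describe_fsck_status $?
```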

Sivan Greenberg (sivan) wrote :

Matt, shouldn't we allow people to do that and confirm they understand the risk at the same prompt? That is, when such an error is detected, we offer to run the "manual" fsck and have the user attend to the critical error, which may involve data loss.

I think the reasoning is wrong here.

so, we are in the situation that fsck recovery could trigger data loss.
so, it's dangerous.

question is... what is the alternative for the user?
who is going to repair a filesystem corruption in any other way than fsck? ever? I mean... basically there is only one choice.
but ok... I guess at least we should ask the user... instead of just dumping her into a cryptic shell.

"Serious data corruption occurred, what do you want to do?"

<F5> auto repair (might cause data loss)
<F6> manual repair (expert mode) -> which dumps the user to a shell as of today

because right now the user is dumped into a shell and most users will not know what to do and simply reinstall ubuntu, overwriting ALL of their data! so what did we protect the user from here?
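The two-way choice proposed above could be sketched roughly as follows. This is only an illustration: the function name, the `a`/`m` answers standing in for the F5/F6 keys, and the wording are all hypothetical, and nothing like this exists in the boot scripts as far as this thread shows. The function only prints the command it would run, so it is safe to try.

```shell
#!/bin/sh
# Map the user's answer to a repair action and print it.
# choose_repair_action and its menu keys are hypothetical.
choose_repair_action() {
    device=$1
    choice=$2
    case "$choice" in
        a|A) echo "fsck -y $device" ;;   # auto repair: answer "yes" to every question
        m|M) echo "sulogin" ;;           # manual repair: maintenance shell, as today
        *)   echo "invalid"; return 1 ;;
    esac
}

# Example:
#   choose_repair_action /dev/hda2 a    # prints the auto-repair command
```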

Martin (martin615) wrote :

Why even ask? Seriously... who would answer anything but yes to fsck's questions?

Martin (martin615) wrote :

I also think the Importance should be bumped because, as Emmanuel said, it renders the system unusable for many users.

Neal McBurnett (nealmcb) wrote :

Please don't just fsck automatically for folks.

This just happened to me, and I ended up with a broken system by following the directions and running fsck and having to hold down the "enter" key for a LONG time while ominous messages flew by offering to remove illegal blocks, duplicate blocks, etc.

If I hadn't been wanting to get to bed, or if the data on the system had been more critical, I would have wanted to first make a copy of the disk with a rescue disk, dd and ssh so I could

 1) investigate just what had happened to the system in the first place, to hopefully keep this from happening to me or someone else

 2) be able to try alternate repair strategies.
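For reference, the "rescue disk, dd and ssh" copy mentioned above usually looks something like the pipeline below. This is a sketch under assumptions: the device, remote host and destination path are placeholders, and `image_command` is a hypothetical helper that only prints the pipeline rather than running it, so it can be reviewed first. The partition should not be mounted while it is copied.

```shell
#!/bin/sh
# Print a dd-over-ssh pipeline that copies a partition to a remote file.
# image_command is a hypothetical helper; it does not touch the disk.
image_command() {
    device=$1    # e.g. /dev/hda2 (the damaged partition)
    remote=$2    # e.g. user@backuphost (placeholder)
    dest=$3      # e.g. /backup/hda2.img (placeholder)
    printf 'dd if=%s bs=1M | ssh %s "dd of=%s bs=1M"\n' \
        "$device" "$remote" "$dest"
}

# Example:
#   image_command /dev/hda2 user@backuphost /backup/hda2.img
```

Repairs can then be tried on the original while the image stays untouched, or the other way round.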

It may well be that the default option for the majority of inexperienced users should be to offer to do an automatic fsck. But given the likelihood of severe data loss, I think we really need to also offer them an option to stop, make a copy, consult an expert, or somehow deal with the situation manually as best they can.

And in either case, I think we should do whatever we can to maintain a log of the results of the fsck. That will probably be hard in many circumstances, but we could at least tell folks to stop, think about how they got there, and take notes for future reference. Some will be able to run the fsck via a serial port or something. It would also be nice if fsck offered some statistics at the end - max and min block number affected, # repairs, # repairs refused, or whatever to characterize what happened for future forensics or bug investigations.
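One low-tech way to keep such a log is to capture everything the check prints into a file that lives somewhere other than the damaged filesystem (a USB stick or a RAM-backed /tmp, for instance). A minimal sketch, assuming a non-interactive run such as `fsck -n` (report only, change nothing); `run_and_log` is a hypothetical helper name:

```shell
#!/bin/sh
# Run a command, save its combined output to a log file, then show the
# output and return the command's own exit status.
# run_and_log is a hypothetical helper. Output is shown only after the
# command finishes, so this suits non-interactive runs.
run_and_log() {
    log=$1
    shift
    status=0
    "$@" >"$log" 2>&1 || status=$?
    cat "$log"
    return "$status"
}

# Example (needs root; the log path is a placeholder):
#   run_and_log /tmp/fsck-hda2.log fsck -n /dev/hda2
```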

Just remember, some folks will REALLY CARE about the data.

Martin (martin615) wrote :

fsck puts all data that can be salvaged in lost+found, right? Other than that, is it really useful to dump the disk image for any other purpose but debugging?

Martin (martin615) wrote :

And if there's some tool out there that is better at recovering data than fsck, why not investigate running that tool instead (or maybe in addition to fsck).

Anyways, asking and making it CRYSTAL CLEAR and DEAD SIMPLE what needs to be done to get back to a working state is better than what's there now.

Neal McBurnett (nealmcb) wrote :

lost+found is useful, and people should be reminded that it is there. But there is a lot of info that fsck spews out, like directory names and inodes, that is currently not seen again as far as I know. Sounds like a separate bug on fsck (or user education :-) might be in order, but nevertheless I think the automatic fsck option and the manual "consult an expert" option both make sense.

gcordoba (glgcg) wrote :

Hi,
What about running fsck with the disk unmounted? I found that a lot of people advise that in other forums.
Additionally, in
  Ubuntu Forums > The Ubuntu Forum Community > Main Support Categories > General Help
Manual Fsck

a "safer" procedure is advised:
 sudo touch /forcefsck && sudo reboot

Is that really safe? If so, it would be easy to implement as an automatic procedure, or at least to advise the user to do it manually.
In my case (up to now, fortunately), when I have been asked to run fsck, I just press Ctrl-D and a reboot is performed. Then a "normal" check passes successfully. However, at the next check (after 30 boots) the error is detected again. I am afraid for my system and data.
With best regards,
Gustavo
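The "every 30 boots" pattern described above is presumably the ext2/ext3 maximum mount count, which `tune2fs -l` reports alongside the current mount count. A small sketch for reading both values; `mount_counts` is a hypothetical helper that parses `tune2fs -l` output from standard input, and the device name is the one from the original report:

```shell
#!/bin/sh
# Print "current/maximum" mount count from `tune2fs -l` output on stdin.
# mount_counts is a hypothetical helper name.
mount_counts() {
    awk -F': *' '
        $1 == "Mount count"         { count = $2 }
        $1 == "Maximum mount count" { max = $2 }
        END { print count "/" max }
    '
}

# Example (needs root):
#   tune2fs -l /dev/hda2 | mount_counts
```

When the first number reaches the second, the next boot forces a full check, which would explain why the error only resurfaces periodically.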

The best route for practically all users is probably fsck -y. Yes, there's a tiny chance that a) fsck will break something AND b) that something happens to be important to the user AND c) the user could have done better manually BUT d) they didn't bother backing up AND e) they didn't know that the default was fsck -y and change it manually. The confluence of all of these is far less likely than one being missing, and if one is missing then fsck -y is better than plain fsck, or fsck -p. Yes, the consequences of fsck -y failure are more serious than the consequences of fsck -p failure, but the latter shouldn't be neglected and will crop up *much* more commonly, especially as Ubuntu becomes more mainstream.

You could add a scary prompt by default, but this will terrify ordinary users and not be very useful to most power-users either. I think running fsck -y instead of fsck -p by default on Ubuntu Desktop is the best way to fix this issue, with a well-documented way for power-users to alter the setting to the more conventional fsck -p.

I'm nominating this proposed fix for Intrepid for the drivers to consider whether they like the fix (changing -p to -y). I assume it's not necessary to allocate any resources to figure out how to implement it, i.e., I assume it's a one-line change somewhere.
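On the "one-line change" assumption: as far as I know, Debian-style sysvinit boot scripts read a FSCKFIX variable from /etc/default/rcS and pass -y to fsck when it is set to yes, and -a (the older spelling of -p) otherwise. Treat that as an assumption about releases of this era, not something confirmed in this report. A sketch that shows which flag a given config file would select; `boot_fsck_flag` is a hypothetical helper:

```shell
#!/bin/sh
# Print the fsck flag the boot scripts would be expected to use, based on
# FSCKFIX in the given config file (default: /etc/default/rcS).
# Assumption: FSCKFIX=yes selects -y, anything else selects -a.
# The function body runs in a subshell so sourcing the file does not
# leak variables into the caller.
boot_fsck_flag() (
    conf=${1:-/etc/default/rcS}
    FSCKFIX=no
    if [ -r "$conf" ]; then
        . "$conf"
    fi
    if [ "$FSCKFIX" = yes ]; then
        echo "-y"
    else
        echo "-a"
    fi
)

# Example:
#   boot_fsck_flag                  # inspect the system default
#   boot_fsck_flag /path/to/rcS     # inspect another copy (placeholder path)
```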

Changed in fedora:
status: In Progress → Won't Fix

Gustavo Carneiro (gjc) wrote :

This happened to a friend of mine today. She had to phone me, and I had to explain to her about the 'fsck -y'. It wasn't even clear to me what was wrong, she just said "it says it can't mount a filesystem", on the phone.

So, yes, making this process easier for non-expert people, a simple yes/no question with a menu, is the way to go. Ubuntu is being more difficult to use than it needs to be...

Lemmenes (john-lemmenes) wrote :

Bad Karma Koala
I migrated from Windows to 9.04 and had 2 months of blissfully stable computing. I even got my can't-do-without legacy Windows programs going in VirtualBox. I upgraded to 9.10 successfully on one machine without difficulty, but on the machine with my accounting software and a lot of data I get:

One or more of the mounts listed in /etc/fstab/
/ : waiting for /dev/disk/b4-uuid/.......
/tmp : waiting for null
/swap : waiting for uuid

I find myself foundering around in the recovery shell trying to use a command line with which I have no familiarity. I did mount -o remount,rw /
and then a
sudo dpkg --configure -a

Yesterday I dropped the fsck -y bomb to no discernible effect. Someone please tell me if it's time to just reinstall everything and start over. It's a sure thing that I won't be reinstalling 9.10 anytime soon on either computer, as I have read that this can occur periodically on even an apparently unaffected computer. How is it that this got out of beta testing?

Once again I'm dealing with OS issues and not computing. This is not a hobby for me. I just want it to work so I can.

Lemmenes (john-lemmenes) wrote :

As a postscript to the acerbic comment to the effect that those who failed to back up somehow deserve all of this: I do have all my data intact; the problem is the cost in time and aggravation of redoing all of this again. It doesn't have to be made automatic; just make the choices clearer as to what to do and what the potential consequences are. By all means don't release a version with these kinds of issues. I'm not a first adopter; it's a tool, not an avocation.

What is the prevalence of this anyway?

srinivas (cnus01) wrote :

Hi guys, please help me.

I have a Fedora 11 server and it's not booting up.

How can I repair the OS without data loss?

Please help me, guys. My mail ID is <email address hidden>

thanks
srinivas
