ReiserFS volume (/home) damaged after upgrade to Hardy Heron Alpha 6

Bug #202933 reported by ixothym
This bug affects 18 people
Affects: sysvinit (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

I upgraded my laptop from Ubuntu 7.10 to Ubuntu 8.04 (Alpha 6) using "update-manager -c -d", which went fine. After the upgrade completed, I was told to reboot the machine, and that's what I did. After the reboot I entered my username and password at the login prompt, only to be told that my home directory did not exist! My "/home" is a ReiserFS 3.6 volume on LVM2. I rebooted into rescue mode and tried to mount "/home", which resulted in this error:

# mount /home
mount: Operation not supported

The corresponding line from "/etc/fstab" is "/dev/mapper/storage--toxikum-home /home reiserfs defaults,acl,user_xattr 0 2". I looked for error messages in "dmesg" and found this:

...
[ 391.173974] ReiserFS: dm-4: found reiserfs format "3.6" with standard journal
[ 391.173999] ReiserFS: dm-4: using ordered data mode
[ 391.178746] ReiserFS: dm-4: journal params: device dm-4, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
[ 391.179232] ReiserFS: dm-4: checking transaction log (dm-4)
[ 391.225417] ReiserFS: warning: is_tree_node: node level 26691 does not match to the expected one 1
[ 391.225425] ReiserFS: dm-4: warning: vs-5150: search_by_key: invalid format found in block 8211. Fsck?
[ 391.225434] ReiserFS: dm-4: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
[ 391.225443] ReiserFS: dm-4: Using r5 hash to sort names
[ 391.225452] ReiserFS: dm-4: warning: xattrs/ACLs enabled and couldn't find/create .reiserfs_priv. Failing mount.
...

Because of the hint about xattrs/ACLs, I tried mounting "/home" without "acl,user_xattr" and even with "noacl,nouser_xattr", which did not help. The error message in "dmesg" stayed exactly the same.
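For completeness, the mount attempts looked roughly like this (a sketch reconstructed from memory, not a verbatim transcript):

# mount -t reiserfs /dev/mapper/storage--toxikum-home /home
# mount -t reiserfs -o noacl,nouser_xattr /dev/mapper/storage--toxikum-home /home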

Next I tried running "reiserfsck" on "/dev/storage-toxikum/home" ("reiserfsck --fix-fixable /dev/storage-toxikum/home"), which produced this output:

###########
reiserfsck --fix-fixable started at Sun Mar 16 15:37:41 2008
###########
Replaying journal..
Reiserfs journal '/dev/storage-toxikum/home' in blocks [18..8211]: 0 transactions replayed
Checking internal tree../ 1 (of 2)/ 1 (of 87)/ 1 (of 114)block 8211: The level of the node (26691) is not correct, (1) expected
 the problem in the internal node occured (8211), whole subtree is skipped
finished
Comparing bitmaps..vpf-10630: The on-disk and the correct bitmaps differs. Will be fixed later.
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Sun Mar 16 15:38:34 2008
###########

It now seemed clear that my volume had somehow gotten fried, so I took a copy of "/dev/storage-toxikum/home" for reference (using "dd") and started "reiserfsck --rebuild-tree", which left me with a folder "lost+found" from which I could restore most of the data that wasn't up-to-date in my backup. I attached the lengthy output from "reiserfsck --rebuild-tree /dev/storage-toxikum/home".
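For the record, the recovery steps were along these lines (the image path here is just an example):

# dd if=/dev/storage-toxikum/home of=/root/home-damaged.img bs=4M
# reiserfsck --rebuild-tree /dev/storage-toxikum/home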

All my other ReiserFS volumes in the same LVM VG (like "/usr", "/var" or "/tmp") were not damaged. Now, of course, it is possible that this is merely a coincidence and that it's actually my hard drive that is failing. However, I find this unlikely, since I never had trouble with this disk and it also works like a charm now (after I restored my backup). SMART also tells me that the drive is fine.

If you need any more information, I would be happy to help. As I said, I have an image of the failed volume so I can run all sorts of tests on it. By the way, I tried mounting the image using a loopback device on a Debian machine running kernel 2.6.24-1-686, which left me with the same error.
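The loopback test on the Debian machine was essentially this (image path and mount point are arbitrary examples):

# mount -t reiserfs -o loop /path/to/home-damaged.img /mnt/test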

Revision history for this message
Jojo (kuzniarpawel) wrote :

The same happened in my case. After upgrading from Gutsy to Hardy Alpha 6, everything worked fine. But after one reboot, my system could not be initialized because fsck on /home failed. I had to run fsck.reiserfs --rebuild-tree.
It worked fine, but a second reboot forced me to run --rebuild-tree once more. Now I cannot launch a normal session, because fsck always fails on /home. I'm pretty sure that it is not a hardware-related problem.

Revision history for this message
ixothym (ixothym) wrote :

A few minutes ago it happened again. I unplugged my Bluetooth dongle and the machine froze immediately with a kernel panic (Caps Lock and Scroll Lock blinking). I did a hard reboot since Magic SysRq did not work. During the reboot into rescue mode, fsck complained about a bad volume which needed --rebuild-tree. However, all volumes were mounted, so I continued to boot. Right before the login screen would usually appear, the machine locked up again. I rebooted once more into rescue mode and ran reiserfsck on my /var volume, which turned out to be the volume reported as bad earlier. I wonder why the volume is mounted (even read/write!) if it is known to be damaged...

Anyway, I ran mkreiserfs on /dev/storage-toxikum/var to build a new filesystem, since I had a recent backup. I would really like to hear some input on this random filesystem destruction... I can't provide any log messages, since the kernel panics did not make it into the log files.

Revision history for this message
Pascal Mosimann (pascalm-etik) wrote :

Same problem for me. I upgraded from Ubuntu 7.10 to 8.04 Beta. Everything went OK, but after the reboot the only ReiserFS partition on my system was corrupted. Attached is the log in /var/log/fsck/checkfs.
I can recreate the partition, mount it, unmount it, etc., but when I reboot, it becomes corrupted.

Revision history for this message
ixothym (ixothym) wrote :

I can confirm that recreating the damaged partition is only a temporary fix. After a reboot, it will be corrupted again. I checked my drive using Drive Fitness Test (it's a PATA Hitachi drive), which reported no errors. I switched my /home and /var to ext3 for the time being.
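In case it helps anyone, the switch was roughly this per volume (a sketch from memory; the backup path is an example and the device name follows my LVM layout):

# tar -C /home -cpf /root/home-backup.tar .
# umount /home
# mkfs.ext3 /dev/storage-toxikum/home
# mount /dev/storage-toxikum/home /home
# tar -C /home -xpf /root/home-backup.tar

...followed by changing the filesystem type for the volume from reiserfs to ext3 in /etc/fstab.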

Has anyone looked into this problem yet?

Revision history for this message
Andrew V. Sichevoi (a-sichevoi+ubuntu) wrote :

I confirm the problem.
I have a fresh installation of Kubuntu 8.04 Beta with the latest updates. After the first reboot, the system couldn't mount /home, which lives on LVM2 (with ReiserFS). The errors were the same as described in the previous comments. After a few reboots, even reiserfsck couldn't help; the volume was totally damaged.

But it looks like not only /home is affected: after the next reboot, another of my volumes (LVM2, ReiserFS, placed on another SATA hdd) got corrupted; it is mounted by UUID via /etc/fstab.

Revision history for this message
Andrew V. Sichevoi (a-sichevoi+ubuntu) wrote :

It seems I managed to find a workaround (and possibly the source of the problem; it works for me now, and after several reboots the problem hasn't reappeared): just disable the 'checkfs.sh' script run on startup (mv /etc/rcS.d/S30checkfs.sh /etc/rcS.d/K30checkfs.sh, according to the documentation in /etc/rcS.d/README).

Somebody responsible for this subsystem in Ubuntu, please investigate this issue: it is a really serious and annoying bug, which has already corrupted some files on my PC.

Thank you

Revision history for this message
ixothym (ixothym) wrote :

Andrew seems to have tracked this down to /etc/init.d/checkfs.sh.

Revision history for this message
Jason (jasonxh) wrote :

I reported bug #211417, which is a duplicate of this bug, and here's the workaround I found:

Change line #86 in /etc/init.d/checkfs.sh:
logsave -s $FSCK_LOGFILE fsck -C3 -R -A $fix $force $FSCKTYPES_OPT 3>$PROGRESS_FILE &
into
fsck -C3 -R -A $fix $force $FSCKTYPES_OPT 3>$PROGRESS_FILE | logsave -s $FSCK_LOGFILE - &
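To spell out what changes (this is my reading of the shell semantics, not a confirmed root-cause analysis):

# Original: the shell starts logsave, so the "3>$PROGRESS_FILE" redirection
# is applied to logsave, and fsck runs as logsave's child, inheriting fd 3.
logsave -s $FSCK_LOGFILE fsck -C3 -R -A $fix $force $FSCKTYPES_OPT 3>$PROGRESS_FILE &

# Workaround: fsck runs directly with fd 3 on $PROGRESS_FILE; only its stdout
# travels through the pipe, and "logsave ... -" reads standard input instead
# of spawning a child program (per logsave(8)).
fsck -C3 -R -A $fix $force $FSCKTYPES_OPT 3>$PROGRESS_FILE | logsave -s $FSCK_LOGFILE - &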

Revision history for this message
Laurent (laurent-nodrahc) wrote :

The same thing happened to me on three reiserfs partitions. Interestingly, they are all the last partitions of their respective drives (/dev/sda9, /dev/sdb8 and /dev/hdb1). For some reason /dev/hda1 was not damaged. I will try to resuscitate the dead partitions with the --rebuild-tree option but I am not very hopeful.

Jason (jasonxh)
Changed in sysvinit:
status: Confirmed → Fix Released
Revision history for this message
floid (jkanowitz) wrote :

After chattering on the dupe bug #211417, it appears that Hardy was released with the offending redirection modified to:

logsave -s $FSCK_LOGFILE fsck -C3 -R -A $fix $force $FSCKTYPES_OPT >/dev/console 2>&1 3>$PROGRESS_FILE &

So stdout and stderr are shoved straight to /dev/console.
I saw that this was marked 'Fix released' but had to satisfy myself that Hardy actually contained it. Hopefully this helps document it for someone else.
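For anyone parsing the released line later, the redirections break down like this (plain shell semantics, nothing Hardy-specific):

# fd 1: ">/dev/console"     -- normal output goes straight to the console
# fd 2: "2>&1"              -- errors follow stdout to the console
# fd 3: "3>$PROGRESS_FILE"  -- the completion-progress stream fsck writes for -C3
logsave -s $FSCK_LOGFILE fsck -C3 -R -A $fix $force $FSCKTYPES_OPT >/dev/console 2>&1 3>$PROGRESS_FILE &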
