ext4 filesystem fails randomly with checksum error

Bug #1637779 reported by Selmi on 2016-10-29
This bug affects 6 people
Affects: e2fsprogs (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description: Ubuntu 16.10
Release: 16.10

package version:
linux-image-4.8.0-26-generic:
  Installed: 4.8.0-26.28
  Candidate: 4.8.0-26.28
  Version table:
 *** 4.8.0-26.28 500
        500 http://sk.archive.ubuntu.com/ubuntu yakkety-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu yakkety-security/main amd64 Packages
        100 /var/lib/dpkg/status

Fresh installation of Ubuntu 16.10, all updates included.

While I am working with the system, after a few minutes the root filesystem /dev/sdb5 switches into read-only mode.
dmesg shows this:

[ 304.921552] EXT4-fs error (device sdb5): ext4_iget:4476: inode #24577: comm updatedb.mlocat: checksum invalid
[ 304.925565] Aborting journal on device sdb5-8.
[ 304.926507] EXT4-fs (sdb5): Remounting filesystem read-only
[ 304.927416] EXT4-fs error (device sdb5): ext4_journal_check_start:56: Detected aborted journal
[ 304.943408] EXT4-fs error (device sda1): ext4_iget:4476: inode #12: comm updatedb.mlocat: checksum invalid

When it happens I must run fsck -f /dev/sdb1 once; the second time it says everything is fine. After a reboot, when I start to do something, it soon happens again.

Selmi (selmi) wrote :

After a failure, fsck shows this:

fsck from util-linux 2.28.2
e2fsck 1.43.3 (04-Sep-2016)
/dev/sdb5: recovering journal
Superblock last mount time is in the future.
 (by less than a day, probably due to the hardware clock being incorrectly set)
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix<y>? yes
Inode 131127 was part of the orphaned inode list. FIXED.
Deleted inode 143474 has zero dtime. Fix<y>? yes
Inode 792889 was part of the orphaned inode list. FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 787060 ref count is 1, should be 2. Fix<y>? yes
Pass 5: Checking group summary information
Block bitmap differences: -(104256--104319) -(1258816--1259263) -(1802240--1802548) -(3430933--3430934)
Fix<y>? yes
Free blocks count wrong for group #3 (13156, counted=13220).
Fix<y>? yes
Free blocks count wrong for group #38 (12870, counted=13318).
Fix<y>? yes
Free blocks count wrong for group #55 (29651, counted=29960).
Fix<y>? yes
Free blocks count wrong for group #104 (19316, counted=19318).
Fix<y>? yes
Free blocks count wrong (2230417, counted=2231421).
Fix<y>? yes
Inode bitmap differences: -131127 -143474 -792889
Fix ('a' enables 'yes' to all) <y>? yes to all
Free inodes count wrong (763160, counted=763233).
Fix? yes

/dev/sdb5: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdb5: ***** REBOOT SYSTEM *****
/dev/sdb5: 219807/983040 files (0.1% non-contiguous), 1700739/3932160 blocks

Selmi (selmi) wrote :

The problem seems to be related to ext2fsd on Windows 10 (I have dual boot).
Disk corruption starts as soon as I install it and mount the partition. After that, corruption appears even when I don't boot into Windows; it probably damaged something too badly.

Theodore Ts'o (tytso) wrote :

Is it always the same inode number which is reported as having an invalid checksum?

Can you attach the output of dumpe2fs -h /dev/hdXX (where hdXX should be replaced with the block device name of your file system)?
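
To answer the question of whether the same inode keeps turning up, the kernel log can be tallied mechanically. The following is a minimal sketch, assuming the messages have the same shape as the dmesg lines quoted in the bug description; the function name `count_checksum_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# EXT4-fs error (device sdb5): ext4_iget:4476: inode #24577: comm updatedb.mlocat: checksum invalid
CHECKSUM_ERROR = re.compile(
    r"EXT4-fs error \(device (?P<device>[^)\s]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): inode #(?P<inode>\d+): "
    r"comm (?P<comm>.+?): checksum invalid"
)


def count_checksum_errors(log_text):
    """Count 'checksum invalid' reports per (device, inode) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CHECKSUM_ERROR.search(line)
        if match:
            counts[(match.group("device"), int(match.group("inode")))] += 1
    return counts
```

Feeding it the saved output of dmesg from several boots shows at a glance whether one inode is hit repeatedly or the errors are scattered.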

Selmi (selmi) wrote :

dumpe2fs output attached.

When I booted into Windows and then back into Linux, there was always an additional checksum error in the superblock. When I was booting only into Linux, there were always the errors I described in the previous comments, but I don't remember the exact numbers.

All this started because of a damaged disk (hardware problems); I had to buy a new one and transfer what I could. I had been using ext2fsd for years, but on partitions created years ago. It seems it doesn't like partitions created with Ubuntu 16.10. I have already reported this problem to the ext2fsd project.

When I identified that the issue was related to ext2fsd, I removed it from Windows, reformatted the ext4 partitions and restored the data from the damaged disk again. Now everything seems to work without any problem. That also means I am currently not able to reproduce the problem. I don't want to damage the data, and restoring takes a very long time.

Ben Straton (fanum) wrote :

I'm experiencing this exact issue. 16.10 x64, not dual booted. The only difference is that mine is full disk encrypted, so my device is dm-0.

Let me know what to attach

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in e2fsprogs (Ubuntu):
status: New → Confirmed

Sam Wisdom (zingballyhoo) wrote :

Also got this issue after using ext2fsd (dual-boot), never had any issues before

I have attached my dumpe2fs output

We have an error quite similar to the one reported here.
We are using Ubuntu 14.04, but I really think it's not a matter of version.

We are randomly getting the following errors:
----------
[97030.025645] EXT4-fs error (device sdb5): ext4_iget:4156: inode #26351669: comm apache2: checksum invalid
----------
There is more than one; this is just to keep it short ;)
All of them relate to the same device, sdb5, on which we mount the '/var' partition.

With these errors present, a reboot gets stuck when it tries to mount the '/var' partition, reporting that errors were found while checking the partition.
It offers several options: I (ignore), S (skip mounting), M (fix manually). With either 'S' or 'M' the partition can then be mounted without any trouble, but this happens on every boot.

We have tried booting from a USB recovery tool and repairing the errors with the following command: `e2fsck -y /dev/sdb5`, with the result I attached.

We have done it a couple of times, and after a random time the errors showed up again. We also changed the disk (moving all the data from one to the other) and the same thing still happens.

We would like to know why it happens so randomly and, of course, how to avoid these errors for good!

Sorry I attached a wrong file, this one is the correct one!

Theodore Ts'o (tytso) wrote :

The errors simply mean that ext4 has detected that its metadata has gotten corrupted. How and why it happened is going to vary from situation to situation.

For example, in the case of the original reporter, it was due to him installing ext2fsd and trying to access the file system from Windows. ext2fsd isn't aware of all of the latest features in ext4. Worse, it didn't know to check the file system feature bitmaps: if a feature is set in the read-only compat feature set or the incompat feature set which ext2fsd doesn't understand, it should keep its paws off the file system. This caused the checksum failures.
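
The feature-bitmap check described above can be sketched as follows. This is an illustration, not ext2fsd's code: it reads the three 32-bit feature words from a raw superblock (the 1024 bytes found at byte offset 1024 of the partition) and applies the usual ext2/3/4 convention that an unknown incompat bit means "do not mount" and an unknown ro_compat bit means "read-only at most". The `OLD_DRIVER_*` sets are hypothetical stand-ins for what an older driver might understand.

```python
import struct

EXT_MAGIC = 0xEF53            # s_magic, little-endian u16 at offset 0x38
FEATURE_WORDS_OFFSET = 0x5C   # s_feature_compat, s_feature_incompat, s_feature_ro_compat

INCOMPAT_FILETYPE = 0x0002
INCOMPAT_EXTENTS = 0x0040
INCOMPAT_64BIT = 0x0080
RO_COMPAT_SPARSE_SUPER = 0x0001
RO_COMPAT_LARGE_FILE = 0x0002
RO_COMPAT_METADATA_CSUM = 0x0400

# Hypothetical feature sets for an older driver (assumption for illustration).
OLD_DRIVER_INCOMPAT = INCOMPAT_FILETYPE | INCOMPAT_EXTENTS
OLD_DRIVER_RO_COMPAT = RO_COMPAT_SPARSE_SUPER | RO_COMPAT_LARGE_FILE


def read_feature_words(superblock):
    """Return (compat, incompat, ro_compat) from raw superblock bytes."""
    (magic,) = struct.unpack_from("<H", superblock, 0x38)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    return struct.unpack_from("<III", superblock, FEATURE_WORDS_OFFSET)


def mount_decision(superblock, known_incompat, known_ro_compat):
    """Decide how a driver that knows only the given features may mount."""
    _, incompat, ro_compat = read_feature_words(superblock)
    if incompat & ~known_incompat:
        return "refuse"        # unknown incompat feature: hands off entirely
    if ro_compat & ~known_ro_compat:
        return "read-only"     # unknown ro_compat feature: never write
    return "read-write"
```

With these sets, a file system that has metadata_csum enabled yields "read-only" at best, so a driver following the convention would never write metadata without updating its checksums.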

In Gerard's case in #8 and #9, it's not clear what the cause might be. Ubuntu 14.04 doesn't ship with e2fsprogs 1.43. So you've updated to a newer version of e2fsprogs; one which is newer than what is supported with Ubuntu 14.04. It's possible that this was due to your using a much older kernel than one that was truly ready to support the metadata checksum feature; so if you are using the ancient Ubuntu 14.04 kernel, that might be the explanation. Or it could be that there is a true hardware problem that you are tripping up against. In any case, this isn't an e2fsprogs bug. It's possible it's a kernel bug, but neither Ubuntu nor the upstream kernel community is going to support the ext4 metadata checksum feature on an ancient Ubuntu 14.04 kernel.

If you think you are technically savvy enough to run with an upstream kernel, you could update to a bleeding edge kernel, and then you can ask about ext4 problems on the linux-ext4 mailing list. If you don't think you're ready to go to that extreme, it might be for the best if you were to just disable the metadata_csum feature, using "tune2fs -O ^metadata_csum /dev/sdb5" with the file system unmounted. You can then stop mke2fs from enabling the metadata_csum feature in the future by editing /etc/mke2fs.conf and removing metadata_csum from the features list.
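
As a rough illustration of the /etc/mke2fs.conf edit, the sketch below removes one feature name from every `features = ...` or `base_features = ...` line in a configuration text. It works on a string only and does not touch the real file; the function name `drop_feature` is hypothetical, and the assumption is that features are written as a comma-separated list on a single line.

```python
import re

# A "features = a,b,c" or "base_features = a,b,c" assignment on one line.
FEATURES_LINE = re.compile(r"^([ \t]*(?:base_)?features[ \t]*=[ \t]*)(.*)$", re.MULTILINE)


def drop_feature(conf_text, feature="metadata_csum"):
    """Return conf_text with `feature` removed from each features list."""

    def rewrite(match):
        prefix, value = match.group(1), match.group(2)
        kept = [name.strip() for name in value.split(",")]
        kept = [name for name in kept if name and name != feature]
        return prefix + ",".join(kept)

    return FEATURES_LINE.sub(rewrite, conf_text)
```

Reviewing the result against the original text before saving it keeps the change limited to the features lists.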

First of all, thank you very much for your quick response.

I am glad to tell you that we fixed the issue thanks to your post :)

The problem was that we formatted the disk (sdb5) using a tool from a newer Ubuntu version, which enabled the metadata_csum feature on it. Because the disk was then used on Ubuntu 14.04, that caused trouble.
We also realized that the warning message on every reboot appeared because fsck could not run, due to its incompatibility with the metadata_csum feature.

To summarize, we did the following:

1 - Fixed the checksum errors using the SystemRescue live CD and running the e2fsck tool (1.43.3, which is fully compatible with metadata_csum).

2 - Once we were sure there were no errors left, we disabled metadata_csum using "tune2fs -O ^metadata_csum /dev/sdb5" with the file system unmounted (as you advised).

3 - On the next boot, fsck ran successfully and there were no more warnings about disk errors.
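
One way to confirm that step 2 took effect is to look at the "Filesystem features:" line that `dumpe2fs -h` prints. The sketch below parses that header text after it has been captured to a string; the function name `filesystem_features` is made up for illustration, and how the text is obtained is left to the reader.

```python
def filesystem_features(dumpe2fs_header):
    """Return the set of feature names listed in `dumpe2fs -h` output."""
    for line in dumpe2fs_header.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    return set()  # no features line found


def has_metadata_csum(dumpe2fs_header):
    """True if metadata_csum is still enabled on the file system."""
    return "metadata_csum" in filesystem_features(dumpe2fs_header)
```

After the tune2fs step, `has_metadata_csum` should return False for the header of /dev/sdb5.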

Thank you very much!

Greetings
