disk corruption ext3_free_blocks_sb: bit already cleared for block 232785

Bug #209346 reported by sam tygier on 2008-03-30

This bug affects 4 people

	Status	Importance	Assigned to
linux (Ubuntu)	Fix Released	Undecided	Unassigned
Jaunty	Won't Fix	High	Manoj Iyer
linux-2.6 (Debian)	Fix Released	Unknown	debbugs #497562

Bug Description

I booted up my computer (Xeon 5130 on a Tyan S2696, WD Raptor connected by SATA), running 64-bit Hardy, up to date as of yesterday. It booted OK, and I opened a few applications (Firefox, Liferea). I started downloading a live CD from the Ubuntu website.

Then, halfway through the download, Firefox popped up a message saying that it could not write the ISO file because the disk was locked. I had a look in Nautilus and the ISO file was marked as locked; I also noticed that the disk was full. I tried to delete some files using Nautilus and got more messages about the disk being locked. I tried to open a terminal, but bash wouldn't load. I switched to a virtual terminal, logged in and ran dmesg. (Luckily I had a USB disk mounted, so I saved dmesg and syslog.)

[ 74.631330] NET: Registered protocol family 10
[ 74.631568] lo: Disabled Privacy Extensions
[ 74.632685] ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 85.124125] ath0: no IPv6 routers present
[ 85.515886] eth0: no IPv6 routers present
[ 903.875961] EXT3-fs error (device sda9): ext3_free_blocks_sb: bit already cleared for block 232785
[ 903.875971] Aborting journal on device sda9.
[ 903.876456] ext3_abort called.
[ 903.876459] EXT3-fs error (device sda9): ext3_journal_start_sb: Detected aborted journal
[ 903.876462] Remounting filesystem read-only
[ 903.877537] Remounting filesystem read-only
[ 903.877819] EXT3-fs error (device sda9): ext3_free_blocks_sb: bit already cleared for block 232786
[ 903.878272] EXT3-fs error (device sda9): ext3_free_blocks_sb: bit already cleared for block 232787
[ 903.878678] EXT3-fs error (device sda9): ext3_free_blocks_sb: bit already cleared for block 232788
[ 903.879076] EXT3-fs error (device sda9): ext3_free_blocks_sb: bit already cleared for block 232789
[ 903.879477] EXT3-fs error (device sda9) in ext3_free_blocks_sb: Journal has aborted
[ 903.879821] EXT3-fs error (device sda9) in ext3_free_blocks_sb: Journal has aborted
[ 903.880166] EXT3-fs error (device sda9) in ext3_free_blocks_sb: Journal has aborted
[ 903.880514] EXT3-fs error (device sda9) in ext3_free_blocks_sb: Journal has aborted
[ 903.880868] EXT3-fs error (device sda9) in ext3_reserve_inode_write: Journal has aborted
[ 903.880871] EXT3-fs error (device sda9) in ext3_truncate: Journal has aborted
[ 903.881551] EXT3-fs error (device sda9) in ext3_reserve_inode_write: Journal has aborted
[ 903.881554] EXT3-fs error (device sda9) in ext3_orphan_del: Journal has aborted
[ 903.882238] EXT3-fs error (device sda9) in ext3_reserve_inode_write: Journal has aborted
[ 903.934531] __journal_remove_journal_head: freeing b_committed_data
[ 903.934538] __journal_remove_journal_head: freeing b_committed_data
[ 903.934541] __journal_remove_journal_head: freeing b_committed_data
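
For anyone comparing their own logs against this one, here is a minimal Python sketch that counts EXT3-fs errors per device and kernel function. It assumes only the two message shapes visible in the log above ("EXT3-fs error (device X): func: ..." and "EXT3-fs error (device X) in func: ..."); the function name `summarize_ext3_errors` is made up for illustration, and other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Matches both message shapes seen in the log above:
#   EXT3-fs error (device sda9): ext3_free_blocks_sb: bit already cleared ...
#   EXT3-fs error (device sda9) in ext3_truncate: Journal has aborted
EXT3_ERROR = re.compile(
    r"EXT3-fs error \(device (?P<dev>\w+)\)(?::| in) (?P<func>\w+): (?P<msg>.*)"
)

def summarize_ext3_errors(lines):
    """Count EXT3-fs errors per (device, function) pair."""
    counts = Counter()
    for line in lines:
        m = EXT3_ERROR.search(line)
        if m:
            counts[(m.group("dev"), m.group("func"))] += 1
    return counts

# A few lines copied from the dmesg output above.
sample = [
    "[ 903.875961] EXT3-fs error (device sda9): ext3_free_blocks_sb: bit already cleared for block 232785",
    "[ 903.875971] Aborting journal on device sda9.",
    "[ 903.876459] EXT3-fs error (device sda9): ext3_journal_start_sb: Detected aborted journal",
    "[ 903.879477] EXT3-fs error (device sda9) in ext3_free_blocks_sb: Journal has aborted",
    "[ 903.880871] EXT3-fs error (device sda9) in ext3_truncate: Journal has aborted",
]

for (dev, func), n in sorted(summarize_ext3_errors(sample).items()):
    print(f"{dev} {func}: {n}")
```

To run it on a saved log, pass the open file instead of `sample`, e.g. `summarize_ext3_errors(open("dmesg.txt"))`, where `dmesg.txt` is whatever file the output was saved to.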

syslog had lots of:
Mar 30 10:55:15 oberon gdmsetup[6315]: CRITICAL: Error opening file: No such file or directory
Mar 30 10:55:40 oberon last message repeated 3 times
Mar 30 10:56:05 oberon last message repeated 2 times
Mar 30 10:56:13 oberon gdmsetup[6379]: CRITICAL: Error opening file: No such file or directory
Mar 30 10:56:13 oberon gdmsetup[6379]: CRITICAL: Error opening file: No such file or directory

(I opened the login screen preferences at the start of the session to turn off the GDM sounds.)

I can provide more of dmesg and syslog if needed.

On rebooting, fsck was unable to repair the errors (but I will file that as a separate issue).

I have checked that all the cables were secure.

I have checked the disk using smartctl; it seems to be fine:

sam@oberon:~$ sudo smartctl -a /dev/sda
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Raptor family
Device Model: WDC WD1500ADFD-00NLR5
Serial Number: WD-WMAP41915867
Firmware Version: 21.07QR5
User Capacity: 150,039,945,216 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 7
ATA Standard is: ATA/ATAPI-7 published, ANSI INCITS 397-2005
Local Time is: Sun Mar 30 19:13:36 2008 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
     was completed without error.
     Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
     without error or no self-test has ever
     been run.
Total time to complete Offline
data collection: (4783) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
     Auto Offline data collection on/off support.
     Suspend Offline collection upon new
     command.
     Offline surface scan supported.
     Self-test supported.
     Conveyance Self-test supported.
     Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
     power-saving mode.
     Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
     General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 72) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   162   159   021    Pre-fail  Always       -       4925
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always       -       97
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1440
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       97
194 Temperature_Celsius     0x0022   106   104   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Conveyance offline Completed without error 00% 1440 -
# 2 Extended offline Completed without error 00% 1440 -
# 3 Short offline Completed without error 00% 1439 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1 0 0 Not_testing
    2 0 0 Not_testing
    3 0 0 Not_testing
    4 0 0 Not_testing
    5 0 0 Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
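
The attribute table above can also be checked mechanically. The following is a rough Python sketch, assuming the ten-column attribute layout shown in this smartctl 5.37 output (newer smartctl versions may print different columns); `failing_attributes` is a hypothetical helper name. It flags any attribute whose normalized VALUE is at or below its THRESH, and skips attributes with a zero threshold.

```python
def failing_attributes(smartctl_text):
    """Return (name, value, thresh) for attributes whose normalized VALUE
    is at or below a non-zero THRESH."""
    flagged = []
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns and start with a numeric ID;
        # everything else in the smartctl output is skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        if thresh > 0 and value <= thresh:
            flagged.append((name, value, thresh))
    return flagged

# Three rows copied from the report above.
sample = """\
  1 Raw_Read_Error_Rate 0x000b 200 200 051 Pre-fail Always - 0
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 106 104 000 Old_age Always - 41
"""

print(failing_attributes(sample))  # prints []
```

An empty list matches the "PASSED" overall assessment above. It says nothing about whether the filesystem corruption has a software cause.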

Leann Ogasawara (leannogasawara) wrote on 2008-04-04:

Hi Sam,

Just curious if you are able to consistently reproduce this issue or if it was a random occurrence? Also care to comment which specific version of the kernel you were using (cat /proc/version_signature)? I know 2.6.24-15 was recently released. Thanks.

Changed in linux:
status:	New → Incomplete

sam tygier (samtygier) wrote on 2008-04-05:

I have not been able to reproduce this. It was with 2.6.24-12.22. I can't see anything in the changelog that looks related.

Leann Ogasawara (leannogasawara) wrote on 2008-08-29:

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Leann Ogasawara (leannogasawara) wrote on 2009-01-17:

*This is an automated response*

This bug report is being closed because we received no response to the previous request for information. Please reopen this if it is still an issue in the actively developed pre-release of Jaunty Jackalope 9.04 - http://cdimage.ubuntu.com/releases/jaunty . To reopen the bug report simply change the Status of the "linux" task back to "New".

Changed in linux:
status:	Incomplete → Won't Fix

Fumihito YOSHIDA (hito) wrote on 2009-03-17:

Hi, I found related information in Red Hat EL 4.8's RHSA-2009:0331-14.
The problem is caused by a race condition; it is not a hardware fault.
I have not been able to reproduce it yet, but the source code suggests the problem may still exist in Jaunty.

Jaunty's unlock_buffer() in linux/fs/buffer.c is:
{
        clear_bit_unlock(BH_Lock, &bh->b_state);
        smp_mb__after_clear_bit();
        wake_up_bit(&bh->b_state, BH_Lock);

Information from Red Hat:
* http://rhn.redhat.com/errata/RHSA-2009-0331.html
* => https://bugzilla.redhat.com/show_bug.cgi?id=476533 (published info, link from RHSA-2009:0331-14)
* => https://bugzilla.redhat.com/show_bug.cgi?id=460179 (this one is private to Red Hat)
* Red Hat's patch name: linux-2.6.9-fs-fix-it-already-cleared-for-block-errors.patch
With the patch applied, unlock_buffer() in linux/fs/buffer.c is:
{
+       smp_mb__before_clear_bit();
        clear_bit_unlock(BH_Lock, &bh->b_state);
        smp_mb__after_clear_bit();
        wake_up_bit(&bh->b_state, BH_Lock);

They said,
> a misplaced memory barrier at unlock_buffer() could lead to a concurrent
> h_refcounter update which produced a reference counter leak and, later, a
> double free in ext3_xattr_release_block(). Consequent to the double free,
> ext3 reported an error
>
> ext3_free_blocks_sb: bit already cleared for block [block number]
>
> and mounted itself as read-only. With this update, the memory barrier is
> now placed before the buffer head lock bit, forcing the write order and
> preventing the double free. (BZ#476533)

// I can't understand why Red Hat does not publicize this info....

Related info:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=497562

Changed in linux:
status:	Won't Fix → New

Bug Watch Updater (bug-watch-updater) on 2009-03-17

Changed in linux-2.6:
status:	Unknown → New

nicobrainless (nicoseb) wrote on 2009-03-23:

Hi,

I can confirm this on Jaunty 2.6.28-11, fully updated...
It has been going on for a week, and I get data corruption each time it happens!!
Once I totally lost "/etc" and had to reinstall. I first thought it was related to the ext4 data loss problem, but yesterday I reinstalled everything with ext3 again, and even if the result is not as dramatic I still get those random crashes...
I cannot find a way to reproduce it; it just happens after a while...
I haven't lost any personal data yet, but that might happen...

In syslog/dmesg the report is this:
[ 5109.838850] EXT3-fs error (device sda1): ext3_free_blocks_sb: bit already cleared for block 9224201
[ 5109.838865] Aborting journal on device sda1.
[ 5109.839318] Remounting filesystem read-only
[ 5109.839558] EXT3-fs error (device sda1) in ext3_free_blocks_sb: Journal has aborted
[ 5109.839567] EXT3-fs error (device sda1) in ext3_reserve_inode_write: Journal has aborted
[ 5109.839572] EXT3-fs error (device sda1) in ext3_truncate: Journal has aborted
[ 5109.839578] EXT3-fs error (device sda1) in ext3_reserve_inode_write: Journal has aborted
[ 5109.839583] EXT3-fs error (device sda1) in ext3_orphan_del: Journal has aborted
[ 5109.839588] EXT3-fs error (device sda1) in ext3_reserve_inode_write: Journal has aborted
[ 5109.852220] __journal_remove_journal_head: freeing b_committed_data

please help!

Johan Sköld (johan-skold) wrote on 2009-04-19:

I can also confirm this happens in Jaunty. I cannot say exactly what causes it, but it happens a few minutes after I boot my computer, every time, which renders the computer unusable. The hard drive does not appear to be physically at fault: I have tried another drive, and the computer is only half a year old. fsck fixes the disk (supposedly), but it won't let me remount read-write, even after running "blockdev --setrw /dev/sda1".

Dmesg has exactly the same output as already reported several times in this bug report. I'm on 64-bit Jaunty running 2.6.28-11-generic, and the computer in question is an HP EliteBook 8730w. I tried a reinstall and the same thing happened. I'll leave this installation as it is until Jaunty is released, so if more information is needed, please provide instructions on how to obtain it.

nicobrainless (nicoseb) wrote on 2009-04-19:

Well, the only solution I found was to keep booting 2.6.28-9 instead of 2.6.28-11, and I have not had any trouble since then!
Yesterday I updated my kernel, going straight to 2.6.29... works like a charm!

Oliver Paulus (oliver-webprojekt) wrote on 2009-04-28:

I can confirm this in Jaunty too. My ext3 partition has been corrupted twice. I will try an older kernel (2.6.27 series) for a while.

Johan Sköld (johan-skold) wrote on 2009-04-28:

#10

2.6.29 does indeed solve this, and the issue where my computer wouldn't shut down correctly is gone.

Was 2.6.28 just a bad version?

Renan Teston Inácio (zerocaronte) wrote on 2009-05-06:

#11

My logs showed a different problem, i.e. another ext3 function in the kernel with a different error description ("rec_len is smaller than minimal"), but maybe it is related.

I did a clean install of Jaunty, and after 13 hours of uptime it gave me that error. It is worth mentioning that I was downloading a total of 8 GB and some of these files were big (>2 GB).

Before I did this clean install, I had constant corruption every time I rebooted the machine. It started happening after the upgrade from Intrepid to Jaunty.

I didn't want to mess with kernels and just installed Intrepid again. The problem disappeared.
The CPU is dual-core and I am running 64-bit.

Dmitry Diskin (diskin) wrote on 2009-05-10:

#12

Seems to be related to (a duplicate of?) https://bugs.launchpad.net/ubuntu/+source/linux/+bug/346691

muchasuerte (nicotra-andrea) wrote on 2009-05-10:

#13

I have the same error on my laptop (Kubuntu 9.04, kernel 2.6.28-11-generic, arch: x86_64).

Here is a log:
[ 631.748086] CE: hpet increasing min_delta_ns to 15000 nsec
[ 1058.506543] EXT3-fs error (device sda3): ext3_add_entry: bad entry in directory #9388091: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
[ 1182.239176] EXT3-fs error (device sda3): ext3_free_inode: bit already cleared for inode 9380218
[ 1291.255541] EXT3-fs error (device sda2): ext3_add_entry: bad entry in directory #172460: rec_len % 4 != 0 - offset=0, inode=1718313839, rec_len=30049, name_len=108
[ 1291.255560] Aborting journal on device sda2.
[ 1291.256668] Remounting filesystem read-only
[ 1291.257616] EXT3-fs error (device sda2) in start_transaction: Journal has aborted
[ 1291.257634] EXT3-fs error (device sda2) in ext3_create: IO failure

I also found errors like:

clocksource tsc unstable (delta= -62811680 ns )

I had to attempt the Kubuntu 9.04 installation six times before it completed correctly.

muchasuerte (nicotra-andrea) wrote on 2009-05-12:

#14

Attachment: lspci_dmesg.log (52.7 KiB, text/plain)

This bug could be connected with https://bugs.launchpad.net/ubuntu/+source/linux/+bug/275359
I have verified that the corruption is connected with the wifi connection; if I use wired Ethernet I don't get corruption.

Network controller: Intel Corporation PRO/Wireless 5300 AGN [Shiloh] Network Connection

Fumihito YOSHIDA (hito) wrote on 2009-05-12:

#15

Hi muchasuerte,

This ext3 problem ("ext3_free_blocks_sb: bit already cleared") can arise in several ways:

case 1) an inconsistent file system.
(e.g. caused by a kernel crash, because a kernel crash loses "cached/buffered" ext3 operations)

case 2) hardware trouble.
(e.g. HDD, memory or CPU faults)

case 3) a coding error, such as a misplaced memory barrier.

Is your case 1, or not?

muchasuerte (nicotra-andrea) wrote on 2009-05-12: Re: [Bug 209346] Re: disk corruption ext3_free_blocks_sb: bit already cleared for block 232785

#16

Hi Fumihito YOSHIDA,

I think that my case is 1, because:

the problem could be related to a wifi firmware bug; I don't get corruption
with a wired connection.

If I run apt-get update and apt-get dist-upgrade while on the wifi connection,
I get errors on the ext3 partition.

I have checked the HDD (with SeaTools; all tests passed) and the memory (no
problem was found).
I have this in dmesg:

[ 44.001108] Clocksource tsc unstable (delta = -76736091 ns)
[ 249.909549] process `skype.real' is using obsolete setsockopt SO_BSDCOMPAT
[ 2014.282976] CE: hpet increasing min_delta_ns to 15000 nsec
[ 2754.552046] CE: hpet increasing min_delta_ns to 22500 nsec
[ 4289.458793] RPC: Registered udp transport module.
[ 4289.458799] RPC: Registered tcp transport module.
[ 4507.716061] CE: hpet increasing min_delta_ns to 33750 nsec

Could it be a CPU problem?

The laptop is a week old; it is a Compal-KHLB2.

Fumihito YOSHIDA (hito) wrote on 2009-05-12:

#17

Hi muchasuerte,

> the problem could be related with wifi firmware bug, I don't have corruption
> with wired connection,

Does this ext3 corruption happen every time?
(Can you reproduce it each time with the i5300 wifi?)

If yes, your pattern looks like case 3), because:
a) You can use ext3 with the onboard Ethernet connection.
b) You cannot use ext3 with the Intel 5300 wifi connection.
a+b) means: your file system is healthy, but access through the i5300 driver
breaks your filesystem (or a filesystem barrier).

I'd bet that it is not a filesystem consistency issue but a coding bug in the filesystem.

muchasuerte (nicotra-andrea) wrote on 2009-05-15:

#18

My bug was fixed by the package linux-backports-modules-2.6.28 - 2.6.28-11.12

Renan Teston Inácio (zerocaronte) wrote on 2009-05-16:

#19

As Dmitry Diskin pointed out, this seems related: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/346691/comments/3
I don't know if that particular commit helps, but it points to http://lkml.org/lkml/2008/11/14/121, where there is a script to stress-test the file system.
With kernels 2.6.28-11.42(jaunty)1 and 2.6.28-12.43(jaunty-proposed), partitions of the whole extN family become corrupted on the first run. I compiled vanilla 2.6.29.3 and the stress test passes on it.

Leann Ogasawara (leannogasawara) wrote on 2009-06-19:

#20

From a few comments here it seems this should be resolved with a 2.6.29 or newer kernel. Based on the fact that the kernel for Karmic will target 2.6.31, I would hope that this is indeed already fixed in the Karmic kernel. As a result I'm marking the actively developed "linux (Ubuntu)" task as Fix Released, but I've also gone ahead and opened a Jaunty nomination in case we can isolate the specific patch to backport as a stable release update to Jaunty.

For anyone who hasn't already tested a 2.6.29 or newer kernel, the Ubuntu kernel team began building upstream mainline kernel builds which you could use to test, see https://wiki.ubuntu.com/KernelMainlineBuilds. Alternatively, you could run the kernel for Karmic.
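
As a rough aid for checking which side of that line a machine is on, here is a small Python sketch that parses a `uname -r` style release string and compares it against 2.6.29. The 2.6.29 cutoff comes only from the reports in this thread, not from an identified fix, and the helper names `kernel_version` and `at_least_2_6_29` are made up for illustration.

```python
import re

def kernel_version(release):
    """Parse the leading dotted version of a `uname -r` style string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing third component (e.g. "6.1") is treated as 0.
    return (int(m.group(1)), int(m.group(2)), int(m.group(3) or 0))

def at_least_2_6_29(release):
    """True if the release is 2.6.29 or newer (cutoff taken from this thread)."""
    return kernel_version(release) >= (2, 6, 29)

# Release strings mentioned in this bug report.
for release in ["2.6.28-11-generic", "2.6.29", "2.6.31-rc4"]:
    print(release, at_least_2_6_29(release))
```

On a running system the string can be obtained with `platform.release()` from the standard library.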

Changed in linux (Ubuntu):
status:	New → Fix Released
Changed in linux (Ubuntu Jaunty):
importance:	Undecided → High
status:	New → Triaged

Manoj Iyer (manjo) on 2009-07-09

Changed in linux (Ubuntu Jaunty):
assignee:	nobody → Manoj Iyer (manjo)

Manoj Iyer (manjo) wrote on 2009-07-09:

#21

Is this still an issue with the latest Jaunty kernel (2.6.28-14.47)? And does the mainline kernel fix this issue? See https://wiki.ubuntu.com/KernelMainlineBuilds

Changed in linux (Ubuntu Jaunty):
status:	Triaged → Incomplete

Renan Teston Inácio (zerocaronte) wrote on 2009-07-10:

#22

I tested kernels 2.6.28-14.46 (jaunty-proposed) and 2.6.28.10 (mainline) in a virtual machine and the problem still exists.

Manoj Iyer (manjo) wrote on 2009-07-13:

#23

Let's see if the solution proposed by the RHN errata fixes the issue for you in Jaunty. Can you please try the kernel in:

http://people.ubuntu.com/~manjo/lp209346-jaunty/

and report if that fixes it for you.

Renan Teston Inácio (zerocaronte) wrote on 2009-07-15:

#24

Kernel 2.6.28-02062810-generic does not fix the problem.

Manoj Iyer (manjo) wrote on 2009-07-16:

#25

Renan,

Did you try the kernel I posted above?

Thanks

Renan Teston Inácio (zerocaronte) wrote on 2009-07-16:

#26

Yes, that was the kernel version indicated by "uname -a" after booting with your kernel.
I can reproduce this easily on a virtual machine (64-bit guest; I haven't tried 32-bit) using the script posted at http://lkml.org/lkml/2008/11/14/121 . Maybe you can reproduce it in the same way?

Manoj Iyer (manjo) wrote on 2009-07-23:

#27

Renan,

Can you try the kernel in http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.31-rc4/ ?

This is 2.6.31-rc4, or the latest Karmic. If this fixes your problem, I can then look at what changed that could have fixed the bug.

Manoj Iyer (manjo) wrote on 2009-07-23:

#28

Renan,

Please note this article, which reports that disabling the write-back cache seems to fix the problem.

http://sr5tech.com/write_back_cache_experiments.htm

Renan Teston Inácio (zerocaronte) wrote on 2009-07-24:

#29

If I understood correctly, the write-back cache must be disabled on the device (i.e. the disk). However, the script I pointed to in comment 19 tests the filesystem by creating it in RAM. I also tried mounting with data=ordered instead of data=writeback (default), but it didn't affect the results.

Kernel 2.6.31-rc4 fixes the problem.
Note that 2.6.29-rc1 also fixes it, if you want to check an earlier version.

I also ran the same test on a 32-bit virtual machine and got the same results using the default kernel for Ubuntu 9.04 Server (2.6.28) and 2.6.31-rc4.

Manoj Iyer (manjo) wrote on 2009-09-02:

#30

Renan,

It looks like Jaunty (9.04) already has the fix "fs: new inode i_state corruption fix" (SHA ID: 7ef0d7377cb287e08f3ae94cebc919448e1f5dff), which fixes the issue reported in http://lkml.org/lkml/2008/11/14/121 and noted by you in comment #19. Also, since this works for 2.6.29-rc1 and 2.6.31-rc4, the Karmic kernel should have the bits that fix this problem. Shall I close this bug as Won't Fix and wait for the fix to appear in the next release upgrade?

beetlejuicer (crand275) wrote on 2009-09-23:

#31

I have the same problem as Renan.

I had been running 8.10 x64 for some months without issue.

I upgraded to 9.04 x64, and after a day or two the system became unusable due to filesystem corruption.

Attempting to mount the hard drive from a live CD produces:

ubuntu@ubuntu:~$ dmesg |tail
[ 187.346997] mount -t ufs -o ufstype=sun|sunx86|44bsd|ufs2|5xbsd|old|hp|nextstep|nextstep-cd|openstep ...
[ 187.346998]
[ 187.346999] >>>WARNING<<< Wrong ufstype may corrupt your filesystem, default is ufstype=old
[ 187.347290] ufs_read_super: bad magic number
[ 187.350311] hfs: can't find a HFS filesystem on dev sda5.
[ 220.243302] mtrr: no MTRR for d0000000,10000000 found
[ 221.794027] lp: driver loaded but no devices found
[ 221.888269] ppdev: user-space parallel port driver
[ 257.041369] EXT3-fs error (device sda5): ext3_check_descriptors: Block bitmap for group 0 not in group (block 1114114)!
[ 257.041677] EXT3-fs: group descriptors corrupted!
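
A minimal sketch, not part of the original report, of how the EXT3 error lines could be pulled out of saved dmesg output like the above. The regular expression and the function name `ext3_errors` are assumptions based only on the line format visible in this thread; other kernels may format these messages differently.

```python
import re

# Matches lines such as:
# [ 257.041369] EXT3-fs error (device sda5): ext3_check_descriptors: <message>
EXT3_ERROR = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"EXT3-fs error \(device (?P<device>\w+)\): "
    r"(?P<function>\w+): (?P<message>.*)$"
)


def ext3_errors(dmesg_text):
    """Return (timestamp, device, function, message) for each EXT3-fs error line."""
    errors = []
    for line in dmesg_text.splitlines():
        match = EXT3_ERROR.match(line.strip())
        if match:
            errors.append((
                float(match.group("time")),
                match.group("device"),
                match.group("function"),
                match.group("message"),
            ))
    return errors
```

Given the saved output of `dmesg` as a string, `ext3_errors(text)` returns one tuple per "EXT3-fs error" line. Lines without the "error (device ...)" part, such as "EXT3-fs: group descriptors corrupted!", are skipped.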

Running e2fsck -c /dev/sda5 found:

10 inodes containing multiply-claimed blocks, all in ~/.mozilla/firefox/???/
lots of unattached inodes
lots of wrong inode ref counts
tons of wrong free block counts
tons of wrong free inode counts and wrong directory counts

Sorry I don't have more detailed information, I hope that this helps.

Leann Ogasawara (leannogasawara) wrote on 2011-07-18: Closing unsupported series nomination.

#32

This bug was nominated against a series that is no longer supported, i.e. Jaunty. The bug task representing the Jaunty nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.