Hello,
I have a similar problem. I don't know if it's exactly the same thing. Here's my scenario.
I had a motherboard with an AMD Athlon 64 3700 @ 2.2 GHz and 4 GB of memory, running Debian with a 2.6.32 kernel. Two SATA drives (sda and sdb) were hooked up to the motherboard. Each drive had two partitions (sda1, sda2, sdb1, sdb2). I ran software RAID 1 as follows (md0 is swap, and md1 is the root partition):
# cat /proc/mdstat
Personalities : [raid1]
md0 : active (auto-read-only) raid1 sdb1[0] sda1[1]
4000064 blocks [2/2] [UU]
md1 : active raid1 sdb2[0] sda2[1]
240195776 blocks [2/2] [UU]
unused devices: <none>
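For anyone who wants to check array state without reading the output by eye, here is a minimal sketch that lists arrays with a missing member. It assumes the mdstat layout shown above, where the status line ends in a field like [UU]. The function name degraded_arrays is made up for illustration.

```shell
#!/bin/sh
# List md arrays whose member-status field (e.g. [UU]) shows a missing member.
# Reads mdstat-format text from the file given as $1 (default: /proc/mdstat).
# "degraded_arrays" is a hypothetical helper name, not part of mdadm.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array name
        /blocks/ && $NF ~ /^\[[U_]+\]$/ {      # status line ends in [UU], [U_], ...
            if ($NF ~ /_/) print name          # "_" marks a missing member
        }
    ' "${1:-/proc/mdstat}"
}
```

Called with no argument it reads /proc/mdstat. On the output above it would print nothing, since both arrays show [UU].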
Everything worked fine for years. A few weeks ago, I decided to get a new motherboard, with an Intel i7 950 @ 3.07 GHz and 12 GB of memory. An easy drop-in replacement, right? When I first started the computer, everything worked. A few minutes later, I got these errors:
Mar 12 10:57:32 marc kernel: [ 2818.975238] EXT3-fs error (device md1): ext3_readdir: bad entry in directory #21532139: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0
Mar 12 10:57:32 marc kernel: [ 2818.975244] Aborting journal on device md1.
Mar 12 10:57:32 marc kernel: [ 2818.977021] ext3_abort called.
Mar 12 10:57:32 marc kernel: [ 2818.977023] EXT3-fs error (device md1): ext3_journal_start_sb: Detected aborted journal
Mar 12 10:57:32 marc kernel: [ 2818.977025] Remounting filesystem read-only
Mar 12 10:57:32 marc kernel: [ 2819.002578] Remounting filesystem read-only
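To see which devices these errors land on, a rough sketch like the following counts EXT3-fs error lines per device in a saved log file. It assumes the message format shown above. The function name ext3_error_counts and the log path in the usage line are my own placeholders.

```shell
#!/bin/sh
# Count "EXT3-fs error (device NAME)" lines per device in a kernel log file.
# $1: path to a log file containing kernel messages.
# "ext3_error_counts" is a hypothetical helper name used for illustration.
ext3_error_counts() {
    # Pull NAME out of each matching line, then tally occurrences per NAME.
    sed -n 's/.*EXT3-fs error (device \([^)]*\)).*/\1/p' "$1" \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, ext3_error_counts /var/log/kern.log (the path is an assumption and may differ on your system) prints one "device count" pair per line.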
Of course, this requires a reboot and an fsck, only for the same thing to happen again a few minutes later. This is what I did to troubleshoot further:
1. It's not the memory. I ran memtest86+ for days at a time with no errors. I replaced the memory with 4 GB from a different manufacturer. Problems continued.
2. The old processor was single-core, non-hyperthreading. The new processor is quad-core, hyperthreading. I went into the BIOS and turned off hyperthreading and multi-core. Problems continued.
3. It's probably not the hard drives. I have never had any hardware errors, and they were working fine two weeks ago with the old motherboard.
4. I have forced a resync of the RAID array twice: once by removing and re-adding sda2, and once by removing and re-adding sdb2. Problems continued.
5. I did not change the kernel when I changed the motherboard.
6. At this point it might be a Linux software RAID issue. I installed a new hard drive with a single ext3 partition (sdc1), copied the contents of the RAID array to the new drive (cp -avx / /mnt), and rebooted the computer from sdc. Now sdc1 is my root partition. I have not yet had any errors with the root partition on sdc1. If I mount the RAID partition (md1) and start using it, I get ext3 errors on it almost immediately.
7. I run Debian, this is an Ubuntu forum. I know.
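Related to step 4: besides a full resync, the md driver exposes sync_action and mismatch_cnt files under /sys/block/<array>/md/. A minimal sketch for reading them follows. The function name md_scrub_report and its optional second argument (an alternate sysfs root, handy for trying it out without a real array) are my own additions. As far as I understand, a nonzero mismatch count on RAID 1 does not always mean a fault, so treat the number as a hint only.

```shell
#!/bin/sh
# Print the current sync action and mismatch count for one md array.
# $1: array name, e.g. md1
# $2: sysfs root, defaults to /sys (hypothetical parameter, for illustration)
# "md_scrub_report" is a made-up helper name, not an mdadm command.
md_scrub_report() {
    dir="${2:-/sys}/block/$1/md"
    if [ ! -r "$dir/mismatch_cnt" ]; then
        echo "no md sysfs entry for $1" >&2
        return 1
    fi
    printf '%s: sync_action=%s mismatch_cnt=%s\n' \
        "$1" "$(cat "$dir/sync_action")" "$(cat "$dir/mismatch_cnt")"
}
```

A read-only comparison of the mirrors can be requested as root with echo check > /sys/block/md1/md/sync_action. Once sync_action returns to idle, md_scrub_report md1 shows the resulting count.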
Let me know if you need more info from me.
Marc