kernel 2.6.26 reports massive filesystem errors on RAID5 device; but on 2.6.24 it is fine

Bug #256316 reported by wateenellende on 2008-08-09
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
Linux            Expired   Medium
linux (Ubuntu)             High         Unassigned

Bug Description

Binary package hint: linux-image-2.6.26-5-generic

I'm running Hardy on an AMD64 machine with a RAID5 array in it, ext3 formatted. It has been working fine for years, with the exception of Gutsy's kernel, which couldn't handle that many IDE devices (bug 157909, fixed in Hardy).

To make backups, I bought an external drive and an eSATA controller. The controller is not supported by Hardy's kernel (2.6.24), but it is supported in Intrepid, so I installed linux-image-2.6.26-5-generic from Intrepid.

Running this kernel causes several changes:
1) My new controller is recognized
2) All disks are now known as /dev/sd* and not /dev/hd*
3) Copying from /dev/md0 results in many filesystem errors
4) e2fsck reports massive errors on the filesystem on /dev/md0

If I reboot into 2.6.24, all problems are gone and everything is back to normal. This points to a serious regression somewhere between 2.6.24 and 2.6.26. I happen to mount the device read-only, but mounting it read-write would very likely cause severe damage to the filesystem. I accidentally confirmed this by running e2fsck under 2.6.26, which introduced real errors that I then had to fix under 2.6.24. Now the filesystem is clean under 2.6.24, while 2.6.26 still reports many errors.

wateenellende (fpbeekhof) wrote :

AFAIK, assignee is maintainer of the package.

Changed in linux:
assignee: nobody → canonical-kernel-team
description: updated
Changed in linux:
assignee: canonical-kernel-team → nobody
Changed in linux:
status: Unknown → Confirmed
Changed in linux:
status: Confirmed → In Progress

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

No problem. I picked up both the generic and server packages and tried to install both, so I could compare them if needed. The generic one installed fine. However, it seems one really must choose between them:

$ sudo dpkg -i linux-image-2.6.27-1-server_2.6.27-1.2_amd64.deb
Selecting previously deselected package linux-image-2.6.27-1-server.
(Reading database ... 203428 files and directories currently installed.)
Unpacking linux-image-2.6.27-1-server (from linux-image-2.6.27-1-server_2.6.27-1.2_amd64.deb) ...
Done.
dpkg: error processing linux-image-2.6.27-1-server_2.6.27-1.2_amd64.deb (--install):
 trying to overwrite `/lib/firmware/atmsar11.fw', which is also in package linux-image-2.6.27-1-generic
dpkg-deb: subprocess paste killed by signal (Broken pipe)
Running postrm hook script /usr/sbin/update-grub.
Searching for GRUB installation directory ... found: /boot/grub
Searching for default file ... found: /boot/grub/default
Testing for an existing GRUB menu.lst file ... found: /boot/grub/menu.lst
Searching for splash image ... none found, skipping ...
Found kernel: /boot/vmlinuz
Found kernel: /boot/vmlinuz.old
Found kernel: /boot/vmlinuz-2.6.27-1-generic
Found kernel: /boot/vmlinuz-2.6.26-5-generic
Found kernel: /boot/vmlinuz-2.6.24-19-generic
Found kernel: /boot/memtest86+.bin
Updating /boot/grub/menu.lst ... done

Errors were encountered while processing:
 linux-image-2.6.27-1-server_2.6.27-1.2_amd64.deb

I'll reboot into linux-image-2.6.27-1-generic now and start testing...

On Fri, Aug 29, 2008 at 2:13 AM, Leann Ogasawara <email address hidden> wrote:
> ** Tags added: cft-2.6.27

wateenellende (fpbeekhof) wrote :

Well, no luck.

$ uname -a
Linux DeathStar 2.6.27-1-generic #1 SMP Sat Aug 23 23:19:01 UTC 2008 x86_64 GNU/Linux
$ cat */*.avi > /dev/null
cat: Amadeus (DVDivX)/Amadeus.The_Directors_Cut_(1984).AC3.CD1.MoonAge.ShareReacto.avi: Input/output error
cat: Auberge Espagnol/AubergeEspagnol-CD2.avi: Input/output error
... from here on I stopped the test.
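The `cat */*.avi > /dev/null` command above works as a crude read test: it forces every byte of every file to be read from the array and throws the data away, so any failing read surfaces as an error. A minimal C sketch of the same idea for a single file, assuming a POSIX system; `read_whole_file` is a hypothetical helper, not an existing tool.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read every byte of a file and discard it, like `cat FILE > /dev/null`.
 * Returns 0 on success, otherwise the errno of the failing open() or
 * read() (a bad read on the array would typically give EIO). */
int read_whole_file(const char *path, long long *bytes_read)
{
    char buf[1 << 16];          /* 64 KiB scratch buffer; contents are discarded */
    long long total = 0;
    int err = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        err = errno;
    } else {
        for (;;) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0) {
                total += n;
            } else if (n == 0) {
                break;                      /* end of file */
            } else if (errno != EINTR) {    /* EINTR: interrupted by a signal, retry */
                err = errno;
                break;
            }
        }
        close(fd);
    }
    if (err != 0)
        fprintf(stderr, "%s: %s\n", path, strerror(err));
    if (bytes_read != NULL)
        *bytes_read = total;    /* bytes successfully read before any error */
    return err;
}
```

Unlike `cat`, the helper also reports how many bytes were read before the first failure, which gives a rough idea of where in a file the bad data starts.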

From syslog:
Aug 29 09:02:00 DeathStar kernel: [ 1207.904014] attempt to access beyond end of device
Aug 29 09:02:00 DeathStar kernel: [ 1207.904014] md0: rw=0, want=7771191736, limit=2929692160
Aug 29 09:02:00 DeathStar kernel: [ 1207.910440] attempt to access beyond end of device
Aug 29 09:02:00 DeathStar kernel: [ 1207.911275] md0: rw=0, want=7771191736, limit=2929692160
Aug 29 09:03:31 DeathStar kernel: [ 1299.012020] attempt to access beyond end of device
Aug 29 09:03:31 DeathStar kernel: [ 1299.012020] md0: rw=0, want=11222412024, limit=2929692160
Aug 29 09:03:31 DeathStar kernel: [ 1299.022440] attempt to access beyond end of device
Aug 29 09:03:31 DeathStar kernel: [ 1299.023546] md0: rw=0, want=11222412024, limit=2929692160

As I posted in the corresponding kernel bugzilla, my suspicion is that the hpt374 driver in libata is returning bogus data when reading from disk. This includes bogus filesystem metadata, which in turn makes the kernel try to read from locations on disk that do not exist, causing the errors we observe. This theory hasn't been proven; it's just my best guess.
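The `want` and `limit` values in the syslog messages above can be checked by hand. As far as I can tell from the block layer code of this era (handle_bad_sector() in block/blk-core.c), `want` is the sector just past the end of the rejected request and `limit` is the device size, both in 512-byte sectors; treat that reading as an assumption. A minimal C sketch; `parse_eod`, `sectors_to_bytes` and `bytes_past_end` are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: the block layer reports "want" and "limit" in 512-byte sectors. */
#define SECTOR_BYTES 512ULL

/* Extract want/limit from a kernel line such as
 *   "md0: rw=0, want=7771191736, limit=2929692160".
 * Returns 1 on success, 0 if the line does not contain both fields. */
int parse_eod(const char *line, unsigned long long *want,
              unsigned long long *limit)
{
    const char *p = strstr(line, "want=");
    if (p == NULL)
        return 0;
    return sscanf(p, "want=%llu, limit=%llu", want, limit) == 2;
}

/* Convert a sector count into bytes. */
unsigned long long sectors_to_bytes(unsigned long long sectors)
{
    return sectors * SECTOR_BYTES;
}

/* How many bytes past the end of the device the request reached (0 if none). */
unsigned long long bytes_past_end(unsigned long long want,
                                  unsigned long long limit)
{
    return want > limit ? (want - limit) * SECTOR_BYTES : 0;
}
```

With the logged numbers, limit=2929692160 sectors is about 1.50 TB (1,500,002,385,920 bytes), while want=7771191736 and want=11222412024 correspond to roughly 3.98 TB and 5.75 TB. Requests that far beyond the end of md0 are consistent with the theory above that the kernel is following corrupted metadata rather than valid block addresses.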

wateenellende (fpbeekhof) wrote :

I just opened a separate bug report for the failed installation of linux-image-2.6.27-1-server: bug 262783. It is tagged "linux-2.6.27".

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → High
status: New → Triaged
wateenellende (fpbeekhof) wrote :

The changelog for 2.6.27-rc7 contains the following entry. If there's a new kernel to try, let me know.

commit 62ff2ecf7a4e69f7271b7f7a57aaee76ffe610f2
Author: Masoud Sharbiani <email address hidden>
Date: Wed Sep 10 22:22:34 2008 +0200

    ide: Fix pointer arithmetic in hpt3xx driver code (3rd try)

    git commit 74811f355f4f69a187fa74892dcf2a684b84ce99 causes crash at
    module load (or boot) time on my machine with a hpt374 controller.
    The reason for this is that for initializing second controller which sets
    (hwif->dev == host->dev[1]) to true (1), adds 1 to a void ptr, which
    advances it by one byte instead of advancing it by sizeof(hpt_info) bytes.
    Because of this, all initialization functions get corrupted data in info
    variable which causes a crash at boot time.

    This patch fixes that and makes my machine boot again.

    The card itself is a HPT374 raid controller: Here is the lspci -v output:
    03:06.0 RAID bus controller: HighPoint Technologies, Inc. HPT374 (rev 07)
            Subsystem: HighPoint Technologies, Inc. Unknown device 0001
            Flags: bus master, 66MHz, medium devsel, latency 120, IRQ 28
            I/O ports at 8000 [size=8]
            I/O ports at 7800 [size=4]
            I/O ports at 7400 [size=8]
            I/O ports at 7000 [size=4]
            I/O ports at 6800 [size=256]
            Expansion ROM at fe8e0000 [disabled] [size=128K]
            Capabilities: [60] Power Management version 2

    03:06.1 RAID bus controller: HighPoint Technologies, Inc. HPT374 (rev 07)
            Subsystem: HighPoint Technologies, Inc. Unknown device 0001
            Flags: bus master, 66MHz, medium devsel, latency 120, IRQ 28
            I/O ports at 9800 [size=8]
            I/O ports at 9400 [size=4]
            I/O ports at 9000 [size=8]
            I/O ports at 8800 [size=4]
            I/O ports at 8400 [size=256]
            Capabilities: [60] Power Management version 2

    Signed-off-by: Masoud Sharbiani <email address hidden>
    Cc: Sergei Shtylyov <email address hidden>
    Cc: Andrew Morton <email address hidden>
    [bart: use dev_get_drvdata() per Sergei's suggestion]
    Signed-off-by: Bartlomiej Zolnierkiewicz <email address hidden>
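The commit above describes a classic C pitfall. GCC allows arithmetic on `void *` as a GNU extension and treats it as if it pointed to single bytes, so adding 1 moves the pointer one byte instead of one structure. A minimal sketch of the buggy and fixed patterns; `struct hpt_info_example`, `info_addr_buggy` and `info_for` are hypothetical stand-ins, not code from the hpt3xx driver (whose final fix, per the commit, uses dev_get_drvdata()).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-controller data;
 * the real struct hpt_info has different fields. */
struct hpt_info_example {
    const char *chip_name;
    uint8_t     chip_type;
    uint32_t    dpll_clk;
};

/* Buggy pattern: adding the index to a byte-sized pointer (which is how
 * GCC treats void * arithmetic) moves is_second BYTES, not elements. */
const char *info_addr_buggy(const void *table, int is_second)
{
    return (const char *)table + is_second;
}

/* Fixed pattern: convert to the element type first, then add, so the
 * pointer advances by is_second * sizeof(struct hpt_info_example) bytes. */
const struct hpt_info_example *info_for(const void *table, int is_second)
{
    return (const struct hpt_info_example *)table + is_second;
}
```

For a two-element table, `info_for(table, 1)` yields `&table[1]`, while `info_addr_buggy(table, 1)` points one byte into `table[0]`, which is the kind of corrupted info pointer the commit says made the second controller's initialization crash.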

Philipp Dreimann wrote :

We are currently at Linux version 2.6.27.2, so your fix should be included. Tell us if it works.

wateenellende (fpbeekhof) wrote :

That fix is most likely for the legacy IDE driver. There hasn't been any activity in the kernel bugzilla to indicate that there is a fix for the libata driver. I've sent an email; if they confirm that the correct driver has indeed been patched, I'll give it a spin.

Philipp Dreimann wrote:
> We are currently at linux version 2.6.27.2 so your fix should be
> included. Tell us if it works.
>

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

This bug report was marked as Triaged a while ago but has not had any updated comments for quite some time. Please let us know if this issue remains in the current Ubuntu release, http://www.ubuntu.com/getubuntu/download . If the issue remains, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-triage
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Fail2Ban (failtoban) on 2010-02-20
tags: added: kernel-bug
Mitch Towner (kermiac) wrote :

@ Fail2Ban: please do not assign tags without first reading https://wiki.ubuntu.com/Bugs/Tags
Thanks in advance!

tags: removed: kernel-bug
tags: added: cherry-pick kernel-fs
Changed in linux:
importance: Unknown → Medium
Ralf-Peter (rohbeck) wrote :

I had a similar problem with the HighPoint 1640 4-port SATA RAID board, see http://ubuntuforums.org/showpost.php?p=10400695&postcount=18 and following for details.
Building and installing the hpt374 driver from the HighPoint web site fixed it.
Recently I installed Debian squeeze and the bug did not show. Squeeze uses the hpt366 driver for the board.

Changed in linux:
status: In Progress → Expired