Bug #208551 “mdadm with Raid5 stuck in uninterruptable sleep” : Bugs : linux package : Ubuntu

DesktopMan (christian-auby) wrote on 2008-03-28:

#1

Sigh. Spoke too soon. Ran mdadm -D while it was being copied to, and it hung again. 2TB transferred. Guess it's directly related to the number of processes accessing the device. Won't be able to restart it until tomorrow, but I can try any suggestions on the hung system.

DesktopMan (christian-auby) wrote on 2008-03-29:

#2

mdadm -D returned after a couple of minutes, at which point it started writing again. During the period it was running nothing was written.

DesktopMan (christian-auby) wrote on 2008-03-30:

#3

Was copying from a file set up with losetup + cryptsetup on a raid5 array (the one above) to a raid6 array, both with XFS. During this copy I ran mdadm --examine --scan, and the raid5 (the one I was reading from) crashed, giving me input/output errors. The md device is fine, on the other hand, and remounting the (read-only) filesystem was no problem. dmesg output:

[131405.242868] xfs_force_shutdown(dm-1,0x1) called from line 420 of file /build/buildd/linux-2.6.24/fs/xfs/xfs_rw.c. Return address = 0xffffffff883cdf59
[131405.242892] Filesystem "dm-1": I/O Error Detected. Shutting down filesystem : dm-1
[131405.242932] Please umount the filesystem, and rectify the problem(s)
[131405.242958] xfs_force_shutdown(dm-1,0x1) called from line 420 of file /build/buildd/linux-2.6.24/fs/xfs/xfs_rw.c. Return address = 0xffffffff883cdf59

Not sure if it's related to the first post or not. Any input would be appreciated.

Twigathy (twigathy) wrote on 2008-05-06:

#4

I'm not certain if I'm having the same trouble as you, but mdadm fell over pretty hard for me on 2.6.24-16-server, mdadm - v2.6.3 - 20th August 2007 when expanding 5x500GB -> 6x500GB. I lost all the data on the raid (oops).

Possibly this is a bug in sata_sil with lots of disk writes? 5 of the 6 disks were on SiI 3512-based SATA cards (the other was on an onboard mobo SATA port). Similarly, I can write to the disks individually fine, and they check out okay with badblocks and smartctl.

Did you get any weirdness in dmesg? I had a couple of odd things about the SATA link going down... so possibly unrelated.

DesktopMan (christian-auby) wrote on 2008-05-07:

#5

Not sure if it is related, might be. I honestly gave up on it after concluding that the problem was too erratic and virtually impossible for me to debug. If I remember correctly I also got messages about the SATA link going down, then reset and back up.

Twigathy (twigathy) wrote on 2008-05-07:

#6

Hm, so what did you do instead? Buy new controller cards or give up on raid? ;)

Twigathy (twigathy) wrote on 2008-05-07:

#7

Hi,

I googled a little further; looks like this is a bug in sata_sil after all.

Check out http://www.ussg.iu.edu/hypermail/linux/kernel/0707.1/0024.html

Doesn't seem to be a fix for it! This isn't too good for me - I have 3 of these cards :-(

Carl Streeter (carl-linux) wrote on 2008-06-16:

#8

raid_patch.txt (1.7 KiB, text/plain)

I'm having the same issue pointed to in the thread mentioned above:
http://www.issociate.de/board/post/471929/2.6.24-rc6_reproducible_raid5_hang.html

It seems that this was fixed in kernel version 2.6.25. Would it be possible to backport this to ubuntu kernels? It's basically impossible to use XFS on SW raid without it:
http://marc.info/?l=linux-kernel&m=120027546428622&w=2

At least, it's impossible when dealing with multi terabyte raid5 arrays, which I don't think are particularly uncommon at this point.

Andrew Cholakian (andrew-cholakian) wrote on 2008-08-07:

#9

I managed to patch the stock Ubuntu kernel (2.6.24-18) with the LKML patches I found at the second link in the post above. Seems stable; I've been running it in production on two large raid5 arrays without issue. The patches didn't apply perfectly but they do work.

Leann Ogasawara (leannogasawara) wrote on 2008-08-19:

#10

Hi Guys,

I just wanted to let you know the latest Alpha for the upcoming Intrepid Ibex 8.10 is available. The kernel for Intrepid is based on a 2.6.26 kernel at the moment. This 2.6.26 kernel has the patch which was referenced to have fixed this issue in 2.6.25. For more information regarding the latest Alpha for Intrepid, refer to http://www.ubuntu.com/testing. If anyone would be willing to test and confirm this is fixed with the Intrepid kernel, that would be great. But based on the patch existing in the Intrepid kernel and the comment made by Andrew that this patch resolves the issue for him, I'm tentatively marking this "Fix Released" against Intrepid.

I'll additionally open a Hardy SRU nomination but it is really a decision to be made by the kernel team if this fix will be backported. I've included below the upstream git commit id and patch description for the kernel team to reference. Thanks.

commit 6ed3003c19a96fe18edf8179c4be6fe14abbebbc
Author: NeilBrown <email address hidden>
Date: Wed Feb 6 01:40:00 2008 -0800

md: fix an occasional deadlock in raid5

Changed in linux:
status:	New → Fix Released
assignee:	nobody → ubuntu-kernel-team
importance:	Undecided → Medium
status:	New → Triaged

Colin Ian King (colin-king) on 2008-08-20

Changed in linux:
assignee:	ubuntu-kernel-team → colin-king
status:	Triaged → In Progress

Colin Ian King (colin-king) wrote on 2008-08-21:

#11

Hi,

I've applied commit 6ed3003c19a96fe18edf8179c4be6fe14abbebbc and built for testing linux - 2.6.24-20.39cking4 package - you can download the package from my PPA at: https://launchpad.net/~colin-king/+archive

Please can you test this fix and let me know if it works so that we can add it to the next release of Hardy.

To test, add the following lines to your apt sources.list:

deb http://ppa.launchpad.net/colin-king/ubuntu hardy main
deb-src http://ppa.launchpad.net/colin-king/ubuntu hardy main

alternatively, follow the instructions at: https://help.ubuntu.com/8.04/add-applications/C/extra-repositories-adding.html

Thanks, Colin

DesktopMan (christian-auby) wrote on 2008-08-21: Re: [Bug 208551] Re: mdadm, Raid5 and XFS stuck in uninterruptable sleep

#12

I had to swap to Debian as this bug made the server useless. I haven't
had any deadlocks here yet, but it might still apply for all I know.

Or is Debian using different code? I'm running testing, on 2.6.25-2

I am happy someone eventually identified the cause though.

Christian

Colin Ian King (colin-king) wrote on 2008-08-28: Re: mdadm, Raid5 and XFS stuck in uninterruptable sleep

#13

Since we don't have DesktopMan now to test this fix, marking it as "Won't Fix", unless anyone has the same hardware and is willing to test this for Hardy.

Changed in linux:
status:	In Progress → Won't Fix

Andrew Cholakian (andrew-cholakian) wrote on 2008-08-28:

#14

I used to have this bug, and I'd be willing to test out your kernel. Right now I'm using my own kernel with the patches applied. Would that work, Colin? Have you done any testing yourself?

I'm running a 6x1TB raid5 array with XFS on top on a Dell PowerEdge 1800 (an older 64-bit Xeon).

Colin Ian King (colin-king) wrote on 2008-09-02:

#15

Hi Andrew,

If you can try out my kernel in the PPA, just to verify this kernel with the single patch that fixes this bug, it would give us a clear indication that the issue is fixed against the current Hardy kernel sources. This allows us to OK it for inclusion into the Hardy kernel for the next point release.

Much appreciated if you could test this.

Colin

Andrew Cholakian (andrew-cholakian) wrote on 2008-09-10:

#16

I've tested Colin's patch and it's live on 2 production 64bit servers. Seems to work just fine.

Colin Ian King (colin-king) wrote on 2008-09-11:

#17

SRU justification:

Impact: mdadm, Raid5 get stuck in uninterruptible sleep under heavy I/O
load. Copying data to a Raid 5 XFS partition results in a permanent lock
on several processes related to it, which get stuck in the D(+) state.
This occurs when large quantities of data (10-40 GB) are copied. The
processes become unkillable, and the system cannot reboot; the server
has to be power cycled.

Fix: The patch from commit 6ed3003c19a96fe18edf8179c4be6fe14abbebbc. The
fix is to not make any generic_make_request() calls in raid5
make_request until all waiting has been done. We do this by simply
setting STRIPE_HANDLE instead of calling handle_stripe(). This causes a
performance hit, so this patch also only calls raid5_activate_delayed()
at unplug time, never in raid5. This seems to bring back the
performance numbers. [quoting the commit message]

Testing: Without the patch, Raid 5 using md on an XFS filesystem locks
up under heavy data copying - this is repeatable. With the patch, the
lock up does not occur.

Patch tested from my PPA build by Andrew Cholakian (see previous message)
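
To check whether a system is showing the symptom described under Impact, one can list the processes currently in the D (uninterruptible sleep) state. What follows is only a minimal sketch: the function name list_d_state is made up for illustration, and it assumes a procps-style ps that accepts -eo pid=,stat=,comm= to print headerless "pid stat command" rows.

```shell
# list_d_state (hypothetical helper name): read "pid stat command" rows on
# stdin and print only the rows whose state column starts with D, i.e.
# processes in uninterruptible sleep (D, D+, Ds, ...).
list_d_state() {
    awk '$2 ~ /^D/ { print }'
}
```

A typical invocation would be ps -eo pid=,stat=,comm= | list_d_state. A process that shows up here once is not necessarily deadlocked, since tasks pass through the D state briefly during normal I/O; the hang described above is when the same processes stay listed indefinitely and cannot be killed.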

Changed in linux:
milestone:	none → ubuntu-8.04.2
status:	Won't Fix → Fix Committed

Martin Pitt (pitti) wrote on 2008-10-14:

#18

linux 2.6.24-21 copied to hardy-updates.

Changed in linux:
status:	Fix Committed → Fix Released

kpolberg (kpolberg) wrote on 2008-10-20:

#19

I am still having deadlocks; the only thing that fixes it is setting the stripe_cache_size on the md device higher.

echo 16384 > /sys/block/md0/md/stripe_cache_size

Linux sarah 2.6.24-21-generic #1 SMP Mon Aug 25 16:57:51 UTC 2008 x86_64 GNU/Linux

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sde1[0] sdc1[7] sdd1[6] sdf1[5] sdg1[4] sdb1[3] sda1[2] sdh1[1]
      3418686208 blocks level 5, 256k chunk, algorithm 2 [8/8] [UUUUUUUU]
      [========>............]  resync = 44.4% (216937632/488383744) finish=84.5min speed=53527K/sec

unused devices: <none>

root@sarah:~# xfs_info /dev/md0
meta-data=/dev/md0               isize=256    agcount=75, agsize=11446528 blks
         =                       sectsz=4096  attr=1
data     =                       bsize=4096   blocks=854671552, imaxpct=25
         =                       sunit=64     swidth=192 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                   extsz=786432 blocks=0, rtextents=0

If you need some more information, please ask.
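
The stripe_cache_size workaround above can be wrapped in a small helper that also shows what the larger cache costs in memory. This is a rough sketch, not a tested tool: the function names stripe_cache_mib and set_stripe_cache are hypothetical, the memory estimate follows the formula given in the kernel md documentation (page size * number of member disks * stripe_cache_size), and a 4096-byte page size is assumed unless a third argument says otherwise.

```shell
# stripe_cache_mib ENTRIES DISKS [PAGE_SIZE]
# Estimate the memory used by the raid5 stripe cache, in MiB.
# PAGE_SIZE defaults to 4096 bytes (an assumption; check getconf PAGESIZE).
stripe_cache_mib() {
    entries=$1
    disks=$2
    page_size=${3:-4096}
    echo $(( entries * disks * page_size / 1024 / 1024 ))
}

# set_stripe_cache ATTR_FILE NEW_VALUE
# Write NEW_VALUE to the given stripe_cache_size attribute file and print
# "old -> new", re-reading the file so the value actually in effect is shown.
set_stripe_cache() {
    attr=$1
    new=$2
    old=$(cat "$attr") || return 1
    echo "$new" > "$attr" || return 1
    echo "$old -> $(cat "$attr")"
}
```

As root, the setting from this comment would be applied with set_stripe_cache /sys/block/md0/md/stripe_cache_size 16384. For the 8-disk array shown above, stripe_cache_mib 16384 8 gives 512, so the workaround ties up about 512 MiB of RAM. The value is not persistent and has to be set again after a reboot or after the array is reassembled.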

Martin Pitt (pitti) wrote on 2008-11-14:

#20

New SRU fixes it harder apparently.

Changed in linux:
status:	Fix Released → In Progress

Martin Pitt (pitti) wrote on 2008-11-14:

#21

Accepted into intrepid-proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Martin Pitt (pitti) wrote on 2008-11-14:

#24

Accepted into hardy-proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in linux:
milestone:	ubuntu-8.04.2 → none
status:	In Progress → Fix Committed

Martin Pitt (pitti) wrote on 2008-11-27:

#25

Accepted linux into hardy-proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Tony (tonybaca) wrote on 2008-12-03:

#26

Martin,

Hope this doesn't repeat.

If you’re willing to help a newbie, I am willing to test this on my fresh intrepid install. I have a system very similar to the one described above, showing the same kind of problems. I posted my problems in another bug (147464) but after reading this bug, I think this is closer to what I am seeing.

I have a fresh install of 8.10 (mythbuntu). I was able to stop this problem, but only if I set rsize=8092 in fstab. This killed throughput! I finally reformatted the array to ext3 and that fixed the problem. Right now I have been testing the array with JFS and so far have not had the system lock up.

I followed the instructions to enable proposed, but I don’t know what I need to update to test this fix. I am willing to do any testing you need; my system is not a production system and there is no important data on the machine.

Tony

Martin Pitt (pitti) wrote on 2008-12-04: Re: [Bug 208551] Re: mdadm with Raid5 stuck in uninterruptable sleep

#27

Tony [2008-12-03 23:36 -0000]:
> I followed the instruction to enable proposed, but I don’t know what I
> need to update to test this fix.

A normal system upgrade should pull in the new 2.6.27-10 kernel, i.e.
you should get a couple of linux-image and linux-restricted-modules
packages with version 2.6.27-10.20.

Tony (tonybaca) wrote on 2008-12-06:

#28

I upgraded to 2.6.27-10. There were other upgrades that occurred at that time too. I reformatted my array to XFS. I tried to copy a large amount of data. It failed in the same manner. After reboot, the array is rebuilding, but I found this in the log:

Dec  6 10:44:54 Server kernel: [44205.953002] Call Trace:
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ac083>] ? find_get_pages+0x43/0x110
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802b6c74>] ? pagevec_lookup+0x24/0x30
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffffa0d9302d>] ? xfs_cluster_write+0xad/0x180 [xfs]
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffffa0d93598>] ? xfs_page_state_convert+0x498/0x760 [xfs]
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffffa0d939c1>] ? xfs_vm_writepage+0x71/0x120 [xfs]
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802b9554>] ? pageout+0x124/0x280
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ab1da>] ? page_waitqueue+0xa/0x90
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802b9b5d>] ? shrink_page_list+0x34d/0x530
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802b9ee2>] ? shrink_inactive_list+0x1a2/0x4b0
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ba26b>] ? shrink_zone+0x7b/0x160
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ba3dd>] ? shrink_zones+0x8d/0x150
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ba526>] ? do_try_to_free_pages+0x86/0x2e0
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ba877>] ? try_to_free_pages+0x67/0x70
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802b9380>] ? isolate_pages_global+0x0/0x50
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802b2b49>] ? __alloc_pages_internal+0x239/0x520
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802d5c6d>] ? alloc_pages_current+0xad/0x110
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ac617>] ? __page_cache_alloc+0x67/0x80
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ad253>] ? __grab_cache_page+0x63/0xb0
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff803171a9>] ? block_write_begin+0x89/0xf0
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffffa0d9248a>] ? xfs_vm_write_begin+0x2a/0x30 [xfs]
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffffa0d92050>] ? xfs_get_blocks+0x0/0x20 [xfs]
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ab93c>] ? generic_perform_write+0xbc/0x1c0
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ad6a2>] ? generic_file_buffered_write+0x92/0x170
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffffa0d9b2f3>] ? xfs_write+0x6b3/0x9b0 [xfs]
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffffa0d96ca8>] ? xfs_file_aio_write+0x58/0x60 [xfs]
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802e9b79>] ? do_sync_write+0xf9/0x140
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff80267050>] ? autoremove_wake_function+0x0/0x40
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff80387071>] ? aa_file_permission+0x21/0xf0
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff80387198>] ? apparmor_file_permission+0x28/0x30
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff80361c46>] ? security_file_permission+0x16/0x20
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ea23b>] ? vfs_write+0xcb/0x130
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff802ea395>] ? sys_write+0x55/0x90
Dec  6 10:44:54 Server kernel: [44205.953002]  [<ffffffff8021285a>] ? system_call_fastpath+0x16/0x1b
Dec  6 10:44:54 Server kernel: [44205.953002]
Dec  6 11:00:09 Server syslogd 1.5.0#2ubuntu6: restart.
Dec  6 11:00:09 Server kernel: Inspecting /boot/System.map-2.6.27-10-generic

Launchpad Janitor (janitor) wrote on 2009-01-08:

#29

This bug was fixed in the package linux - 2.6.24-23.46

---------------
linux (2.6.24-23.46) hardy-proposed; urgency=low

[Alessio Igor Bogani]

  * rt: Updated PREEMPT_RT support to rt21
    - LP: #302138

[Amit Kucheria]

  * SAUCE: Update lpia patches from moblin tree
    - LP: #291457

[Andy Whitcroft]

  * SAUCE: replace gfs2_bitfit with upstream version to prevent oops
    - LP: #276641

[Colin Ian King]

  * isdn: Do not validate ISDN net device address prior to interface-up
    - LP: #237306
  * hwmon: (coretemp) Add Penryn CPU to coretemp
    - LP: #235119
  * USB: add support for Motorola ROKR Z6 cellphone in mass storage mode
    - LP: #263217
  * md: fix an occasional deadlock in raid5
    - LP: #208551

[Stefan Bader]

  * SAUCE: buildenv: Show CVE entries in printchanges
  * SAUCE: buildenv: Send git-ubuntu-log informational message to stderr
  * Xen: dma: avoid unnecessarily SWIOTLB bounce buffering
    - LP: #247148
  * Update openvz patchset to apply to latest stable tree.
    - LP: #301634
  * XEN: Fix FTBS with stable updates
    - LP: #301634

[Steve Conklin]

  * Add HID quirk for dual USB gamepad
    - LP: #140608

[Tim Gardner]

  * Enable CONFIG_AX25_DAMA_SLAVE=y
    - LP: #257684
  * SAUCE: Correctly blacklist Thinkpad r40e in ACPI
    - LP: #278794
  * SAUCE: ALPS touchpad for Dell Latitude E6500/E6400
    - LP: #270643

[Upstream Kernel Changes]

  * Revert "[Bluetooth] Eliminate checks for impossible conditions in IRQ
    handler"
    - LP: #217659
  * KVM: VMX: Clear CR4.VMXE in hardware_disable
    - LP: #268981
  * iov_iter_advance() fix
    - LP: #231746
  * Fix off-by-one error in iov_iter_advance()
    - LP: #231746
  * USB: serial: ch341: New VID/PID for CH341 USB-serial
    - LP: #272485
  * x86: Fix 32-bit x86 MSI-X allocation leakage
    - LP: #273103
  * b43legacy: Fix failure in rate-adjustment mechanism
    - LP: #273143
  * x86: Reserve FIRST_DEVICE_VECTOR in used_vectors bitmap.
    - LP: #276334
  * openvz: merge missed fixes from vanilla 2.6.24 openvz branch
    - LP: #298059
  * openvz: some autofs related fixes
    - LP: #298059
  * openvz: fix ve stop deadlock after nfs connect
    - LP: #298059
  * openvz: fix netlink and rtnl inside container
    - LP: #298059
  * openvz: fix wrong size of ub0_percpu
    - LP: #298059
  * openvz: fix OOPS while stopping VE started before binfmt_misc.ko loaded
    - LP: #298059
  * x86-64: Fix "bytes left to copy" return value for copy_from_user()
  * NET: Fix race in dev_close(). (Bug 9750)
    - LP: #301608
  * IPV6: Fix IPsec datagram fragmentation
    - LP: #301608
  * IPV6: dst_entry leak in ip4ip6_err.
    - LP: #301608
  * IPV4: Remove IP_TOS setting privilege checks.
    - LP: #301608
  * IPCONFIG: The kernel gets no IP from some DHCP servers
    - LP: #301608
  * IPCOMP: Disable BH on output when using shared tfm
    - LP: #301608
  * IRQ_NOPROBE helper functions
    - LP: #301608
  * MIPS: Mark all but i8259 interrupts as no-probe.
    - LP: #301608
  * ub: fix up the conversion to sg_init_table()
    - LP: #301608
  * x86: adjust enable_NMI_through_LVT0()
    - LP: #301608
  * SCSI ips: handle scsi_add_host() failure, and other err cleanups
    - LP: #301608
  * CRYPTO xcbc: Fix crash with IPsec
    - LP: #301608
  * CRYPTO xts: Use proper alignment
    - LP: #301608
  * SCSI ips: fix data buffer accessors conversion bug
    - LP: #301608
  * SCSI aic94xx: fix REQ_TASK_ABORT and REQ_DEVICE_RESET
    - LP: #301608
  * x86: replace LOCK_PREFIX in futex.h
    - LP: #301608
  * ARM pxa: fix clock lookup to find specific device clocks
    - LP: #301608
  * futex: fix init order
    - LP: #301608
  * futex: runtime enable pi and robust functionality
    - LP: #301608
  * file capabilities: simplify signal check
    - LP: #301608
  * hugetlb: ensure we do not reference a surplus page after handing it to
    buddy
    - LP: #301608
  * ufs: fix parenthesisation in ufs_set_fs_state()
    - LP: #301608
  * spi: pxa2xx_spi clock polarity fix
    - LP: #301608
  * NETFILTER: Fix incorrect use of skb_make_writable
    - LP: #301608
  * NETFILTER: fix ebtable targets return
    - LP: #301608
  * SCSI advansys: fix overrun_buf aligned bug
    - LP: #301608
  * pata_hpt*, pata_serverworks: fix UDMA masking
    - LP: #301608
  * moduleparam: fix alpha, ia64 and ppc64 compile failures
    - LP: #301608
  * PCI x86: always use conf1 to access config space below 256 bytes
    - LP: #301608
  * e1000e: Fix CRC stripping in hardware context bug
    - LP: #301608
  * atmel_spi: fix clock polarity
    - LP: #301608
  * x86: move out tick_nohz_stop_sched_tick() call from the loop
    - LP: #301608
  * macb: Fix speed setting
    - LP: #301608
  * ioat: fix 'ack' handling, driver must ensure that 'ack' is zero
    - LP: #301608
  * VT notifier fix for VT switch
    - LP: #301608
  * USB: ftdi_sio: Workaround for broken Matrix Orbital serial port
    - LP: #301608
  * USB: ftdi_sio - really enable EM1010PC
    - LP: #301608
  * SCSI: fix BUG when sum(scatterlist) > bufflen
    - LP: #301608
  * x86: don't use P6_NOPs if compiling with CONFIG_X86_GENERIC
    - LP: #301608
  * Fix default compose table initialization
    - LP: #301608
  * SCSI: gdth: bugfix for the at-exit problems
    - LP: #301608
  * sched: fix race in schedule()
    - LP: #301608
  * nfsd: fix oops on access from high-numbered ports
    - LP: #301608
  * sched_nr_migrate wrong mode bits
    - LP: #301608
  * NETFILTER: xt_time: fix failure to match on Sundays
    - LP: #301608
  * NETFILTER: nfnetlink_queue: fix computation of allocated size for
    netlink skb
    - LP: #301608
  * NETFILTER: nfnetlink_log: fix computation of netlink skb size
    - LP: #301608
  * zisofs: fix readpage() outside i_size
    - LP: #301608
  * jbd2: correctly unescape journal data blocks
    - LP: #301608
  * jbd: correctly unescape journal data blocks
    - LP: #301608
  * aio: bad AIO race in aio_complete() leads to process hang
    - LP: #301608
  * async_tx: avoid the async xor_zero_sum path when src_cnt >
    device->max_xor
    - LP: #301608
  * SCSI advansys: Fix bug in AdvLoadMicrocode
    - LP: #301608
  * BLUETOOTH: Fix bugs in previous conn add/del workqueue changes.
    - LP: #301608
  * relay: fix subbuf_splice_actor() adding too many pages
    - LP: #301608
  * slab: NUMA slab allocator migration bugfix
    - LP: #301608
  * S390 futex: let futex_atomic_cmpxchg_pt survive early functional tests.
    - LP: #301608
  * Linux 2.6.24.4
    - LP: #301608
  * time: prevent the loop in timespec_add_ns() from being optimised away
    - LP: #301632
  * kbuild: soften modpost checks when doing cross builds
    - LP: #301632
  * mtd: memory corruption in block2mtd.c
    - LP: #301632
  * md: remove the 'super' sysfs attribute from devices in an 'md' array
    - LP: #301632
  * V4L: ivtv: Add missing sg_init_table()
    - LP: #301632
  * UIO: add pgprot_noncached() to UIO mmap code
    - LP: #301632
  * USB: new quirk flag to avoid Set-Interface
    - LP: #301632
  * NOHZ: reevaluate idle sleep length after add_timer_on()
    - LP: #301632
  * slab: fix cache_cache bootstrap in kmem_cache_init()
    - LP: #301632
  * xen: fix RMW when unmasking events
    - LP: #301632
  * xen: mask out SEP from CPUID
    - LP: #301632
  * xen: fix UP setup of shared_info
    - LP: #301632
  * PERCPU : __percpu_alloc_mask() can dynamically size percpu_data storage
    - LP: #301632
  * alloc_percpu() fails to allocate percpu data
    - LP: #301632
  * vfs: fix data leak in nobh_write_end()
    - LP: #301632
  * pci: revert SMBus unhide on HP Compaq nx6110
    - LP: #301632
  * vmcoreinfo: add the symbol "phys_base"
    - LP: #301632
  * USB: Allow initialization of broken keyspan serial adapters.
    - LP: #301632
  * USB: serial: fix regression in Visor/Palm OS module for kernels >=
    2.6.24
    - LP: #301632
  * USB: serial: ti_usb_3410_5052: Correct TUSB3410 endpoint requirements.
    - LP: #301632
  * CRYPTO xcbc: Fix crash when ipsec uses xcbc-mac with big data chunk
    - LP: #301632
  * mtd: fix broken state in CFI driver caused by FL_SHUTDOWN
    - LP: #301632
  * ipmi: change device node ordering to reflect probe order
    - LP: #301632
  * AX25 ax25_out: check skb for NULL in ax25_kick()
    - LP: #301632
  * NET: include <linux/types.h> into linux/ethtool.h for __u* typedef
    - LP: #301632
  * SUNGEM: Fix NAPI assertion failure.
    - LP: #301632
  * INET: inet_frag_evictor() must run with BH disabled
    - LP: #301632
  * LLC: Restrict LLC sockets to root
    - LP: #301632
  * netpoll: zap_completion_queue: adjust skb->users counter
    - LP: #301632
  * PPPOL2TP: Make locking calls softirq-safe
    - LP: #301632
  * PPPOL2TP: Fix SMP issues in skb reorder queue handling
    - LP: #301632
  * NET: Add preemption point in qdisc_run
    - LP: #301632
  * sch_htb: fix "too many events" situation
    - LP: #301632
  * SCTP: Fix local_addr deletions during list traversals.
    - LP: #301632
  * NET: Fix multicast device ioctl checks
    - LP: #301632
  * TCP: Fix shrinking windows with window scaling
    - LP: #301632
  * TCP: Let skbs grow over a page on fast peers
    - LP: #301632
  * VLAN: Don't copy ALLMULTI/PROMISC flags from underlying device
    - LP: #301632
  * SPARC64: Fix atomic backoff limit.
    - LP: #301632
  * SPARC64: Fix __get_cpu_var in preemption-enabled area.
    - LP: #301632
  * SPARC64: flush_ptrace_access() needs preemption disable.
    - LP: #301632
  * libata: assume no device is attached if both IDENTIFYs are aborted
    - LP: #301632
  * sis190: read the mac address from the eeprom first
    - LP: #301632
  * bluetooth: hci_core: defer hci_unregister_sysfs()
    - LP: #301632
  * SPARC64: Fix FPU saving in 64-bit signal handling.
    - LP: #301632
  * DVB: tda10086: make the 22kHz tone for DISEQC a config option
    - LP: #301632
  * HFS+: fix unlink of links
    - LP: #301632
  * plip: replace spin_lock_irq with spin_lock_irqsave in irq context
    - LP: #301632
  * signalfd: fix for incorrect SI_QUEUE user data reporting
    - LP: #301632
  * POWERPC: Fix build of modular drivers/macintosh/apm_emu.c
    - LP: #301632
  * PARISC futex: special case cmpxchg NULL in kernel space
    - LP: #301632
  * PARISC pdc_console: fix bizarre panic on boot
    - LP: #301632
  * PARISC fix signal trampoline cache flushing
    - LP: #301632
  * acpi: bus: check once more for an empty list after locking it
    - LP: #301632
  * fbdev: fix /proc/fb oops after module removal
    - LP: #301632
  * macb: Call phy_disconnect on removing
    - LP: #301632
  * file capabilities: remove cap_task_kill()
    - LP: #301632
  * locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs
    - LP: #301632
  * Linux 2.6.24.5
    - LP: #301632
  * splice: use mapping_gfp_mask
    - LP: #301634
  * fix oops on rmmod capidrv
    - LP: #301634
  * USB: gadget: queue usb USB_CDC_GET_ENCAPSULATED_RESPONSE message
    - LP: #301634
  * JFFS2: Fix free space leak with in-band cleanmarkers
    - LP: #301634
  * Increase the max_burst threshold from 3 to tp->reordering.
    - LP: #301634
  * USB: remove broken usb-serial num_endpoints check
    - LP: #301634
  * V4L: Fix VIDIOCGAP corruption in ivtv
    - LP: #301634
  * Linux 2.6.24.6, 2.6.24.7
    - LP: #301634

linux (2.6.24-22.45) hardy-security; urgency=low

  [Upstream Kernel Changes]

  * Don't allow splice() to files opened with O_APPEND
    - CVE-2008-4554
  * sctp: Fix oops when INIT-ACK indicates that peer doesn't support AUTH
    - CVE-2008-4576
  * sctp: Fix kernel panic while process protocol violation parameter
    - CVE-2008-4618
  * hfsplus: fix Buffer overflow with a corrupted image
    - CVE-2008-4933
  * hfsplus: check read_mapping_page() return value
    - CVE-2008-4934
  * net: Fix recursive descent in __scm_destroy().
    - CVE-2008-5029
  * net: unix: fix inflight counting bug in garbage collector
    - CVE-2008-5029
  * security: avoid calling a NULL function pointer in
    drivers/video/tvaudio.c
    - CVE-2008-5033
  * hfs: fix namelength memory corruption
    - CVE-2008-5025
  * V4L/DVB (9621): Avoid writing outside shadow.bytes[] array

-- Stefan Bader <stefan.bader@canonical.com>   Mon, 24 Nov 2008 09:44:34 +0100

Changed in linux:
status:	Fix Committed → Fix Released

Revision history for this message

Leann Ogasawara (leannogasawara) wrote on 2009-01-08:

#30

DesktopMan, since you are the original bug reporter, it would be great to get confirmation from you that this newer kernel does indeed fix the bug you had reported here. Thanks.

Revision history for this message

DesktopMan (christian-auby) wrote on 2009-01-09:

#31

I do not have this setup anymore (9+ months, needed it operational), but there have been other people reporting the same problem more recently. Hopefully one of them will be able to confirm.

Revision history for this message

MrPogson (f-launchpad-net-g33kay-ca) wrote on 2009-06-06:

#32

I have a system exhibiting the same/similar symptoms.

Running a fresh install of Ubuntu 9.04 jaunty
uname -a: Linux ServerX 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:58:03 UTC 2009 x86_64 GNU/Linux

Motherboard: SUPERMICRO MBD-H8DME-2-O
SATA card: SUPERMICRO AOC-SAT2-MV8 (Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09))

The system has a SW RAID6 array made of four 1TB disks. Currently the array is degraded and only has 3 disks to work with.

md1 : active raid6 sde1[4] sdd1[0] sdc1[2]
1953519872 blocks level 6, 64k chunk, algorithm 2 [4/2] [U_U_]
[==>..................] recovery = 10.8% (105874700/976759936) finish=142.0min speed=102195K/sec
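
The figures on the recovery line are internally consistent, which can be checked with a little arithmetic. This is only a sketch: it assumes the bracketed pair is blocks done/blocks total in 1K blocks and that the speed is in K/sec, and the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the input values are taken from the mdstat output above.

# Percentage complete from the "done/total" block counts.
md_percent() {
    awk -v d="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * d / t }'
}

# Estimated minutes remaining: remaining blocks / speed (K/sec) / 60.
md_eta_min() {
    awk -v d="$1" -v t="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (t - d) / s / 60 }'
}

md_percent 105874700 976759936          # prints 10.8, matching "recovery = 10.8%"
md_eta_min 105874700 976759936 102195   # prints 142.0, matching "finish=142.0min"
```

So the rebuild was roughly 142 minutes from completion at the reported speed when this snapshot was taken.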

With the array on the PCI-X card I'm able to recreate the crash by failing a drive and re-adding it to the array. Some time after 50% of the rebuild it hangs and the system becomes unresponsive.
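
The post does not say which commands were used for the fail-and-re-add cycle. One common way to do it with mdadm's manage mode is sketched below as a dry run that only prints the commands. The member name /dev/sde1 is an assumption chosen for illustration, and `md_readd_cmds` is a hypothetical helper, not part of mdadm.

```shell
#!/bin/sh
# Dry-run sketch: prints the mdadm commands rather than executing them.
# Running them for real needs root and degrades the array until the rebuild ends.
md_readd_cmds() {
    array=$1
    member=$2
    echo "mdadm $array --fail $member"     # mark the member as faulty
    echo "mdadm $array --remove $member"   # detach it from the array
    echo "mdadm $array --add $member"      # add it back, which starts a rebuild
}

md_readd_cmds /dev/md1 /dev/sde1
```

Rebuild progress can then be followed in /proc/mdstat.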

The system boots from a RAID1 array (md0) of two 500GB drives, which is on the motherboard's controller. I was able to add a disk plugged into the PCI-X card to md0 and it would sync w/o problems.

After moving the RAID6 array to the motherboard's controller, the rebuild works w/o problems.

kpolberg mentioned adjusting stripe_cache_size. The command he posted:
echo 16384 > /sys/block/md1/md/stripe_cache_size
Looks like it helps: no crash for 24 hours.
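
A larger stripe cache costs memory. The kernel's md documentation gives the consumption as page size × number of member disks × stripe_cache_size. The sketch below assumes 4 KiB pages and counts all four members of md1; `stripe_cache_mib` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: stripe cache memory in MiB.
# Formula from the kernel md documentation:
#   bytes = page_size * nr_disks * stripe_cache_size
stripe_cache_mib() {
    entries=$1
    disks=$2
    page_size=${3:-4096}   # assumed 4 KiB pages unless a third argument is given
    echo $(( entries * disks * page_size / 1024 / 1024 ))
}

stripe_cache_mib 16384 4   # prints 256: MiB used by the setting quoted above
stripe_cache_mib 256 4     # prints 4: MiB used by the default of 256 entries

# To inspect the current value (changing it needs root, and the setting
# is lost on reboot because it lives in sysfs):
# cat /sys/block/md1/md/stripe_cache_size
```

On this array the quoted setting would therefore pin about 256 MiB of RAM, which is worth checking against available memory before raising the value further.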

If it remains stable I will try with a larger array.

Will post more info if needed.

Revision history for this message

MrPogson (f-launchpad-net-g33kay-ca) wrote on 2009-06-06:

#33

Forgot to mention: the RAID6 array is using reiserfs.

Affects          Status         Importance   Assigned to      Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
Hardy            Fix Released   Medium       Colin Ian King
