[Dell PowerEdge R200] lockups in linux-image-2.6.32-45-server

Bug #1089351 reported by Tamas Csillag
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Expired   High         Unassigned

Bug Description

After upgrading from linux-image-2.6.32-43-server to linux-image-2.6.32-45-server, we started experiencing intermittent lockups on one of our servers: TCP connections freeze and the machine does not answer pings for 5-10 minutes, after which everything is fine again. This happens 1-3 times a day, usually when the load average is above 2.0. The server is a dedicated Bacula server and usually has no significant load during the day. There is nothing in dmesg or syslog. Based on the Ubuntu kernel changelog, I'd guess a network TSO or SCSI problem. As this is not a full-time critical server, we can do some testing.
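
Since nothing shows up in dmesg or syslog, one way to pin down when the freezes happen is to have another host record a timestamp for every successful ping and then look for gaps in that record. Below is a minimal sketch of the gap-finding step only, assuming the heartbeat timestamps have already been collected as `datetime` objects. The function name `find_stalls` and the 60-second threshold are illustrative choices, not something taken from this report.

```python
from datetime import datetime, timedelta


def find_stalls(timestamps, max_gap_seconds=60.0):
    """Return (start, end, seconds) for each gap longer than max_gap_seconds.

    `timestamps` is any iterable of datetime objects, one per successful
    heartbeat (hypothetical input; collecting them is not shown here).
    """
    ordered = sorted(timestamps)
    stalls = []
    # Compare each heartbeat with the one that follows it.
    for prev, cur in zip(ordered, ordered[1:]):
        gap = (cur - prev).total_seconds()
        if gap > max_gap_seconds:
            stalls.append((prev, cur, gap))
    return stalls


if __name__ == "__main__":
    # Made-up sample data: heartbeats every 10 s with one long silence.
    start = datetime(2012, 12, 12, 11, 0, 0)
    beats = [start + timedelta(seconds=s) for s in (0, 10, 20, 500, 510)]
    for begin, end, seconds in find_stalls(beats):
        print(f"no heartbeat from {begin} to {end} ({seconds:.0f} s)")
```

Matching the reported gaps against load and backup job times might help narrow down whether network or disk activity is involved.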

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-45-server 2.6.32-45.101
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-43.97-server 2.6.32.59+drm33.24
Uname: Linux 2.6.32-43-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
Date: Wed Dec 12 11:57:18 2012
MachineType: Dell Inc. PowerEdge R200
PciMultimedia:

ProcCmdLine: root=/dev/md0 ro quiet splash crashkernel=384M-2G:64M,2G-:128M
ProcEnviron:
 LANG=en_IE.UTF-8
 SHELL=/bin/bash
SourcePackage: linux
dmi.bios.date: 05/15/2009
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.4.3
dmi.board.name: 0TY019
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.4.3:bd05/15/2009:svnDellInc.:pnPowerEdgeR200:pvr:rvnDellInc.:rn0TY019:rvrA00:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R200
dmi.sys.vendor: Dell Inc.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote : Re: lockups in linux-image-2.6.32-45-server

Does this issue go away if you boot back into linux-image-2.6.32-43-server ?

Changed in linux (Ubuntu):
importance: Undecided → High
Joseph Salisbury (jsalisbury) wrote :

Also, would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.7 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.7-raring/

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Tamas Csillag (tamas-csillag) wrote :

We haven't experienced it since we booted back into linux-image-2.6.32-43-server.
I'll try to start testing the 3.7 kernel today or tomorrow.

Tamas Csillag (tamas-csillag) wrote :

The test is running with 3.7. There are new lines in dmesg:

tamasc@cuimhne:~$ dmesg | tail
[ 49.389912] Fusion MPT misc device (ioctl) driver 3.04.20
[ 49.390012] mptctl: Registered with Fusion MPT base driver
[ 49.390015] mptctl: /dev/mptctl @ (major,minor=10,220)
[ 63.056982] megasas: fasync_helper was not called first
[ 66.540792] st0: Block limits 1 - 8388608 bytes.
[ 594.404072] dm-0: WRITE SAME failed. Manually zeroing.
[ 898.980064] dm-0: WRITE SAME failed. Manually zeroing.
[ 1202.184070] dm-0: WRITE SAME failed. Manually zeroing.
[ 1507.152072] dm-0: WRITE SAME failed. Manually zeroing.
[ 1507.472064] dm-0: WRITE SAME failed. Manually zeroing.
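
To see how regularly a message like the one above recurs, the timestamps can be pulled out of the dmesg text and differenced. This is a small sketch, assuming the usual `[ seconds.microseconds] message` line prefix; the helper name `event_intervals` is hypothetical and not part of any existing tool.

```python
import re

# Matches the "[ seconds.micros] message" prefix of a dmesg line.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def event_intervals(lines, needle):
    """Return the seconds between consecutive dmesg lines containing `needle`."""
    times = []
    for line in lines:
        match = DMESG_LINE.match(line.strip())
        # Skip lines without a timestamp (such as shell prompts) and other messages.
        if match and needle in match.group(2):
            times.append(float(match.group(1)))
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    sample = [
        "[ 66.540792] st0: Block limits 1 - 8388608 bytes.",
        "[ 594.404072] dm-0: WRITE SAME failed. Manually zeroing.",
        "[ 898.980064] dm-0: WRITE SAME failed. Manually zeroing.",
        "[ 1202.184070] dm-0: WRITE SAME failed. Manually zeroing.",
    ]
    print(event_intervals(sample, "WRITE SAME failed"))
```

In practice the input could come from `dmesg` output saved to a file and read with `splitlines()`.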

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
tags: added: kernel-fixed-upstream
Changed in linux (Ubuntu):
status: Expired → Confirmed
penalvch (penalvch) wrote :

Tamas Csillag, could you please test for this in Precise server via http://releases.ubuntu.com/12.04.2/ and advise whether this is still reproducible?

description: updated
tags: added: kernel-fixed-upstream-v3.7 latest-bios-1.4.3
removed: kernel-fixed-upstream needs-upstream-testing networking
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
summary: - lockups in linux-image-2.6.32-45-server
+ [Dell PowerEdge R200] lockups in linux-image-2.6.32-45-server
Tamas Csillag (tamas-csillag) wrote :

We'd need to dist-upgrade all our Bacula servers to 12.04, which is not really an option at the moment.
We could do that upgrade around March 2014 at the earliest.

Tamas

penalvch (penalvch) wrote :

Tamas Csillag, would a live environment be ok in the interim just as a test?

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired