Kernel panics under load

Bug #750359 reported by Alvin
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Steps to reproduce:
- Configure Ubuntu as rsync server
- Start a few (4 in this case) rsync jobs to copy data to the server
- Within the hour, the server will crash

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-30-server 2.6.32-30.59
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-30.59-server 2.6.32.29+drm33.13
Uname: Linux 2.6.32-30-server x86_64
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access /dev/snd/: No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
CurrentDmesg:
 [ 15.318664] svc: failed to register lockdv1 RPC service (errno 97).
 [ 15.322150] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
 [ 15.328101] NFSD: starting 90-second grace period
 [ 21.550018] eth0: no IPv6 routers present
Date: Mon Apr 4 16:52:06 2011
MachineType: Supermicro X7SLA
PciMultimedia:

ProcCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.32-30-server root=/dev/mapper/vg0-root ro quiet splash
ProcEnviron:
 LANG=C
 SHELL=/bin/bash
SourcePackage: linux
dmi.bios.date: 07/10/2009
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.0a
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: X7SLA
dmi.board.vendor: Supermicro
dmi.board.version: 1234567890
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: Supermicro
dmi.chassis.version: 1234567890
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.0a:bd07/10/2009:svnSupermicro:pnX7SLA:pvr1234567890:rvnSupermicro:rnX7SLA:rvr1234567890:cvnSupermicro:ct3:cvr1234567890:
dmi.product.name: X7SLA
dmi.product.version: 1234567890
dmi.sys.vendor: Supermicro

Revision history for this message
Alvin (alvind) wrote :

Adding a more useful log file

Dave Gilbert (ubuntu-treblig) wrote :

Hi Alvin,
  From my reading of those logs, something is running out of memory and then things get more and more upset (with soft-lockup detection firing on each CPU). This shouldn't happen in principle; something should kill whatever is eating up all the memory.

  The other question is what is running out of memory; it might be useful to start your test and then every few minutes gather the output of

   free
   cat /proc/slabinfo
   ps -eafuw

and then attach the last couple of these before it dies to this report.
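
The periodic sampling described above could be scripted roughly like this. It is only a minimal sketch: the function names `snapshot` and `watch_memory`, the `OUTDIR` path and the 300-second default interval are assumptions made for illustration, and the three commands are copied as written above.

```shell
#!/bin/sh
# Hypothetical helper: save the output of free, /proc/slabinfo and ps
# into one timestamped file per sample. OUTDIR is an assumed location.
OUTDIR="${OUTDIR:-/var/tmp/memwatch}"

snapshot() {
    # $1 = output file for this sample
    {
        echo "=== $(date) ==="
        echo "--- free ---"
        free
        echo "--- /proc/slabinfo ---"
        cat /proc/slabinfo      # usually needs root to read
        echo "--- ps -eafuw ---"
        ps -eafuw
    } > "$1" 2>&1 || true       # keep sampling even if one command fails
    sync || true                # push the file to disk before a possible crash
}

watch_memory() {
    # $1 = number of samples (0 = until interrupted)
    # $2 = seconds between samples
    count="${1:-0}"
    interval="${2:-300}"
    mkdir -p "$OUTDIR" || return 1
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        snapshot "$OUTDIR/snap-$(date +%Y%m%d-%H%M%S)-$i.txt"
        [ "$count" -ne 0 ] && [ "$i" -ge "$count" ] && break
        sleep "$interval"
    done
}
```

Running `watch_memory 0 300` as root before starting the rsync jobs would then leave one file per five minutes in `$OUTDIR`; the newest two or three are the ones to attach.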

One thing I've seen in the past is that rsync leaks ~96 bytes per file while the process is running (it's not really a leak; they keep it around to detect whether the same file comes around again). If you are backing up something with zillions of small files, your rsync processes can get huge. I've hit that problem when backing up large source trees with literally millions of files.
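
To get a feel for the numbers, here is a back-of-the-envelope calculation. It simply multiplies a file count by the ~96 bytes figure quoted above; the function name `rsync_filelist_mib` is made up for this example, and real usage will differ between rsync versions.

```shell
# Rough per-process estimate of rsync's file-list memory, in MiB.
# $1 = number of files, $2 = bytes kept per file (assumed default: 96)
rsync_filelist_mib() {
    files="$1"
    bytes_per_file="${2:-96}"
    # integer arithmetic, so the result is rounded down
    echo $(( files * bytes_per_file / 1024 / 1024 ))
}

rsync_filelist_mib 10000000   # prints 915
```

So ten million files would cost each rsync process somewhere around 900 MiB under this assumption, and several jobs running at once would multiply that.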

Dave

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
tags: removed: regression-potential
penalvch (penalvch) wrote :

Alvin, thank you for reporting this bug and helping make Ubuntu better. This bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command in the development release from a Terminal (Applications->Accessories->Terminal)? It will automatically gather and attach updated debug information to this report.

apport-collect -p linux <replace-with-bug-number>

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.

Please let us know your results. Thanks in advance.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Alvin (alvind) wrote :

I had forgotten all about this bug. Yes, as far as I know this is still an issue. However, it is now clear what caused it: LVM snapshots. LVM snapshots have a huge impact on disk performance. Create several, then cause some I/O. Ubuntu will crash after a while.

I probably forgot about this bug because, until recently, Ubuntu couldn't even boot when snapshots were present.
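
The "create several snapshots" step could look something like the sketch below. Everything specific in it is an assumption: the volume group and origin volume (`vg0`/`root`, guessed from the kernel command line in the report), the 1G snapshot size, the snapshot names, and the helper names `run` and `make_snapshots`. By default it only prints the commands; set `DRY_RUN=0` to actually run them, which needs root and free space in the volume group.

```shell
# Sketch only: prints the lvcreate commands unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"          # dry run: show the command instead of running it
    else
        "$@"
    fi
}

make_snapshots() {
    # $1 = volume group, $2 = origin logical volume, $3 = number of snapshots
    vg="$1"
    lv="$2"
    n="$3"
    i=1
    while [ "$i" -le "$n" ]; do
        run lvcreate --snapshot --size 1G --name "${lv}-snap$i" "/dev/$vg/$lv"
        i=$((i + 1))
    done
}
```

After something like `make_snapshots vg0 root 4`, the I/O can come from the rsync jobs in the steps to reproduce.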

Dave Gilbert (ubuntu-treblig) wrote :

Hi Alvin,
  It's worth trying the upstream kernel as suggested in Christopher's message. Also, if the crash is LVM related, it would be interesting to get a new set of the dmesg errors from when it crashes, plus a dump of your LVM info; perhaps the output of pvdisplay, vgdisplay and lvdisplay?
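
One way to collect all of that into a single attachment is sketched below. The function name `collect_lvm_info` and the output file are made up for this example, and the LVM tools generally need root.

```shell
# Hypothetical helper: write dmesg plus the three LVM reports to one file.
collect_lvm_info() {
    # $1 = output file
    for cmd in dmesg pvdisplay vgdisplay lvdisplay; do
        echo "--- $cmd ---"
        # record a note instead of aborting if a tool is missing or fails
        $cmd 2>&1 || echo "($cmd failed or is not installed; it may need root)"
    done > "$1"
}
```

For example, `collect_lvm_info /var/tmp/lvm-report.txt` produces one file that can be attached to this report.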

Dave

Alvin (alvind) wrote :

It's a bit late for that. We stopped using Ubuntu and ended the support contract with Canonical due to bugs like these. Like I said, due to bug #563895 it was impossible to even boot with snapshots present. Together with the performance hit and related instability we couldn't keep using Ubuntu. (Bug #712392 is also a result of having snapshots.)

Maybe the new kernel fixes this. Maybe not. It's certainly a bug worth fixing.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired