Ubuntu Slow, crash, sluggish from version 10.04 upwards

Bug #1046326 reported by Michael Atkinson
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

Ubuntu will at random points suddenly become very slow and sluggish: graphics run far behind, keystrokes take multiple seconds to show, the display updates in super slow motion, or, if you're unlucky, the system dies completely.

This is related to a problem with the interrupt handling. On my own machine (MSI i7, nvidia card, 16 GB, 5 SATA HDDs) I have had this problem from 10.04 (when I bought the hardware) up to 12.04. I have since upgraded through various kernels up to 3.5.3 with varying results. Some of the suggestions in posts make the periods between crashes shorter; some days it works fine, some days it happens every five minutes, but it NEVER goes away.

If you consider that this is interrupt related, it makes perfect sense. Every single 'crash' is marked by an unanswered interrupt. See the syslog below:

Sep 5 15:13:14 Server kernel: [112195.631364] irq 16: nobody cared (try booting with the "irqpoll" option)
Sep 5 15:13:14 Server kernel: [112195.631368] Pid: 17901, comm: firefox Tainted: P O 3.5.3-030503-generic #201208252335
Sep 5 15:13:14 Server kernel: [112195.631369] Call Trace:
Sep 5 15:13:14 Server kernel: [112195.631370] <IRQ> [<ffffffff810e5a5d>] __report_bad_irq+0x3d/0xe0
Sep 5 15:13:14 Server kernel: [112195.631378] [<ffffffff810e5ce5>] note_interrupt+0x135/0x190
Sep 5 15:13:14 Server kernel: [112195.631380] [<ffffffff810e3559>] handle_irq_event_percpu+0xa9/0x210
Sep 5 15:13:14 Server kernel: [112195.631382] [<ffffffff810e370e>] handle_irq_event+0x4e/0x80
Sep 5 15:13:14 Server kernel: [112195.631384] [<ffffffff810e6874>] handle_fasteoi_irq+0x64/0x120
Sep 5 15:13:14 Server kernel: [112195.631388] [<ffffffff81016632>] handle_irq+0x22/0x40
Sep 5 15:13:14 Server kernel: [112195.631391] [<ffffffff816a49ea>] do_IRQ+0x5a/0xe0
Sep 5 15:13:14 Server kernel: [112195.631394] [<ffffffff8169a86a>] common_interrupt+0x6a/0x6a
Sep 5 15:13:14 Server kernel: [112195.631395] <EOI> [<ffffffff816a2c2d>] ? system_call_fastpath+0x1a/0x1f
Sep 5 15:13:14 Server kernel: [112195.631398] handlers:
Sep 5 15:13:14 Server kernel: [112195.631401] [<ffffffff814bc670>] usb_hcd_irq
Sep 5 15:13:14 Server kernel: [112195.631462] [<ffffffffa0d65cb0>] nv_kern_isr [nvidia]
Sep 5 15:13:14 Server kernel: [112195.631470] [<ffffffffa00640f0>] rhine_interrupt [via_rhine]
Sep 5 15:13:14 Server kernel: [112195.631471] Disabling IRQ #16
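For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that scans syslog lines for "nobody cared" reports and collects the handlers listed for each one. The function name find_unhandled_irqs and the regular expressions are my own illustration, not part of any kernel tool, and they assume the log layout shown in the excerpt above (a "handlers:" line followed by one handler per line, ended by "Disabling IRQ").

```python
import re

# Matches the kernel's report line, e.g. 'irq 16: nobody cared (try booting ...)'
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
# Matches a handler line, e.g. '[<ffffffff814bc670>] usb_hcd_irq' or
# '[<ffffffffa0d65cb0>] nv_kern_isr [nvidia]' (trailing '[module]' is optional)
HANDLER = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)(?:\s+\[(\w+)\])?\s*$")


def find_unhandled_irqs(lines):
    """Return a list of (irq_number, [handler names]), one per report."""
    reports = []
    current = None        # report currently being collected, or None
    in_handlers = False   # True once the 'handlers:' line has been seen
    for line in lines:
        m = NOBODY_CARED.search(line)
        if m:
            current = (int(m.group(1)), [])
            reports.append(current)
            in_handlers = False
            continue
        if current is None:
            continue
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        if "Disabling IRQ" in line:
            current = None
            in_handlers = False
            continue
        if in_handlers:
            h = HANDLER.search(line)
            if h:
                current[1].append(h.group(1))
    return reports


# Abbreviated lines taken from the syslog excerpt in this report.
SAMPLE = [
    'Sep 5 15:13:14 Server kernel: [112195.631364] irq 16: nobody cared (try booting with the "irqpoll" option)',
    "Sep 5 15:13:14 Server kernel: [112195.631370] <IRQ> [<ffffffff810e5a5d>] __report_bad_irq+0x3d/0xe0",
    "Sep 5 15:13:14 Server kernel: [112195.631398] handlers:",
    "Sep 5 15:13:14 Server kernel: [112195.631401] [<ffffffff814bc670>] usb_hcd_irq",
    "Sep 5 15:13:14 Server kernel: [112195.631462] [<ffffffffa0d65cb0>] nv_kern_isr [nvidia]",
    "Sep 5 15:13:14 Server kernel: [112195.631470] [<ffffffffa00640f0>] rhine_interrupt [via_rhine]",
    "Sep 5 15:13:14 Server kernel: [112195.631471] Disabling IRQ #16",
]

reports = find_unhandled_irqs(SAMPLE)
print(reports)  # [(16, ['usb_hcd_irq', 'nv_kern_isr', 'rhine_interrupt'])]
```

On a real system the same function could be fed the lines of the syslog file; the path and log format vary by release, so treat the patterns as a starting point.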

The problem is massively compounded by the fact that irqpoll and irqfixup options NO LONGER WORK AT ALL ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/855199 ).

No matter whether you switch to Nouveau, a different kernel, or other drivers, the problem remains in varying severity. Again, that is logical if there is a problem with the kernel interrupt handling itself.

So here's the kicker. It is easy to resolve! By forcing a restart of the interrupt handlers the system INSTANTLY comes back to 100% functional life:

root@Server:~# service network-manager restart

My theory (for what it's worth; I have programmed assembly for 30 years) is that there is a multi-threading fault somewhere, which causes the kernel to miss an interrupt, or to miss allocating it.

The one thing these incidents have in common, though: shared interrupts (usually IRQ 16) and LOAD on that interrupt (USB keyboard and mouse, graphics, Ethernet) on high-speed machines.
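To see which interrupt lines are shared on a given machine, the contents of /proc/interrupts can be parsed. The sketch below is only an illustration: the function name shared_irqs is hypothetical, the sample text uses made-up counts and device names, and it assumes the usual layout (a header row of CPU columns, then one row per IRQ ending in a comma-separated device list, with no spaces inside device names).

```python
def shared_irqs(text):
    """Map IRQ number -> list of device names, for IRQs with 2+ devices."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())        # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue                     # skip NMI, LOC, ERR and similar rows
        fields = rest.split()
        tail = " ".join(fields[ncpus:])  # chip/type columns plus device list
        parts = [p.strip() for p in tail.split(",")]
        if len(parts) < 2 or not parts[0]:
            continue                     # only one device: not shared
        parts[0] = parts[0].split()[-1]  # drop the chip/type columns
        shared[int(label)] = parts
    return shared


# Hypothetical sample; counts and device names are invented for illustration.
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
 16:       1200        800   IO-APIC-fasteoi   ehci_hcd:usb1, nvidia, eth0
NMI:          0          0   Non-maskable interrupts
"""

result = shared_irqs(SAMPLE)
print(result)  # {16: ['ehci_hcd:usb1', 'nvidia', 'eth0']}
```

On a live system, something like shared_irqs(open("/proc/interrupts").read()) would list the shared lines; any IRQ that also shows up in a "nobody cared" report is the one to look at.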

I have created a keyboard shortcut that does the service restart and this works perfectly 100% of the time. If any dev needs more info, contact me.

Just to prove a point: the same machine works perfectly on Windows 7.

Michael

description: updated
bugbot (bugbot)
tags: added: lucid
bugbot (bugbot)
tags: added: crash
tags: added: performance
Michael Atkinson (efowyqq) wrote :

It is mindblowing how nobody reacts to these things.

Timo Aaltonen (tjaalton) wrote :

You might have better luck when a bug is filed against the right package.

affects: xserver-xorg-video-intel (Ubuntu) → linux (Ubuntu)
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.6 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. Please only remove that one tag and leave the other tags. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.6-rc7-quantal/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key needs-upstream-testing
Changed in linux (Ubuntu):
status: New → Incomplete
Michael Atkinson (efowyqq) wrote :

As advised, I tested against 3.6-rc7 and it showed the same problem.

Note that 3.6-rc7 also had a resolution problem: the nvidia card refused to operate above 1024x768 on the first boot, and the system did not boot at all on the second boot.

tags: added: oneiric precisekernel-bug-exists-upstream
removed: needs-upstream-testing
tags: added: kernel-bug-exists-upstream precise
removed: precisekernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

If you are comfortable with opening a bug upstream, it would be great if you could report the upstream bug number back in this bug report. That will allow us to link this bug to the upstream report.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged
penalvch (penalvch) wrote :

Michael Atkinson, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.12

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

summary: - Ubuntu Slow, crash, sluggish from version 10.04 upwards on i3/i5/i7
- machines
+ Ubuntu Slow, crash, sluggish from version 10.04 upwards
description: updated
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired