linux 3.2.0-18 - 22 kernel panic on boot, Alienware m17x, Dell xps 1340

Bug #972723 reported by Brian C. Ladd
This bug affects 10 people
Affects          Status     Importance  Assigned to  Milestone
Linux            Confirmed  Medium
linux (Ubuntu)              High        Unassigned

Bug Description

lsb_release -rd
Description: Ubuntu precise (development branch)
Release: 12.04

apt-cache policy linux-image-generic
linux-image-generic:
  Installed: 3.2.0.21.23
  Candidate: 3.2.0.21.23
  Version table:
 *** 3.2.0.21.23 0
        500 http://us.archive.ubuntu.com/ubuntu/ precise/main amd64 Packages
        100 /var/lib/dpkg/status

When booting an Alienware m17x with the Precise beta (updated twice a day), every kernel version after 3.2.0-17 (3.2.0-18 through -21, 3.3.0-999 dailies, 3.3.0rc7precise, 3.4.0rc1precise) panics about 7 seconds into the boot sequence. I expect Linux to boot instead. -17 boots cleanly every time on the same hardware.

I managed to capture a photograph of the stack trace. The photo and background are available at http://ubuntuforums.org/showpost.php?p=11813173&postcount=14

The trace is:

                ? ite_cir_isr
                ? wakeup_preempt_entity.isra
                handle_irq_event_percpu
                ? check_preempt_curr
                handle_irq_event
                handle_edge_irq
                handle_irq
                do_IRQ
                common_interrupt
                ? system_call_fastpath
RIP  [<(null)>] (null)

Kernel panic - not syncing: Fatal exception in interrupt
comm: upstart-udev-br Tainted: G D 3.

                panic
                oops_end
                no_context
                __bad_area_nosemaphore
                ? native_send_call_func_single_ipi
                bad_area_nosemaphore
                do_page_fault
                ? scsi_request_fn
                ? blk_complete_request
                ? scsi_done
                ? ata_scsi_go_complete
                page_fault
                ? ite_cir_isr
                ? wakeup_preempt_entity.isra
                handle_irq_event_percpu
                ? check_preempt_curr
                handle_irq_event
                handle_edge_irq
                handle_irq
                do_IRQ
                common_interrupt
                ? system_call_fastpath

The stack traces for the 3.3 and 3.4 kernels are different; all the 3.2 versions produce traces similar to the one above (I believe this one is from -21). apport refuses to accept a bug report for the current configuration, claiming there is a problem with support for it. Let me know what I can do to help.

Brad Figg (brad-figg)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → Medium
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 972723

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: precise
Brian C. Ladd (drbcladd) wrote : Re: linux 3.2.0-18 - 21 kernel panic on boot, Alienware m17x

I attempted to run apport-collect, and it crashes all over the place:

apport-collect 972723
No packages found matching linux.
ERROR: hook /usr/share/apport/general-hooks/ubuntu.py crashed:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport/report.py", line 718, in add_hooks_info
    symb['add_info'](self, ui)
  File "/usr/share/apport/general-hooks/ubuntu.py", line 37, in add_info
    match_error_messages(report)
  File "/usr/share/apport/general-hooks/ubuntu.py", line 121, in match_error_messages
    if report['ProblemType'] == 'Package':
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'ProblemType'
Traceback (most recent call last):
  File "/usr/share/apport/apport-gtk", line 493, in <lambda>
    GLib.idle_add(lambda: self.collect_info(on_finished=self.ui_update_view))
  File "/usr/lib/python2.7/dist-packages/apport/ui.py", line 861, in collect_info
    icthread.exc_raise()
  File "/usr/lib/python2.7/dist-packages/apport/REThread.py", line 34, in run
    self._retval = self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/dist-packages/apport/ui.py", line 111, in thread_collect_info
    if report['ProblemType'] == 'Crash' and \
  File "/usr/lib/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'ProblemType'

That hangs the collector window, and I cannot submit anything, so I don't know how to complete the bug report.

Brian C. Ladd (drbcladd)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-21.34)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get dist-upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-21.34
Brian C. Ladd (drbcladd) wrote : Re: linux 3.2.0-18 - 21 kernel panic on boot, Alienware m17x

The versions of the precise kernel installed on my machine are:
linux-image-3.2.0-17-generic 3.2.0-17.27 Linux kernel image for version 3.2.0 on 64bit x86 SMP
linux-image-3.2.0-18-generic 3.2.0-18.29 Linux kernel image for version 3.2.0 on 64bit x86 SMP
linux-image-3.2.0-19-generic 3.2.0-19.31 Linux kernel image for version 3.2.0 on 64bit x86 SMP
linux-image-3.2.0-20-generic 3.2.0-20.33 Linux kernel image for version 3.2.0 on 64bit x86 SMP
linux-image-3.2.0-21-generic 3.2.0-21.34 Linux kernel image for version 3.2.0 on 64bit x86 SMP

Every kernel after -17 (the top one) exhibits the described behavior. I cannot gather data on any newer kernel because they all fail to boot; they panic before logging to the drive is enabled.

Below is the output of the following commands:
lspci -vvvn
lspnp -vvv
cat /proc/interrupts

Note: These were all run on -17 since none of the other kernels boot.

00:00.0 0600: 10de:0a82 (rev b1)
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0

00:00.1 0500: 10de:0a88 (rev b1)
 Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0

00:03.0 0601: 10de:0aae (rev b2)
 Subsystem: 10de:cb79
 Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Region 0: I/O ports at 1c00 [size=256]

00:03.1 0500: 10de:0aa4 (rev b1)
 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:03.2 0c05: 10de:0aa2 (rev b1)
 Subsystem: 10de:cb79
 Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Interrupt: pin A routed to IRQ 10
 Region 0: I/O ports at 3080 [size=64]
 Region 4: I/O ports at 3040 [size=64]
 Region 5: I/O ports at 2000 [size=64]
 Capabilities: <access denied>
 Kernel driver in use: nForce2_smbus
 Kernel modules: i2c-nforce2

00:03.3 0500: 10de:0a89 (rev b1)
 Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:03.5 0b40: 10de:0aa3 (rev b1)
 Subsystem: 10de:cb79
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0 (750ns min, 250ns max)
 Interrupt: pin B routed to IRQ 19
 Region 0: Memory at f0600000 (32-bit, non-prefetchable) [size=512K]
 Kernel driver in use: nvidia
 Kernel modules: nvidia, nvidia_current_updates, nvidia_curre...

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Peter Hurley (phurley) wrote :

Hi Brad,

This bug exists in all kernels up to and including mainline (which, as of today, is 3.4-rc1).

The ite_probe() function in the ite-cir driver (drivers/media/rc/ite-cir.c) installs an interrupt service routine via request_irq() before properly initializing both the hardware and the crucial function tables the ISR uses. As the documentation for request_irq() specifies, all initialization required for an ISR to function properly must be performed before calling request_irq(), as an interrupt may be dispatched even before request_irq() returns.

Normally, this bug would be difficult to trigger and reproduce. However, now that CONFIG_IRQ_REMAPPING is on, the interrupt is being immediately triggered for remapping purposes.

Since the function table has not yet been initialized by ite_probe(), the ISR, ite_cir_isr(), calls through the uninitialized function table and immediately panics, because it tries to jump to vma 0.

As the hardware does not perform a critical function, perhaps the best temporary solution is to blacklist it.
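
The ordering problem described above can be reproduced in miniature outside the kernel. What follows is a minimal user-space sketch, not the driver's code: fake_request_irq(), fake_isr(), struct fake_dev and its get_irq_causes pointer are hypothetical stand-ins, loosely modeled on the description of ite_probe() and ite_cir_isr(). It assumes, as the request_irq() documentation warns, that a pending interrupt may be delivered before the registration call returns. Instead of jumping to address 0, the model ISR counts the NULL calls so the two orderings can be compared.

```c
/* Hypothetical user-space model of the ite-cir probe-ordering race. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_dev;
typedef int (*get_causes_fn)(struct fake_dev *dev);

struct fake_dev {
    get_causes_fn get_irq_causes; /* NULL until "probe" fills it in */
    int handled;                  /* interrupts serviced successfully */
    int null_calls;               /* times the ISR found a NULL table entry */
};

/* Stand-in for the chip-specific callback; reports one pending cause. */
static int chip_get_irq_causes(struct fake_dev *dev)
{
    (void)dev;
    return 1;
}

/* Model ISR: the real driver calls through the table unconditionally. */
static void fake_isr(struct fake_dev *dev)
{
    if (dev->get_irq_causes == NULL) {
        /* In the kernel this is a jump to address 0 and an oops. */
        dev->null_calls++;
        return;
    }
    dev->handled += dev->get_irq_causes(dev);
}

/* Model of request_irq(): a pending interrupt runs the ISR before returning. */
static void fake_request_irq(struct fake_dev *dev, bool irq_pending)
{
    assert(dev != NULL);
    if (irq_pending)
        fake_isr(dev);
}

/* Broken order: install the ISR first, initialize afterwards. */
void probe_broken(struct fake_dev *dev, bool irq_pending)
{
    fake_request_irq(dev, irq_pending);
    dev->get_irq_causes = chip_get_irq_causes;
}

/* Safe order: initialize everything the ISR needs, then install it. */
void probe_fixed(struct fake_dev *dev, bool irq_pending)
{
    dev->get_irq_causes = chip_get_irq_causes;
    fake_request_irq(dev, irq_pending);
}
```

With a pending interrupt, probe_broken() leaves handled at 0 and null_calls at 1 (the model's equivalent of the oops), while probe_fixed() services the interrupt normally. Without a pending interrupt, even the broken order appears to work, which is why a race like this can go unnoticed until something makes the interrupt fire at registration time.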

Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-22.35)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get dist-upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-22.35
Yotam Benshalom (benshalom) wrote : Re: linux 3.2.0-18 - 21 kernel panic on boot, Alienware m17x

I suffer from an identical problem on a Dell Studio XPS 1340 with an NVIDIA 9500M card. The 3.0.0-17 kernel boots fine, but 3.2.0-22 (and all the earlier 3.2 kernels I tried) leads to a kernel panic. No logs are saved, and most of the output scrolls by too fast to read. It seems to be the same bug, although I cannot be sure of that.
Another report of this problem on a different Dell computer is here: http://ubuntuforums.org/showthread.php?t=1928957

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: bot-stop-nagging
Yotam Benshalom (benshalom) wrote :

A workaround which was suggested in the thread http://ubuntuforums.org/showthread.php?p=11819330 fixed the issue for me:

    echo "blacklist ite-cir" | sudo tee /etc/modprobe.d/ite-cir.conf
    sudo depmod -a 3.2.0-22-generic
    sudo update-initramfs -u -k 3.2.0-22-generic

Leann Ogasawara (leannogasawara) wrote :

Hi Brian et al,

First, can you try booting the latest 3.2.0-22.35 kernel? I just want to confirm this isn't resolved before we investigate further.

Assuming the issue remains, let's try to figure out the exact change that introduced this regression. Between the 3.2.0-17 and 3.2.0-18 kernels there were quite a few changes, most notably rebases to upstream stable Linux kernels v3.2.7, v3.2.8, and v3.2.9; 3.2.0-17.27 was actually based on upstream stable Linux kernel v3.2.6. So let's have you do the following set of tests:

1) Install and boot the upstream v3.2.6 kernel. Does it work (i.e., boot properly) or panic? If it works, proceed to step 2. Note that I would expect this kernel to work, but want confirmation.
2) Install and boot the upstream v3.2.7 kernel. Does it work (i.e., boot properly) or panic? If it works, proceed to step 3.
3) Install and boot the upstream v3.2.8 kernel. Does it work (i.e., boot properly) or panic? If it works, proceed to step 4.
4) Install and boot the upstream v3.2.9 kernel. Does it work or panic?

Let me know your results. Each of the above stable kernels can be found at:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.6-precise/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.7-precise/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.8-precise/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2.9-precise/

Changed in linux (Ubuntu):
importance: Medium → High
status: Confirmed → Triaged
summary: - linux 3.2.0-18 - 21 kernel panic on boot, Alienware m17x
+ linux 3.2.0-18 - 22 kernel panic on boot, Alienware m17x, Dell xps 1340
tags: added: kernel-da-key
Brian C. Ladd (drbcladd) wrote :

Update: blacklisting ite-cir seems to fix the problem:

I ran the following for each of the affected kernels through 3.2.0-22:
    echo "blacklist ite-cir" | sudo tee /etc/modprobe.d/ite-cir.conf
    sudo depmod -a 3.2.0-21-generic
    sudo update-initramfs -u -k 3.2.0-21-generic

After that I was able to warm-boot and cold-boot consistently. The 3.3.0 and 3.4.0 dailies have an additional, probably unrelated, crash later in the boot sequence.

Peter Hurley (phurley) wrote :

Hi Leann,

I'm not sure you saw my comment #5; this bug is a race condition that has existed in the ite-cir driver from the driver's introduction last year through current mainline. The driver installs an ISR before it has properly initialized. If an IRQ occurs after the driver has installed the ISR (via request_irq) but before it has initialized its internal call tables, the ite_cir_isr() function will panic (by accessing vma 0). Incidentally, that's not the only initialization-order problem in this driver: it doesn't initialize the hardware before installing the ISR either.

Although initially I believed this bug's easy reproducibility was due to CONFIG_IRQ_REMAP being on, I realize now that's not the case (or, at least, not exclusively).

After more extensive static analysis, the most likely culprit is "genirq: Handle pending irqs in irq_startup()", which was introduced in stable v3.2.9. This change retriggers pending edge-type interrupts when an ISR is installed:
   request_irq => request_threaded_irq => __setup_irq => irq_startup => check_irq_resend => irq_retrigger

The only way to be certain would be with git bisect. Of course, understanding the easy reproducibility is mostly academic as the actual bug is a race condition that has always existed (at least since the driver's introduction).
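
To see why a freshly registered handler can run immediately, here is a toy user-space model of the replay behavior attributed above to "genirq: Handle pending irqs in irq_startup()". It is not the genirq implementation: struct toy_irq_desc, toy_raise(), toy_setup_irq() and the resend_pending switch are hypothetical simplifications of the request_irq => ... => irq_retrigger path listed above. In this sketch, an edge that arrives with no handler installed is latched as pending; when resend is enabled, installing a handler delivers that edge before the install call returns.

```c
/* Toy model (hypothetical names) of replaying a latched edge interrupt. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*toy_handler_fn)(void *ctx);

struct toy_irq_desc {
    bool edge_triggered;
    bool pending;           /* edge seen while no handler was installed */
    toy_handler_fn handler;
    void *ctx;
};

/* Hardware raises the line; with no handler, an edge is only remembered. */
void toy_raise(struct toy_irq_desc *desc)
{
    if (desc->handler != NULL)
        desc->handler(desc->ctx);
    else if (desc->edge_triggered)
        desc->pending = true;
}

/*
 * Install a handler. With resend_pending set (modeling the replay of
 * pending edge interrupts at startup), a latched edge is delivered
 * before this function returns.
 */
void toy_setup_irq(struct toy_irq_desc *desc, toy_handler_fn handler,
                   void *ctx, bool resend_pending)
{
    assert(handler != NULL);
    desc->handler = handler;
    desc->ctx = ctx;
    if (resend_pending && desc->pending) {
        desc->pending = false;
        handler(ctx);
    }
}

/* Example handler: counts how many times it was invoked. */
void toy_count_handler(void *ctx)
{
    int *count = ctx;
    (*count)++;
}
```

With resend_pending set, the latched edge reaches the handler inside toy_setup_irq(), which is exactly the window in which an ISR installed before its driver finished initializing would misbehave. With it cleared, the edge stays pending and the handler is not called during setup.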

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Daniel Manrique (roadmr) wrote :

Somehow we're seeing a similar problem on kernel 3.0.0-19 for Ubuntu 11.10; this didn't happen with 3.0.0-17. See bug 984387. This may help narrow things down if needed.

Bug 784484 also comes to mind; it was a (potentially) similar problem with the ene_ir driver that surfaced in kernel 2.6.35-rc2. That bug links to the exact commit that started triggering the issue.

Let me know if we can help in any way to pinpoint this; this machine is used primarily for testing so we can run any needed tests, install/reinstall anything that may be needed, and/or run a bisection process.

Daniel Manrique (roadmr) wrote :

Er, sorry for being vague: we have a Dell Studio XPS 1340 that also panics with the ite_cir module. We first observed this on Ubuntu 11.10, where upgrading to kernel 3.0.0-19 causes the crash. Also, on Ubuntu Precise, the installer panics while trying to load the driver; I had no idea why, but after looking at this report, this driver seems likely to be the cause.

Leann Ogasawara (leannogasawara) wrote :

Hi Everyone,

Bug 984387 is seeing this same issue. I've built a test kernel for Precise (I'm basically proposing we temporarily disable this driver until we can get a proper fix in place). Could you test it and let me know your results? Thanks.

http://people.canonical.com/~ogasawara/lp984387/amd64/

Leann Ogasawara (leannogasawara) wrote :

Could we also get testing against the following kernel? It's closer to a proper fix:

http://people.canonical.com/~henrix/lp984387-postponeirq/amd64/

Luis Henriques (henrix) wrote :

I've uploaded another test kernel for Oneiric with the same fix here:

http://people.canonical.com/~henrix/lp984387-postponeirq-oneiric/

Could someone running Oneiric give it a try and report back?

BTW, Peter Hurley: thanks for your analysis, it looks like you were right.

Richard Kent Jordan (rjordan) wrote :

Leann,
I can +1 your linux-image-3.2.0-23-generic_3.2.0-23.37~postponeirq_amd64.deb fix. Finally have a working kernel again!!!

BTW, this is a mainline kernel bug, and it is pretty widespread. I've seen the same effect on 11.10 (any kernel after -17), 12.04 RC (including the default boot from the desktop CD), and the latest release of LMDE.

Leann Ogasawara (leannogasawara) wrote :

Thanks Richard.

We indeed plan to send this upstream. We just wanted to get a few additional test confirmations first.

For now I'm going to go ahead and mark this as a duplicate of bug 984387 which is tracking the same issue. Please continue to follow that bug for the latest updates.
