Kernel Oops - unable to handle kernel NULL pointer dereference; IP: [<ffffffffa007e3c6>] ips_detect_cpu+0x76/0x1d0 [intel_ips]

Bug #648631 reported by Lauri Siltala on 2010-09-27
This bug affects 11 people
Affects           Status        Importance  Assigned to  Milestone
Linux             Invalid       Undecided   Unassigned
linux (Ubuntu)    Fix Released  Undecided   AceLan Kao
  Maverick        Fix Released  Undecided   Unassigned

Bug Description

=== Problem ===
From the Maverick 2.6.35-17 kernel onward, the INTEL_IPS config option is enabled to fix Bug 601057. On affected machines, loading the intel_ips module triggers an Oops, after which X fails to start and the computer boots to a tty login prompt. Blacklisting the intel_ips module works around the bug completely.
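
Whether the workaround is in place can be checked with a short script. This is a minimal sketch in Python, assuming the blacklist is expressed as a `blacklist intel_ips` line in a file under /etc/modprobe.d/. The sample text and the function name `blacklisted_modules` are illustrative and do not come from this report.

```python
def blacklisted_modules(config_text):
    """Return the module names that appear in `blacklist` directives."""
    modules = set()
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) == 2 and fields[0] == "blacklist":
            # modprobe treats '-' and '_' in module names as equivalent.
            modules.add(fields[1].replace("-", "_"))
    return modules


# Hypothetical contents of a modprobe.d file carrying the workaround.
sample = """
# Work around LP: #648631 until a fixed kernel is installed
blacklist intel_ips
"""

print("intel_ips" in blacklisted_modules(sample))
```

The function only inspects text handed to it, so it can be pointed at the contents of any modprobe.d file without touching the running system.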

=== dmesg (2.6.35-22.33) ===
(snip)
[ 20.162308] intel ips 0000:00:1f.6: No CPUID match found.
[ 20.162319] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 20.162424] IP: [<ffffffffa007e3c6>] ips_detect_cpu+0x76/0x1d0 [intel_ips]
[ 20.162500] PGD aefed067 PUD a4804067 PMD 0
[ 20.162616] Oops: 0000 [#1] SMP
[ 20.162702] last sysfs file: /sys/devices/pci0000:ff/0000:ff:02.1/uevent
[ 20.162749] CPU 1
[ 20.162780] Modules linked in: intel_ips(+) snd_page_alloc bluetooth led_class lp parport ahci r8169 mii libahci
[ 20.163144]
[ 20.163176] Pid: 418, comm: modprobe Not tainted 2.6.35-22-generic #33-Ubuntu FJNBB06/LIFEBOOK A530
[ 20.163229] RIP: 0010:[<ffffffffa007e3c6>] [<ffffffffa007e3c6>] ips_detect_cpu+0x76/0x1d0 [intel_ips]
[ 20.163312] RSP: 0018:ffff8800a4883c48 EFLAGS: 00010202
[ 20.163355] RAX: 0000000000a800c8 RBX: 0000000000000000 RCX: 0000000000a800c8
[ 20.163401] RDX: 0000000000000000 RSI: ffff8800a4883c64 RDI: 0000000000a800c8
[ 20.163451] RBP: ffff8800a4883c88 R08: 0000000000000000 R09: 0000000000000000
[ 20.163499] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000a800c8
[ 20.163565] R13: ffff8800a4331b40 R14: ffff8800b2af3090 R15: 00000000fffffff4
[ 20.163637] FS: 00007f5616ada700(0000) GS:ffff880001f00000(0000) knlGS:0000000000000000
[ 20.163729] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 20.163792] CR2: 0000000000000008 CR3: 00000000aeffb000 CR4: 00000000000006e0
[ 20.163860] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 20.163926] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 20.163994] Process modprobe (pid: 418, threadinfo ffff8800a4882000, task ffff8800a48596e0)
[ 20.164085] Stack:
[ 20.164138] ffff8800a4883c88 ffffffff81142fa4 ffff8800a4883d14 00000000b2af3000
[ 20.164283] <0> ffff8800b2af3000 ffff8800a4331b40 ffff8800b2af3000 00000000fffffff4
[ 20.164499] <0> ffff8800a4883cd8 ffffffffa007efb1 ffff8800a4883cb8 ffff8800aed53e10
[ 20.164763] Call Trace:
[ 20.164823] [<ffffffff81142fa4>] ? kmem_cache_alloc_notrace+0xb4/0xd0
[ 20.164896] [<ffffffffa007efb1>] ips_probe+0x71/0x710 [intel_ips]
[ 20.164967] [<ffffffff812d6ec7>] local_pci_probe+0x17/0x20
[ 20.165034] [<ffffffff812d71b9>] __pci_device_probe+0xe9/0xf0
[ 20.165104] [<ffffffff812b951a>] ? kobject_get+0x1a/0x30
[ 20.165172] [<ffffffff81383c29>] ? get_device+0x19/0x20
[ 20.165238] [<ffffffff812d825a>] pci_device_probe+0x3a/0x60
[ 20.165304] [<ffffffff81387fe8>] really_probe+0x68/0x190
[ 20.165371] [<ffffffff81388155>] driver_probe_device+0x45/0x70
[ 20.165436] [<ffffffff8138821b>] __driver_attach+0x9b/0xa0
[ 20.165503] [<ffffffff81388180>] ? __driver_attach+0x0/0xa0
[ 20.165566] [<ffffffff81387428>] bus_for_each_dev+0x68/0x90
[ 20.165633] [<ffffffff81387e5e>] driver_attach+0x1e/0x20
[ 20.165697] [<ffffffff8138771e>] bus_add_driver+0xde/0x280
[ 20.165761] [<ffffffff81388560>] driver_register+0x80/0x150
[ 20.165825] [<ffffffff8158d0f6>] ? notifier_call_chain+0x56/0x80
[ 20.165845] input: Fujitsu FUJ02B1 as /devices/LNXSYSTM:00/LNXSYBUS:00/FUJ02B1:00/input/input4
[ 20.165898] ACPI: Fujitsu FUJ02B1 [FJEX] (on)
[ 20.166032] [<ffffffff812d84e6>] __pci_register_driver+0x56/0xd0
[ 20.166103] [<ffffffff81084f05>] ? __blocking_notifier_call_chain+0x65/0x80
[ 20.166176] [<ffffffffa0084000>] ? ips_init+0x0/0x20 [intel_ips]
[ 20.166245] [<ffffffffa008401e>] ips_init+0x1e/0x20 [intel_ips]
[ 20.166314] [<ffffffff8100204c>] do_one_initcall+0x3c/0x1a0
[ 20.166382] [<ffffffff8109bc6b>] sys_init_module+0xbb/0x200
[ 20.166450] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
[ 20.166517] Code: a0 48 c7 c7 88 ec ac 81 48 c7 c3 80 0a 08 a0 e8 c1 fc 23 e1 48 85 c0 74 44 bf ac 01 00 00 4c 89 e6 e8 3f 7e fb e0 66 90 41 89 c4 <8b> 53 08 41 c1 e4 12 41 c1 ec 15 41 69 c4 e8 03 00 00 39 c2 0f
[ 20.168316] RIP [<ffffffffa007e3c6>] ips_detect_cpu+0x76/0x1d0 [intel_ips]
[ 20.168411] RSP <ffff8800a4883c48>
[ 20.168470] CR2: 0000000000000008
[ 20.168539] ---[ end trace 934ab8f8f0f56d90 ]---
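
The key fields of the trace above can be pulled out mechanically. The sketch below is in Python and uses a four-line excerpt copied from the dmesg output; the regular expressions and the function name `summarize_oops` are assumptions made for illustration, not part of any kernel tooling. On x86-64 the marked bytes `<8b> 53 08` in the Code: line decode as `mov 0x8(%rbx),%edx`, so the script also checks that the faulting address in CR2 equals RBX plus 8, which is what a read of a field at offset 8 through a NULL pointer would produce.

```python
import re

# Lines copied from the dmesg output in this report.
OOPS_EXCERPT = """\
[ 20.162319] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 20.162424] IP: [<ffffffffa007e3c6>] ips_detect_cpu+0x76/0x1d0 [intel_ips]
[ 20.163355] RAX: 0000000000a800c8 RBX: 0000000000000000 RCX: 0000000000a800c8
[ 20.163792] CR2: 0000000000000008 CR3: 00000000aeffb000 CR4: 00000000000006e0
"""

# "IP: [<address>] symbol+0xoffset/0xsize [module]"
IP_RE = re.compile(
    r"IP: \[<([0-9a-f]+)>\] (\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+) \[(\w+)\]"
)
# "RBX: 0000000000000000", "CR2: 0000000000000008", ...
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}|CR\d): ([0-9a-f]{16})\b")


def summarize_oops(text):
    """Extract the faulting symbol, module and register values from an oops."""
    ip = IP_RE.search(text)
    if ip is None:
        raise ValueError("no IP line found")
    regs = {name: int(value, 16) for name, value in REG_RE.findall(text)}
    return {
        "function": ip.group(2),
        "offset": int(ip.group(3), 16),
        "size": int(ip.group(4), 16),
        "module": ip.group(5),
        "regs": regs,
    }


info = summarize_oops(OOPS_EXCERPT)
print(info["function"], hex(info["offset"]), info["module"])

# Displacement taken from the decoded instruction mov 0x8(%rbx),%edx.
DISPLACEMENT = 0x8
print(info["regs"]["CR2"] == info["regs"]["RBX"] + DISPLACEMENT)
```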

=== Xorg.0.log (2.6.35-22.33) ===
(snip)
[ 95.229] (WW) Falling back to old probe method for vesa
[ 95.229] (WW) Falling back to old probe method for fbdev
[ 95.229] (II) Loading sub module "fbdevhw"
[ 95.229] (II) LoadModule: "fbdevhw"
[ 95.229] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[ 95.230] (II) Module fbdevhw: vendor="X.Org Foundation"
[ 95.230] compiled for 1.9.0, module version = 0.0.2
[ 95.230] ABI class: X.Org Video Driver, version 8.0
[ 95.230] (EE) open /dev/fb0: No such file or directory
[ 197.074] (EE) intel(0): No kernel modesetting driver detected.
[ 197.074] (II) UnloadModule: "intel"
[ 197.074] (EE) Screen(s) found, but none have a usable configuration.
[ 197.074]
Fatal server error:
[ 197.074] no screens found
[ 197.074]
Please consult the The X.Org Foundation support
  at http://wiki.x.org
 for help.
[ 197.074] Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 197.074]
[ 197.080] ddxSigGiveUp: Closing log

=== Upstream Testing ===
2.6.35.1 works. (This maps to 2.6.35-15.)
2.6.35.2 works. (This maps to 2.6.35-16.)
2.6.36-rc6 works.

=== Original Report ===
Binary package hint: xserver-xorg-video-intel

Basically, on Maverick, X fails to start properly for me on newer kernels. Kernels up to 2.6.35-15 work just fine, but the later ones don't. While booting, I get into a terminal login. If I run startx from there, it complains that it couldn't load the screen properly and offers low graphics mode.

Driver version: 2:2.12.0-1ubuntu4

dmesg log from kernel 2.6.35-17 (one of the not working ones): http://pastebin.com/9JNcNAN1
Xorg.log: http://pastebin.com/2XHxX45P

Note that these logs have drivers & Xorg from the xorg-edgers ppa. Removing it with ppa-purge doesn't change the situation in any way though.
Also, my GPU is Intel HD Graphics, lspci | grep VGA output being:
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02)

It's my first time reporting a bug, I hope I didn't miss anything important! I already made a thread on the maverick forums about this, where I was advised to report a bug.

Bryce Harrington (bryce) on 2010-09-27
tags: added: edgers
tags: added: maverick
Lauri Siltala (latelol) wrote :

To my untrained eye, the dmesg log in my report shows that *something* is crashing at bootup (starting at line 765) which isn't happening on the kernels that are working just fine. I'd bet that's related.

Stenten (stenten) on 2010-09-28
affects: xserver-xorg-video-intel (Ubuntu) → linux (Ubuntu)
Stenten (stenten) wrote :

Please purge the xorg-edgers PPA so you can get a cleaner debugging environment (default Maverick packages).

Please upgrade your system, since there's a chance that this bug might have already been fixed in a more recent kernel. At the text console, login, and then run "sudo apt-get update && sudo apt-get upgrade".

Once you boot into the current Maverick kernel (2.6.35-22.33 at the time of writing this), please attach debugging information to this report by typing "apport-collect -p linux 648631" into a terminal.

If your problem is still present in the current Maverick kernel:

The -15 kernel (which is the kernel that introduced this bug) rebased to the mainline 2.6.35.1 kernel. This indicates that there might be a problem introduced between the 2.6.35 and 2.6.35.1 mainline kernels. Please install both of these mainline kernels from the Mainline Kernel PPA [1], along with the most current mainline kernel (v2.6.36-rc5-maverick/ at the time of writing this) (installation instructions can be found here [2]). Then please test whether the bug is still present in all three of these mainline kernels (including dmesg from all three kernels would also be helpful when you report if they work or not).

Thank you for taking the time to debug and hopefully make Ubuntu better.

[1]: http://kernel.ubuntu.com/~kernel-ppa/mainline/
[2]: https://wiki.ubuntu.com/Kernel/MainlineBuilds

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: kernel-oops needs-upstream-testing
Lauri Siltala (latelol) wrote :

Ah, maybe I wasn't clear enough. The -15 kernel didn't introduce the bug; that kernel works perfectly. Only the kernels newer than that fail. Either way, I will run these tests and report back later.

Stenten (stenten) wrote :

Ah, I see what you meant now, sorry.

The 2.6.35-16 kernel rebased to the 2.6.35.2 mainline kernel, so the same logic applies, you just have to test the mainline 2.6.35.1 and 2.6.35.2 kernels instead of 2.6.35 vs. 2.6.35.1 for the presence of the bug (and of course still do everything else in my first post).

Lauri Siltala (latelol) wrote :

I ran into an unrelated problem yesterday trying to run apport-collect to get the information you requested.
Upon running it, it opened up a text-only web browser and took me to the launchpad login page. The problem was, the "Continue" button didn't work at all using the browser, it didn't even seem to consider the button a link. Any ideas?

Stenten (stenten) wrote :

Were you trying to click it? You have to arrow down and highlight the link and then hit Enter.

If it actually doesn't work, only attaching dmesg from the current maverick kernel will suffice for an apport-collect.

tags: added: kernel-graphics
Stenten (stenten) wrote :

Actually, now that I think about it, it would be helpful for you to attach both dmesg and /var/log/Xorg.0.log for each of the kernels you test.

Lauri Siltala (latelol) wrote :

No, I wasn't trying to click it. I did understand how to use a link with that browser, it just didn't seem to think the Continue button was a link at all. (Other links worked fine.) I think it may be related to the Continue button being different from Firefox too (I can't see where does it lead to..) I'll do the dmesg and Xorg logs later today or perhaps tomorrow.

Lauri Siltala (latelol) wrote :

on Firefox, even.

Lauri Siltala (latelol) wrote :

I tested the mainline kernels you asked me to, all of them worked with no noticeable issues whatsoever. I also used the 2.6.35-3 kernel because I do not have the -16 one and so I wasn't sure whether that one should work fine or not. Either way, here are my dmesg and Xorg logs. Sorry for taking so long, by the way!

Lauri Siltala (latelol) wrote :

Is there a method for submitting more than one attachment at once, anyway? I couldn't figure one out, at least..

Stenten (stenten) wrote :

Excellent, thank you.

Please also test the current Ubuntu kernel (2.6.35-22 at the moment) by just upgrading your system (sudo apt-get update && sudo apt-get upgrade). This will tell us whether or not this has already been fixed in the more recent kernels shipped with Ubuntu. (They might be held back, so you can install them with "apt-get install linux-generic linux-headers" etc; just 'apt-get install' those held back packages. You could also 'apt-get dist-upgrade', but just make sure it doesn't want to remove anything important.)

And no, there's no way to attach more than one attachment to a comment.

Lauri Siltala (latelol) wrote :

I have already tested it, and it doesn't work either.

Stenten (stenten) on 2010-10-04
tags: removed: needs-upstream-testing
Stenten (stenten) wrote :

Please attach dmesg and Xorg.0.log for the -22 kernel then please.

Lauri Siltala (latelol) wrote :

Leaving for today by the way, I'll submit more information if needed tomorrow.

Stenten (stenten) wrote :

Please also attach "lspci -vvnn > lspci.txt".

Stenten (stenten) on 2010-10-04
description: updated
summary: - Intel graphics drivers not working on new kernels on Maverick?
+ Kernel Oops - unable to handle kernel NULL pointer dereference; IP:
+ [<ffffffffa007e3c6>] ips_detect_cpu+0x76/0x1d0 [intel_ips]
Stenten (stenten) wrote :

Please also attach the output of 'uname -a'.

Lauri Siltala (latelol) wrote :

Which kernel do you want me to attach the output of? All of the ones I've tested?
Two other observations which I probably should've mentioned before, I'm not sure if they're related or not:

I also have this bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/642421
It doesn't actually seem to do anything to the others with it, but since it's a modprobe error and modprobe seems to be what's crashing..

When I run startx from a terminal on a broken kernel, all I get is a completely black screen. When trying to reboot using the magic sysrq key, after pressing sysrq + e I'm taken to a terminal for a brief moment, after which X suddenly starts up, complaining that it couldn't load the screen properly.

Stenten (stenten) wrote :

> Which kernel do you want me to attach the output of?

2.6.35-22.

Lauri Siltala (latelol) wrote :

Also, looking at the edited report, perhaps you didn't read one of my comments closely enough. I don't know whether the kernel 2.6.35-16 works because I never had it installed for some reason. I'd test it, but I did some looking around and I can't seem to find any packages for it. I found the source at https://launchpad.net/ubuntu/+source/linux/2.6.35-16.22 though, should I compile it myself or something?

Stenten (stenten) wrote :

In that case, it looks like this is probably the change that introduced the bug (from the 2.6.35-17 changelog):
 * [Config] Enable INTEL_IPS
    - LP: #601057

Could you compile the current Ubuntu kernel, disabling the INTEL_IPS module and testing that?

Lauri Siltala (latelol) wrote :

I guess blacklisting the module isn't enough, then?
(I've never compiled a kernel before, I'll do so later on if needed though)

Stenten (stenten) wrote :

Blacklisting it is fine too.

Lauri Siltala (latelol) wrote :

Well, the kernel 2.6.35-22 booted with no noticeable issues with intel_ips blacklisted.

Stenten (stenten) on 2010-10-06
description: updated
tags: added: amd64 kernel-needs-review regression-potential
removed: edgers
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Stenten (stenten) wrote :

Thank you for taking the time to narrow down the cause of this bug. I've marked the bug as ready to be viewed by a developer, so they'll continue the debugging process once they get a chance.

Lauri Siltala (latelol) wrote :

You're welcome. And thanks to you too for the help!

AceLan Kao (acelankao) wrote :

Lauri,

Does blacklisting intel_ips let the system enter X successfully?
It doesn't seem to work for me, although there is no longer an error message in dmesg.

My kernel version is vmlinuz-2.6.35-22-generic

Stenten (stenten) wrote :

AceLan Kao,

If you're experiencing this bug too, please file a separate bug report by
typing "ubuntu-bug linux" into a terminal. There's a strict
one-report-per-person policy with kernel bugs. Thank you.

Lauri Siltala (latelol) wrote :

Indeed, I have X working flawlessly with intel_ips blacklisted.

AceLan Kao (acelankao) wrote :

Yes, it's working now. Sorry, I had forgotten to remove some other changes I had made to the kernel.

AceLan Kao (acelankao) on 2010-10-11
Changed in linux:
importance: Unknown → Undecided
status: Unknown → New
status: New → Invalid
AceLan Kao (acelankao) wrote :

Lauri,

I found the root cause and informed the owner of the driver.
Could you also help me to confirm if my modification also works for you?
Thanks.

http://people.canonical.com/~acelan/bugs/lp648631/

Lauri Siltala (latelol) wrote :

I tested the kernel of yours, and I still have the same oops with it.

AceLan Kao (acelankao) wrote :

Lauri,

Aha, I got it, your CPU is the newest one and not listed in the code.
I'll drop the author an email to see how to fix this problem.
So, until the patch comes out, please keep the intel_ips module blacklisted :)

AceLan Kao (acelankao) wrote :

Lauri,

The new packages are in the same directory. Could you help test again?
Thanks.

http://people.canonical.com/~acelan/bugs/lp648631/

AceLan Kao (acelankao) on 2010-10-13
Changed in linux (Ubuntu):
assignee: nobody → AceLan Kao (acelankao)
Lauri Siltala (latelol) wrote :

That package indeed seems to work now. Dmesg says that intel_ips doesn't support my CPU, but X works fine, so..

AceLan Kao (acelankao) wrote :

Great, that's the correct behavior. Previously, an unsupported CPU would hit the NULL pointer dereference oops; the patch fixes that problem.
I'll send an SRU to the kernel team list for review and hope this patch can be included soon.
Thanks for the bug report.
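
The failure pattern discussed in this thread can be modelled in a few lines. The sketch below is in Python rather than the driver's C, and every name and value in it (`KNOWN_LIMITS`, `find_limits`, `core_power_limit`, the CPU labels, 35000) is hypothetical. It only illustrates the general shape suggested by the log, where "No CPUID match found." is printed and the code then reads through a pointer that was never set. It is not the actual patch.

```python
# Hypothetical table of supported CPUs; the values are made up.
KNOWN_LIMITS = {"listed-cpu": {"core_power_limit": 35000}}


def find_limits(cpu_id):
    """Return the limits entry for cpu_id, or None when there is no match."""
    return KNOWN_LIMITS.get(cpu_id)


def detect_cpu_buggy(cpu_id):
    limits = find_limits(cpu_id)
    if limits is None:
        print("No CPUID match found.")
        # Bug: execution continues even though limits is None.
    # Raises TypeError for None, the Python analogue of the NULL dereference.
    return limits["core_power_limit"]


def detect_cpu_fixed(cpu_id):
    limits = find_limits(cpu_id)
    if limits is None:
        print("No CPUID match found.")
        return None  # Bail out so the caller can decline to drive this CPU.
    return limits["core_power_limit"]


print(detect_cpu_fixed("unlisted-cpu"))
```

With the early return in place, an unlisted CPU yields a message and a clean refusal instead of a crash, which matches the behavior reported for the test packages.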

AceLan Kao (acelankao) on 2010-10-19
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Steve Magoun (smagoun) wrote :

The kernel at http://people.canonical.com/~acelan/bugs/lp648631/ fixes a boot failure with the same symptoms ('No CPUID...' message in dmesg followed by an OOPS, non-functional X) on a Thinkpad Edge 13 w/ Intel Pentium U5400.

AceLan Kao (acelankao) wrote :

Steve, does this bug still exist?
I see the patch has already been applied in the 2.6.35-22.35 security release.

Hi,

I still had this problem yesterday, even after an update.
However, blacklisting the module solved the trouble.

On Sat, Nov 20, 2010 at 6:30 AM, AceLan Kao <email address hidden> wrote:
> Steve, does this bug still exist?
> I saw the patch already be applied on 2.6.35-22.35 security release.
>
> --
> Kernel Oops - unable to handle kernel NULL pointer dereference; IP: [<ffffffffa007e3c6>] ips_detect_cpu+0x76/0x1d0 [intel_ips]
> https://bugs.launchpad.net/bugs/648631
> You received this bug notification because you are a direct subscriber
> of a duplicate bug (654143).
>

--
Nicolas Palix
http://proton.inrialpes.fr/~npalix/

Steve Magoun (smagoun) wrote :

@AceLan: I can reproduce the bug in 2.6.35-22.35.

I can't reproduce in 2.6.35-23.40; the changelog indicates it may have been fixed by an upstream commit in 2.6.35-23.36 ("intel_ips: potential null dereference").
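
A quick way to tell whether an installed kernel is at or past the version named in that changelog entry is sketched below in Python. It assumes 2.6.35-23.36 is the first fixed version, which the thread only describes as a possibility, and it compares plain numeric tuples rather than implementing full Debian version ordering. The names `parse_ubuntu_kernel_version` and `has_null_deref_fix` are illustrative.

```python
import re

# Assumption from the changelog reading in this thread, not verified here.
FIRST_FIXED = "2.6.35-23.36"


def parse_ubuntu_kernel_version(version):
    """Split a string such as '2.6.35-23.40' into a tuple of integers."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized kernel version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_null_deref_fix(version):
    """True when version is at or after the assumed first fixed version."""
    fixed = parse_ubuntu_kernel_version(FIRST_FIXED)
    return parse_ubuntu_kernel_version(version) >= fixed


for candidate in ("2.6.35-22.33", "2.6.35-22.35", "2.6.35-23.40"):
    print(candidate, has_null_deref_fix(candidate))
```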

Jonas Norlander (jonorland) wrote :

It works for me on my Acer Aspire One 753 with the 2.6.35-23.40 generic kernel

AceLan Kao (acelankao) wrote :

Thank you guys, I'm going to close this bug.
Please feel free to re-open this bug, if you encounter this problem again.

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Chris Van Hoof (vanhoof) on 2011-02-18
tags: added: hwe-blocker
Chris Van Hoof (vanhoof) wrote :

Closing the remaining bug task as Fix Released in: 2.6.35-23.40

Changed in linux (Ubuntu Maverick):
status: New → Fix Released