Dramatic kernel panic on Ubuntu 11.04 and derivatives when rebooted

Bug #784484 reported by Jakub Grzeszczuk
This bug affects 3 people
Affects         Status   Importance  Assigned to  Milestone
linux (Ubuntu)  Expired  Medium      Unassigned
Oneiric         Expired  Medium      Unassigned

Bug Description

Dear readers,

I am writing to you with a request for help. I have been an Ubuntu fan since 8.04, and was a happy user up to and including 10.10.

I am experiencing a problem that is very hard to fight. Whenever I try to restart my PC (as opposed to powering it on from cold), the Scroll Lock and Caps Lock LEDs start flashing (which I have learned means a kernel panic). This happens on Ubuntu 11.04, Kubuntu 11.04, Lubuntu 11.04, Xubuntu 11.04 and even Linux Mint 11, which is an 11.04 cousin.

I have run memtest: it reported no errors after more than 6 passes in over 16 hours.

I have tried the modeset, nomodeset and acpi settings in the GRUB configuration files, with no result.

Whenever I power the PC off and then turn it on again, all goes fine. But when I restart from the restart option on any of those distros, I get this kernel panic.

This never happened on my x64 10.10 Ubuntu / Kubuntu combo.

Could someone please help me at least find the reason?

best regards, Jakub Grzeszczuk

P.S. Attachments:

boot.log: http://pastebin.com/feMv424L

dmesg: http://pastebin.com/XtMfE63J

dmesg.0: http://pastebin.com/GuXZCgkS

kern.log: http://pastebin.com/v8tmcCm4
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: kuba 1424 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfdef0000 irq 16'
   Mixer name : 'Realtek ALC1200'
   Components : 'HDA:10ec0888,14627210,00100101 HDA:10573055,00305557,00100900'
   Controls : 31
   Simple ctrls : 18
Card1.Amixer.info:
 Card hw:1 'HDMI'/'HDA ATI HDMI at 0xfdfec000 irq 48'
   Mixer name : 'ATI R6xx HDMI'
   Components : 'HDA:1002aa01,00aa0100,00100000'
   Controls : 4
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'IEC958',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [on]
CurrentDmesg:
 [ 34.838806] hda-intel: IRQ timing workaround is activated for card #1. Suggest a bigger bdl_pos_adj.
 [ 35.960329] EXT4-fs (sda7): re-mounted. Opts: errors=remount-ro,commit=0
 [ 58.218689] exe (1700): /proc/1700/oom_adj is deprecated, please use /proc/1700/oom_score_adj instead.
 [ 68.416212] PPP BSD Compression module registered
 [ 68.437050] PPP Deflate Compression module registered
DistroRelease: Ubuntu 11.04
HibernationDevice: RESUME=UUID=87896b4c-d9b8-478f-9a7f-e0b7eaf76e29
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Release amd64 (20110427.1)
MachineType: Micro-Star International GT735
Package: linux (not installed)
ProcEnviron:
 LANGUAGE=pl_PL:en
 PATH=(custom, no user)
 LANG=pl_PL.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.38-8-generic root=UUID=8b039581-58ad-485b-9f30-4f2f78f775f9 ro
ProcVersionSignature: Ubuntu 2.6.38-8.42-generic 2.6.38.2
RelatedPackageVersions:
 linux-restricted-modules-2.6.38-8-generic N/A
 linux-backports-modules-2.6.38-8-generic N/A
 linux-firmware 1.52
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: yes
StagingDrivers: rt2860sta
Tags: natty staging
Uname: Linux 2.6.38-8-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

dmi.bios.date: 11/25/2009
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: A1721AMS V1.0V
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: MS-1721
dmi.board.vendor: MSI
dmi.board.version: Ver 1.000
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 10
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrA1721AMSV1.0V:bd11/25/2009:svnMicro-StarInternational:pnGT735:pvrVer1.000:rvnMSI:rnMS-1721:rvrVer1.000:cvnToBeFilledByO.E.M.:ct10:cvrToBeFilledByO.E.M.:
dmi.product.name: GT735
dmi.product.version: Ver 1.000
dmi.sys.vendor: Micro-Star International

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Also adding a picture of a message on boot screen when the panic occurs:

http://img689.imageshack.us/i/dsc0044yx.jpg/

Hope we can solve it together :)

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Added another test.

Checked whether the error also appears on the mainline kernel: it does.

The tested kernel was 2.39-999 from the 16th May daily build (64-bit version).

Revision history for this message
Daniel Manrique (roadmr) wrote :

Hi Jakub,

Thank you for taking the time to report this bug and helping to make Ubuntu better.

Also thanks for testing the mainline kernel, it's a good first step, and since you mention your system worked OK with Maverick (kernel 2.6.35), it's probably something that changed between that kernel and the one shipped with Natty (2.6.38).

Still, it would be useful to have some more information about your system. To that end, please run the following command in a terminal; it will automatically gather debugging information:

apport-collect -p linux 784484

Please do this running the 2.6.38 kernel that the 11.04 system uses by default.

Thanks!

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : AcpiTables.txt

apport information

tags: added: apport-collected natty staging
description: updated
Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : AlsaDevices.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : AplayDevices.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : ArecordDevices.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : BootDmesg.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : Card0.Amixer.values.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : Card0.Codecs.codec.0.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : Card0.Codecs.codec.1.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : Card1.Codecs.codec.0.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : IwConfig.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : Lspci.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : Lsusb.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : PciMultimedia.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : ProcCpuinfo_.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : ProcModules.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : UdevDb.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : UdevLog.txt

apport information

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote : WifiSyslog.txt

apport information

description: updated
Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

I did as you asked. First of all, I want to thank you for the response; I have high hopes that we can solve this problem. :)

Cheers, Jakub

Revision history for this message
Charlie Kravetz (cjkgeek) wrote :

Thanks for reporting this bug and any supporting documentation. Since this bug has enough information provided for a developer to begin work, I'm going to mark it as confirmed and let them handle it from here. Thanks for taking the time to make Ubuntu better!

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
tags: added: regression-release
Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

So, just so I know: can I count on being able to use 11.04 to its full potential soon, or should I rather expect the bug to be fixed in 11.10?

Hope to get an answer soon, Kuba

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Hello? I would really love to know just what is going on...

Also, in case it tells you anything: the same error appears on the new Fedora 15 (it didn't show up on F14). I hope it can lead us somewhere...

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Please, really... It is very important to me that I can use the system I love to its full potential. Can't anyone help me? :(

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Also, the suspend and hibernate options work properly.

Revision history for this message
Daniel Manrique (roadmr) wrote :

Hi Jakub,

We are sorry that we do not always have the capacity to work on all triaged bugs as soon as we would like. This bug is in triaged status, which means that it stands the best chance of being looked at by developers; however, I am unable to provide an estimate of when this might happen.

For now, I can suggest a possible workaround that involves using the previous, known-good Maverick kernel (2.6.35).

For this, you should create a file /etc/apt/sources.list.d/maverick.list containing these two lines:

deb http://us.archive.ubuntu.com/ubuntu/ maverick main restricted
deb http://us.archive.ubuntu.com/ubuntu/ maverick-updates main restricted

(you might want to replace "us" with the identifier of a mirror closer to you).
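
As a rough sketch of the step above, the two lines for maverick.list could be generated for any mirror prefix with a small shell function. The function name maverick_sources and the "pl" prefix shown in the comment are illustrative assumptions, not part of the original instructions; the archive URLs are simply the ones quoted above.

```shell
#!/bin/sh
# Sketch: print the two apt source lines for the Maverick archive.
# maverick_sources is a hypothetical helper; its argument is the mirror
# prefix and defaults to "us" as in the lines quoted above.
maverick_sources() {
    mirror="${1:-us}"
    for pocket in maverick maverick-updates; do
        printf 'deb http://%s.archive.ubuntu.com/ubuntu/ %s main restricted\n' \
            "$mirror" "$pocket"
    done
}

# Preview the lines on stdout. To install them, redirect as root, e.g.:
#   maverick_sources pl | sudo tee /etc/apt/sources.list.d/maverick.list
maverick_sources us
```

Printing to stdout first makes it easy to check the lines before anything under /etc/apt is touched.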

Then run these commands:

sudo apt-get update
sudo apt-get install linux-image-2.6.35-28-generic

When your system is booting, press the left shift key to get the grub menu and select the kernel you want to boot. Please test that everything works as expected; you could then configure the system to boot the Maverick kernel by default. However, I'd appreciate it if you could test new Natty kernels as they come out, and let me know if one of them solves your problem.

I'll continue to keep an eye on this bug's progress; hopefully we'll get it looked at soon. In the meantime, I hope the proposed workaround helps with your system's behavior.

Thanks!

---
Ubuntu Bug Squad volunteer triager
http://wiki.ubuntu.com/BugSquad

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

First of all, a BIG thank you!

You see, this is all I needed: first, someone to tell me what "triaged" means (I am not a native English speaker), and second, a hint about what is happening with my report right now ;)

I am more than happy to know that the priority is actually that high.

One more thing to tell you: I tested the newest kernel from the natty-proposed repository and the freshest mainline kernel, and the bug is still there (just giving you the info, in case it helps).

Cheers, and once more, I am sooo glad to just hear from you (I felt sort of let down when my question on that bug turned to "inactive").

Thank You once more,

greetings from Poland,

Kuba.

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Dear Daniel,

the older Maverick kernel does the trick. Almost everything works and the bug above is gone: Ubuntu restarts normally, BUT...

what is missing is wireless. I can see wireless networks, but I cannot connect to them (on 2.6.38-10 I can connect normally). However, the mobile internet connection works just fine; I am using it right now.

Do you have any suggestions for what could fix my Wi-Fi connectivity on the older kernel?

Cheers, Kuba.

Revision history for this message
Brendan McLearie (bren-internode) wrote :

I am also experiencing something similar here. Asus Z7S, dual Quad Xeons. Server kernel 35-28: no problem at all. After the upgrade to 38-8 with Ubuntu 11.04 I am in a disaster zone.

Some time back I had to add clock=acpi_pm to fix a previous stability problem. I have tried every variation of this with 11.04 to no avail.

I have removed all sensor drivers and chipset modules.

I have enabled the pre-release repos and installed kernel 38-10; still getting panics.

Now back on 35-28, stable.

Any help appreciated. Happy to assist with any testing.

I have been browsing lots of forums and there seems to be a common thread on these panics around ACPI and particularly ASUS. Perhaps some of their earlier implementations had some broken stuff that's not being identified until this newer kernel?

Regardless, this is definitely a change in this kernel causing grief. And it's not fixed in 38-10.

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Dear Brendan,

on the downside, I regret that more people have this issue, but on the other hand, the more people have it, the faster it should be solved. Tell me, are you on 35-28 on 11.04? If yes, is your Wi-Fi working properly?

The problem (if you have the same one as me) has nothing to do with any settings that you can change in grub or in any boot config; it has to be 100% the fault of the kernel. It appears in ANY Linux kernel on ALL distros. Since I first experienced the bug on 11.04 with 38-8, I have tried the testing repos on Ubuntu and all other distros that use the new kernel: OpenSuse, Fedora, all Ubuntu derivatives (Kubuntu, Xubuntu, Lubuntu, Linux Mint). It appears everywhere. If you would like to assist, you could check whether such an error appears on your setup with, for example, Fedora 15. If yes, I think we should somehow push the problem somewhere higher, to the mainline Linux kernel developers...

Greets, Jakub

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

P.S. It's not even fixed in .39 my friend. (not even mentioning .38 :P)

Revision history for this message
Brendan McLearie (bren-internode) wrote :

OK, I will try Fedora if I get a chance over the weekend. Sorry, I don't have an answer on your Wi-Fi side. My machine is a server-class box running multiple VMs. The connection is via GB LAN and I've overridden all the network autoconfig stuff to force a static IP; it's the DHCP server for my internal network too.

Completely agree that we need to get this escalated.

IMO all the panics we are seeing in related threads could have the same cause. Depending on which kernel version I'm using, it panics either during boot or shortly afterwards. I think there is a thermal / ACPI component to the behavior, though this is only a gut feeling. I have pored over log files and there is nothing deterministic about the panics on my system other than perhaps general CPU load.

38-8 gives me various [Hardware Error] messages during early kernel load. 38-10 doesn't, but still panics.

Agree with you that this seems like a kernel error (or at least something closely related, e.g. a module). I think many people are chasing their tails because some sort of threshold load pattern has their system panicking at a repeatable point, thereby giving a false indication that the driver/code etc. at that point is the problem, whereas it is likely to be some ACPI / timing issue that happens to trigger at that load level. (Does this make sense?)

If there are some of those dedicated developers reading this: I'm happy to take instruction in providing more information.

Cheers
Brendan

Revision history for this message
Daniel Manrique (roadmr) wrote :

@Brendan:

I'd advise you to file a different bug for your problem. Even though the problems look similar, if the hardware is different, a separate report is likely to be more useful to the kernel team. To do so, use this command:

ubuntu-bug linux

@Jakub:

Your wireless isn't working possibly because the module required to do so isn't present for that version of the kernel on an 11.04 installation.

The question here would be "how to get wireless support on a 2.6.35 kernel under Natty", and you stand a higher chance of having someone answer that if you put that in answers.launchpad.net or askubuntu.com. If you do so, feel free to refer them to this bug for an explanation of why you need that.

I'd tell you myself how to get this working but I really don't know :) sorry... hopefully someone else will have an answer, and hopefully also, the actual problem with the 2.6.38 kernel will get looked at soon.

Thanks for your patience!

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

@Daniel:

just to clear my thoughts- is there a chance that the bug will be solved before the release of 11.10?

Cheers, Kuba

Revision history for this message
Eliah Kagan (degeneracypressure) wrote :

@Jakub Grzeszczuk
Does this bug also occur on an Oneiric daily-live (http://cdimage.ubuntu.com/daily-live/current/)?

Revision history for this message
Brendan McLearie (bren-internode) wrote :

Thanks Daniel. Have done so. Cheers.

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Dear @Eliah,

I haven't tried 11.10 yet... I thought it wouldn't be the best idea to check it out at the Alpha stage, because it might be panicking just because of its own instability, so I wouldn't be able to determine whether it's the same bug or not...

But as soon as I get to a fast internet connection I will surely download Oneiric Ocelot (11.10) and see how it goes.

Cheers, Kuba

Revision history for this message
Eliah Kagan (degeneracypressure) wrote :

It's been a while since Alpha 1 came out, so most current daily-lives are probably more stable than it was. But if you want to wait until the next alpha is released, Alpha 2 is expected to be released on July 7th (https://launchpad.net/ubuntu/+milestone/oneiric-alpha-2).

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Dear Bug Thread followers,

I am very sorry to bring such bad news, but the bug exists even in Oneiric daily builds with kernel 3.0.6 (or 3.0.0-6 [can't remember]), so even the new version of the Linux kernel didn't change a thing :(

I hope the problem gets looked at soon. It's a pity that in 9 months I will lose support for my Linux (still rolling on 10.10). When updates stop, I'll shamefully have to switch to Windows if the kernel panics on restart don't get fixed, and that would make me very, very sad :(

Regards, Kuba

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Hello?

As we can see, October is slowly coming closer, and another 11.10 Alpha release has landed, with the bug described above still present...

Does it mean that I should mentally start preparing myself for the fact that the newest Ubuntu will, same as 11.04, not be fully operational on my laptop?

Yours sincerely,

Kuba

Revision history for this message
ian_hawdon (ian-hawdon) wrote :

I too request an update on this issue. For me, though, the bug can happen on cold boot too, but it seems to be the same error message. (That is, if the ATI drivers haven't completely blocked the text from showing on boot!)

It exists on 11.10 Alpha 3 too, even when booting from the live CD.

It does not happen on 10.10.

Revision history for this message
ian_hawdon (ian-hawdon) wrote :

Just to add, I have no idea how to log this problem: when the kernel panics it seems to make the file system read-only, and most of the message has scrolled off the top of the screen by the time the crash occurs.

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Dear @ian-hawdon,

What machine do you have?

Regards, J.G.

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

I don't know if any developers are following this thread anymore; I honestly hope someone is... Just to bring up new info: the bug is exactly the same on the newest Ubuntu 11.10 Beta 1 released in September, with the newest 3.0.010-generic kernel on an amd64 machine. Whenever the machine is restarted: kernel panic. When it is turned off and on again, all goes fine.

I really, really hope I will get to use Ubuntu 11.10 (I already skipped the whole 11.04 fun :/).

Best regards, Jakub.

Revision history for this message
Eliah Kagan (degeneracypressure) wrote :

Perhaps this can get confirmed and nominated for Oneiric. Is anyone else running Oneiric (11.10) and experiencing this bug?

Triagers/developers: For this to be ready to work on in Oneiric, is any other information required? Should an Oneiric user experiencing it submit a duplicate, so that there is conveniently available separate Apport-attached info?

tags: added: oneiric
Revision history for this message
Daniel Manrique (roadmr) wrote :

Hi folks,

Just an update, looks like this is confirmed for Oneiric too, I'll see if it's possible to add a task for it so it gets more closely tracked.

At this point we're just waiting for a developer to take a look at it (I just helped triage it, I'm not really someone who could get it fixed). Since it looks to be really hardware-specific though, it'll be difficult to track down without having someone with the actual hardware to look at things.

Sorry I don't have anything more definite at the moment. I'll see if, other than inquiring about an Oneiric task, there's something else to be done.

Thanks!

Revision history for this message
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

I would be really, really thankful if it was possible to make it work somehow... I'd hate to have to leave Ubuntu simply because of such an inconvenience. For now I am going to stay on Oneiric and keep installing proposed updates, but I don't know if that can help.

Please, if there is anything more I can do to provide more information, just tell me how I could do it! :)

Cheers from Germany, Kuba.

Revision history for this message
Daniel Manrique (roadmr) wrote :

A quick update on this bug.

Jakub, the original reporter, was able to determine, by testing mainline kernels, that the code causing this regression was introduced at some point between 2.6.38-rc1 (works fine) and 2.6.38-rc2 (crashes on reboot).

He is now testing kernels produced through git bisect, in order to isolate the problematic commit. We'll proceed from there once the troublesome code has been located. I guess at that point we could make a test kernel available for others affected by this problem, depending on how hardware-specific it turns out to be (i.e. if it's traced to a particular component and you don't have that exact same component, the test kernel will probably still not work).

Thanks!

---
Ubuntu Bug Squad volunteer triager
http://wiki.ubuntu.com/BugSquad

Revision history for this message
Daniel Manrique (roadmr) wrote :

It's been about 2 weeks, so here's another update on progress on this bug.

Jakub has been testing bisected kernels; we should need to test about 5 more kernels (not precise, you know how git bisect is). I expect we should be able to provide something more concrete in a few days.
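
For a sense of where an estimate like this comes from: each bisect step roughly halves the range of untested commits, so the number of kernels left to build grows with the base-2 logarithm of that range. The sketch below assumes a made-up figure of 32 remaining candidate commits; the function name bisect_steps is hypothetical, and real runs can differ by a step or so depending on the shape of the history.

```shell
#!/bin/sh
# Sketch: estimate how many more test builds a bisection needs.
# bisect_steps is a hypothetical helper; it counts how many times the
# candidate range must be halved before a single commit is left.
bisect_steps() {
    remaining="$1"
    steps=0
    span=1
    while [ "$span" -lt "$remaining" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# 32 is an illustrative number of untested commits, not a figure from this bug.
bisect_steps 32
```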

Thanks for your help and patience!

Revision history for this message
Daniel Manrique (roadmr) wrote :

Here are the results for the bisection process Jakub conducted.

Here's the bisect log done on the mainline kernel tree:

git bisect start
# bad: [1bae4ce27c9c90344f23c65ea6966c50ffeae2f5] Linux 2.6.38-rc2
git bisect bad 1bae4ce27c9c90344f23c65ea6966c50ffeae2f5
# good: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect good c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# good: [c745552a82cbf3a82adea5210212ed31ec03388d] Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
git bisect good c745552a82cbf3a82adea5210212ed31ec03388d
# bad: [01c40c048b0f3f377e6d27b35fd99f04efcc21dd] [media] v4l: Include linux/videodev2.h in media/v4l2-ctrls.h
git bisect bad 01c40c048b0f3f377e6d27b35fd99f04efcc21dd
# good: [6183040680c56ec4bd3d7c9398cbc05e84d60c1f] [media] saa7134: Fix analog mode for Kworld SBTVD
git bisect good 6183040680c56ec4bd3d7c9398cbc05e84d60c1f
# bad: [76f1ef427c0aab3d3c917b497562ea2cdaaae056] [media] rc/imon: default to key mode instead of mouse mode
git bisect bad 76f1ef427c0aab3d3c917b497562ea2cdaaae056
# good: [14b67c2969ebf50bd5534b2a0c441f8569a9361e] [media] gspca - ov534: Propagate errors to higher level
git bisect good 14b67c2969ebf50bd5534b2a0c441f8569a9361e
# good: [34b8fc8e683cbcbbe47806260ef5dc505915b45f] [media] V4L2: WL1273 FM Radio: Replace ioctl with unlocked_ioctl
git bisect good 34b8fc8e683cbcbbe47806260ef5dc505915b45f
# bad: [2e4c55626a0c30b5b2bc9469c025a563a81c3785] [media] rc/ene_ir: fix oops on module load
git bisect bad 2e4c55626a0c30b5b2bc9469c025a563a81c3785
# good: [7d2edfc23e9852591cb031a26093cdcd07a34a90] [media] rc/imon: fix ffdc device detection oops
git bisect good 7d2edfc23e9852591cb031a26093cdcd07a34a90

And the resulting, first bad commit as identified by git:

2e4c55626a0c30b5b2bc9469c025a563a81c3785 is the first bad commit
commit 2e4c55626a0c30b5b2bc9469c025a563a81c3785
Author: Kyle McMartin <email address hidden>
Date: Thu Jan 6 16:59:33 2011 -0300

    [media] rc/ene_ir: fix oops on module load

    dev->rdev is accessed in ene_setup_hw_settings, so it needs to be wired
    up before then.

    [Jarod Wilson]: Also fix a possible improper resource freeing bug while
    we're looking at possible probe issues here.

    Signed-off-by: Kyle McMartin <email address hidden>
    CC: Maxim Levitsky <email address hidden>
    Signed-off-by: Jarod Wilson <email address hidden>
    Signed-off-by: Mauro Carvalho Chehab <email address hidden>

:040000 040000 17fd4e10f1702cacbd5e82fd16e970bc7ed572d2 0cb5b4d88b70c470570c801259d1007495b509bd M drivers
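
As a cross-check on output like this: once a bisection has run to completion, the commit named as first bad should be the last one marked bad in the log. A minimal sketch of pulling that hash out of a saved log follows; the function name last_bad_commit is hypothetical, and the demo input is a shortened copy of the log above.

```shell
#!/bin/sh
# Sketch: print the hash from the last "git bisect bad" line of a bisect log.
# last_bad_commit is a hypothetical helper; $1 is a file saved with
# `git bisect log > file`.
last_bad_commit() {
    awk '$1 == "git" && $2 == "bisect" && $3 == "bad" { hash = $4 }
         END { if (hash != "") print hash }' "$1"
}

# Demo on a shortened copy of the log above, written to a scratch file.
log="$(mktemp)"
cat > "$log" <<'EOF'
git bisect start
# bad: [1bae4ce27c9c90344f23c65ea6966c50ffeae2f5] Linux 2.6.38-rc2
git bisect bad 1bae4ce27c9c90344f23c65ea6966c50ffeae2f5
# good: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect good c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# bad: [2e4c55626a0c30b5b2bc9469c025a563a81c3785] [media] rc/ene_ir: fix oops on module load
git bisect bad 2e4c55626a0c30b5b2bc9469c025a563a81c3785
# good: [7d2edfc23e9852591cb031a26093cdcd07a34a90] [media] rc/imon: fix ffdc device detection oops
git bisect good 7d2edfc23e9852591cb031a26093cdcd07a34a90
EOF
last_bad_commit "$log"
rm -f "$log"
```

A saved log can also be fed back to git with `git bisect replay <file>` inside a kernel tree to repeat the same sequence of decisions.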

The fact that the code is in the apparently little-used ene_ir module is a bit disconcerting. However, I'm attaching Jakub's lsmod output, showing that he indeed has that module loaded, meaning that it's not entirely impossible that some change to this code is causing the problem.

---
Ubuntu Bug Squad volunteer triager
http://wiki.ubuntu.com/BugSquad

Revision history for this message
Daniel Manrique (roadmr) wrote :

Actually the list of modules Jakub is running is visible in his ProcModules.txt attachment here:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/784484/+attachment/2134277/+files/ProcModules.txt

Jakub, the ene_ir module is a handler for infrared receivers, so if you're not using your systems' infrared capabilities, one workaround for this problem is to blacklist the ene_ir module:

- Open a terminal
- Type:
sudo pico /etc/modprobe.d/blacklist.conf
- At the very end of that file, add:
blacklist ene_ir
- Save the file and exit by pressing Ctrl-X, then Y and Enter to confirm
- Reboot the system to apply these changes.

After doing this, the problem should no longer appear, which would confirm the bisection's finding that the faulty code is in that module.
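
For anyone who prefers not to edit the file by hand, the blacklist step above can be sketched as a small shell function. This is only an illustration: the function name blacklist_module is made up for this example, and the second argument (the config file path) is an assumption added so the function can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Ubuntu tool): append a
# "blacklist <module>" line to a modprobe config file unless the
# exact line is already there.
blacklist_module() {
    module="$1"
    # Default path is the file named in the steps above; override it
    # with a second argument to experiment on a scratch file.
    conf="${2:-/etc/modprobe.d/blacklist.conf}"

    # -x matches the whole line, -F treats the pattern as a fixed string.
    if grep -qxF "blacklist $module" "$conf" 2>/dev/null; then
        echo "already blacklisted: $module"
    else
        echo "blacklist $module" >> "$conf"
        echo "added: blacklist $module"
    fi
}
```

Writing to the real /etc/modprobe.d/blacklist.conf needs root privileges, so the function would have to be called from a script run with sudo. A reboot is still required afterwards, as in the manual steps.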

In a short while I'll be adding links to two test kernels, one compiled with the faulty code and one without, just to test that the code does indeed trigger this problem. Of course, for the test to be valid, you'll need to remove the blacklist line we added previously, so the troublesome module gets loaded and we can observe (or not) the bad behavior.

Thanks so much for your patience and thorough testing through this long process!

---
Ubuntu Bug Squad volunteer triager
http://wiki.ubuntu.com/BugSquad

Daniel Manrique (roadmr) wrote :

Just to make 150% sure that the code pinpointed by the git bisect process is at fault, here are two test kernels. One is marked "good", the other "bad". The good one should obviously reboot OK, while the bad one is expected to crash on reboot as reported in this bug. The kernels are based on the 2.6.38 final mainline release.

http://people.canonical.com/~roadmr/lp784484/

To test these, please make sure that you're loading the ene_ir module, meaning that the workaround I proposed in the earlier comment is *not* in place.

One other way to check is to do:

sudo lsmod | grep ene_ir

This should output something if the module is loaded, or nothing if the module is not in use.
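
A slightly stricter variant of that check, as a rough sketch: grep also matches any other module whose name merely contains "ene_ir", while the function below compares the module-name field exactly. It reads /proc/modules, whose first field on each line is the module name. The function name module_loaded and the optional file argument are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the named module is
# listed in /proc/modules, fail (exit status 1) otherwise.
module_loaded() {
    module="$1"
    # Optional second argument: a saved copy of /proc/modules, such as
    # the ProcModules.txt attachment from a bug report.
    modules_file="${2:-/proc/modules}"

    # Compare the first field (the module name) exactly.
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

if module_loaded ene_ir; then
    echo "ene_ir is loaded"
else
    echo "ene_ir is not loaded"
fi
```

Reading /proc/modules does not require root.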

Please report back on whether these two kernels behave as predicted; if so, we will have identified the faulty code and we could potentially file an upstream bug or request the help of someone more knowledgeable about kernel and drivers.

As usual, please do this at your leisure, I don't want to impose on your time.

Thanks again!

---
Ubuntu Bug Squad volunteer triager
http://wiki.ubuntu.com/BugSquad

Changed in linux (Ubuntu Oneiric):
status: Triaged → Incomplete
Jakub Grzeszczuk (jakub-grzeszczuk) wrote :

Would just like to let the thread followers know that adding:

blacklist ene_ir

to /etc/modprobe.d/blacklist.conf does the trick: the bug (the kernel panic on restart) is no longer present.

So now we just have to wait for a fix to the code itself to be applied. If that doesn't happen, I still have my individual blacklist fix! :)

P.S. I checked my hardware's FULL specs in a booklet I dug out from the attic: my motherboard (or actually the whole setup) does NOT have an IR piece at all. Just to mention as a fun fact ;)

Cheers, Kuba

Brendan McLearie (bren-internode) wrote :

@Daniel many thanks for working this through. Very much appreciated. I haven't had a moment to try it yet but will do so by the weekend and will also post my results here.

I will also cross reference the other MCE bug that I filed and get those guys to have a look.

Cheers
Brendan

Brendan McLearie (bren-internode) wrote :

Hi Daniel

I have now tried your suggested workaround, unfortunately to no avail.

The system still crashes.

I can't confirm whether the system is loading the driver, as it doesn't stay stable long enough to check.

However, just to make doubly sure, I went to the extent of removing the /lib/modules/server....38-11/..../rc directory entirely (which contains the ene_ir driver and others), then booted with the 38-11 kernel.

I received no boot errors, which indicates that the module wasn't trying to load anyway (probably because of the blacklist entry). However, I didn't even get to log in before it crashed and rebooted.

This is on the ASUS Z7S board which is a dual quad xeon - and a damn shame that its usefulness is diminishing.

I note that a fellow owner on my associated bug also failed the fix.

Any other suggestions?

I'm happy to spend some proper time testing under your instruction over the xmas period.

Many thanks
Brendan

Daniel Manrique (roadmr) wrote :

Hi Brendan,

It looks like the problem we discovered with the ene_ir module was very specific to Jakub's motherboard; yours may have a different problem.

Thus I'd like to ask you to file a new bug report, if possible using the latest release of Ubuntu (11.10). You can file the report using a Live CD so you don't have to reinstall your system. To do so, boot the LiveCD environment and (assuming it boots) run, on a terminal:

ubuntu-bug linux

Once you do, you can subscribe me (click on "subscribe someone else") so I can follow it up.

Thanks!

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu Oneiric) because there has been no activity for 60 days.]

Changed in linux (Ubuntu Oneiric):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired