BUG: unable to handle kernel paging request at ffff87ffb88dbf18

Bug #1546985 reported by Po-Hsu Lin
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Invalid   Medium       Unassigned
Precise          Invalid   Medium       Unassigned

Bug Description

CID: 201011-6701

Potential kernel regression found on this system while testing with 3.2.0-99-generic #139
The system hangs after the suspend/gpu_lockup_after_suspend test case.

Steps:
1. Install the system with satellite provisioning, with --proposed and the sru toolset
2. Let it run for the SRU tests, or reproduce it manually with "plainbox run -i 'suspend/gpu_lockup_after_suspend'"

The following error message appears:
 BUG: unable to handle kernel paging request at ffff87ffb88dbf18
(This can be found in kern.log)

Another type of error message is:
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000033
(This can't be found in kern.log)

Please see the attachment for screenshots
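
The two oops headers quoted above can be picked out of a saved log mechanically. The following is a minimal sketch, not part of the original report: the function name find_oopses is hypothetical, and the pattern only covers the two message forms seen in this bug.

```python
import re

# Matches only the two oops headers quoted in this report.
# The faulting address is printed as bare hexadecimal.
OOPS_RE = re.compile(
    r"BUG: unable to handle kernel "
    r"(paging request|NULL pointer dereference) at ([0-9a-fA-F]+)"
)


def find_oopses(log_text):
    """Return a list of (kind, address) tuples, one per oops header found."""
    return [
        (m.group(1), int(m.group(2), 16))
        for m in OOPS_RE.finditer(log_text)
    ]


# The two messages from the bug description, used as sample input.
sample = (
    "BUG: unable to handle kernel paging request at ffff87ffb88dbf18\n"
    "BUG: unable to handle kernel NULL pointer dereference at 0000000000000033\n"
)

for kind, addr in find_oopses(sample):
    print("{} at {:#x}".format(kind, addr))
```

To scan a real log, read the file contents (for example /var/log/kern.log, the usual location on Ubuntu) and pass the text to find_oopses.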

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-99-generic 3.2.0-99.139
ProcVersionSignature: Ubuntu 3.2.0-99.139-generic 3.2.76
Uname: Linux 3.2.0-99-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu17.14
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 1767 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf0420000 irq 44'
   Mixer name : 'Intel IbexPeak HDMI'
   Components : 'HDA:10ec0662,103c304b,00100101 HDA:80862804,103c304b,00100000'
   Controls : 30
   Simple ctrls : 13
CurrentDmesg:
 [ 25.233981] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
 [ 35.533182] eth0: no IPv6 routers present
Date: Thu Feb 18 06:18:23 2016
HibernationDevice: RESUME=UUID=e3ead356-ab02-4971-ad9c-7eba91da86e7
InstallationMedia: Ubuntu 12.04.1 LTS "Precise Pangolin" - Release amd64 (20120823.1)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MarkForUpload: True
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-99-generic root=UUID=f52105b5-6ded-4d73-95ce-582d2bb3d82c ro quiet splash initcall_debug vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-99-generic N/A
 linux-backports-modules-3.2.0-99-generic N/A
 linux-firmware 1.79.18
RfKill:

SourcePackage: linux
StagingDrivers: mei
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/19/2010
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: 786H1 v01.07
dmi.board.name: 304Bh
dmi.board.vendor: Hewlett-Packard
dmi.chassis.type: 6
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvr786H1v01.07:bd10/19/2010:svnHewlett-Packard:pn:pvr:rvnHewlett-Packard:rn304Bh:rvr:cvnHewlett-Packard:ct6:cvr:
dmi.sys.vendor: Hewlett-Packard

Po-Hsu Lin (cypressyew) wrote :
Po-Hsu Lin (cypressyew) wrote :
Po-Hsu Lin (cypressyew) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Luis Henriques (henrix) wrote :

The bug present in the kern.log seems to belong to an ancient kernel (3.2.0-29.46), so I will ignore that for now (as it was probably fixed in earlier kernels). The NULL pointer in the photo doesn't seem to contain a lot of info, but I'll have a look.

Po-Hsu Lin (cypressyew) wrote :

Verified again after re-installing this system: the issue is still 100% reproducible on 3.2.0-99.

Steps:
1. Re-install the system with --proposed.
2. Run the gpu_lockup_after_suspend on it.

Result:
* At first, when it crashed on the gpu_lockup_after_suspend job, it rebooted itself automatically.
Eventually I was able to see this error message:
Unable to handle kernel paging request.

When this happens:
* The Caps Lock LED is not blinking.
* The system appears frozen: it does not react to NumLock, CapsLock, VT switching, or SSH connections.
* The system can still be rebooted with the magic key (Ctrl + Alt + PrtScn + b).
(I think the kernel is still alive, so the error can be logged.)

Po-Hsu Lin (cypressyew) wrote :
Po-Hsu Lin (cypressyew) wrote :

Also, after repeated reboots and retests, I was lucky enough to capture the "NULL pointer dereference" issue on 3.2.0-99.

Po-Hsu Lin (cypressyew) wrote :

When this "NULL pointer dereference" issue happens, the system can still be reached via SSH.
Please see the log file for the dmesg output.

Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Precise):
importance: Undecided → Medium
status: New → Confirmed
Luis Henriques (henrix) wrote :

After looking at the logs, they seem to be a bit inconsistent, which makes it difficult to point to a commit in this specific release. Also, I've failed to reproduce the issue locally.

Would it be possible for you to test the previous kernel (3.2.0-98.138) and double check that the issue does not occur there? If it doesn't, I guess we'll need to start a bisect.
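
For illustration, a bisect over packaged kernel versions amounts to a binary search for the first bad one. The sketch below is an assumption-laden outline, not the procedure used in this bug: first_bad and the is_bad callback are hypothetical names, the version list is illustrative, and it assumes every kernel before the regression is good and every later one is bad. A real kernel bisect would normally run over commits with git bisect.

```python
def first_bad(versions, is_bad):
    """Binary-search an oldest-to-newest list for the first bad kernel.

    is_bad is a hypothetical callback that installs/boots a version,
    runs the failing test, and returns True if the crash reproduces.
    Returns None when the newest kernel is good (nothing to bisect).
    If the result is versions[0], no good baseline exists in the list.
    """
    if not versions or not is_bad(versions[-1]):
        return None
    lo, hi = 0, len(versions) - 1  # invariant: versions[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid          # first bad version is at mid or earlier
        else:
            lo = mid + 1      # first bad version is after mid
    return versions[lo]


# Illustrative list and a made-up outcome where only the newest kernel fails.
versions = ["3.2.0-96", "3.2.0-97", "3.2.0-98", "3.2.0-99"]
print(first_bad(versions, lambda v: v == "3.2.0-99"))
```

If the search returns the oldest version in the list, the range has no known-good kernel and must be widened before bisecting makes sense.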

penalvch (penalvch)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Po-Hsu Lin (cypressyew) wrote :

Hi,
This crash-and-auto-restart issue can be reproduced on 3.2.0-98, 3.2.0-97 and 3.2.0-96.
I am trying to test it with an older kernel.

Also, this issue only happens on this specific system; other SRU systems are not affected.
One noteworthy thing is that the VGA video output on this system is sometimes unstable: the whole screen will flicker, even in the BIOS menu.

Po-Hsu Lin (cypressyew) wrote :

I found that it can be reproduced just by opening the HTML5 video test page in Firefox. Comparing the test results between this cycle and the last one (3.2.0-98), the Firefox package is different: 44.0.2+build1-0ubuntu0.12.04.1 vs. 43.0.4+build3-0ubuntu0.12.04.1

It also crashes with Chromium.

I will need to test with Chrome tomorrow.

Do you think it is possible that this is caused by a hardware failure?

penalvch (penalvch) wrote :

Po-Hsu Lin, in order to allow additional upstream developers to examine the issue, at your earliest convenience, could you please test the latest upstream kernel available from http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D ? Please keep in mind the following:
1) The one to test is at the very top line at the top of the page (not the daily folder).
2) The release names are irrelevant.
3) The folder time stamps aren't indicative of when the kernel actually was released upstream.
4) Install instructions are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds .

If testing on your main install would be inconvenient, one may:
1) Install Ubuntu to a different partition and then test this there.
2) Backup, or clone the primary install.

If the latest kernel did not allow you to test to the issue (ex. you couldn't boot into the OS) please make a comment in your report about this, and continue to test the next most recent kernel version until you can test to the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this issue is fixed in the mainline kernel, please add the following tags by clicking on the yellow circle with a black pencil icon, next to the word Tags, located at the bottom of the report description:
kernel-fixed-upstream
kernel-fixed-upstream-X.Y-rcZ

Where X, and Y are the first two numbers of the kernel version, and Z is the release candidate number if it exists.

If the mainline kernel does not fix the issue, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-X.Y-rcZ

Please note, an error to install the kernel does not fit the criteria of kernel-bug-exists-upstream.
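
As a small illustration of the tag naming scheme described above, the following sketch builds both tags from a tested kernel version string. The helper name upstream_tags and the example versions are hypothetical; only the tag formats come from the instructions above.

```python
import re


def upstream_tags(version, fixed):
    """Build the two Launchpad tags for a tested mainline kernel.

    version: a string such as "4.5-rc6" or "v4.4.3" (examples are
             made up); only X.Y and an optional -rcZ part are used.
    fixed:   True if the mainline kernel fixed the issue.
    """
    m = re.match(r"v?(\d+)\.(\d+)(?:\.\d+)*(?:-(rc\d+))?", version)
    if m is None:
        raise ValueError("unrecognized kernel version: " + version)
    prefix = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    suffix = "{}.{}".format(m.group(1), m.group(2))
    if m.group(3):
        suffix += "-" + m.group(3)  # append the release candidate, if any
    return [prefix, prefix + "-" + suffix]


print(upstream_tags("4.5-rc6", True))
print(upstream_tags("v4.4.3", False))
```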

Once testing of the latest upstream kernel is complete, please mark this report's Status as Confirmed. Please let us know your results.

Thank you for your understanding.

Po-Hsu Lin (cypressyew) wrote :

This crash-and-auto-reboot issue can also be reproduced on 3.2.0-51 and 3.2.0-40 with Firefox (44.0.2). This is very strange, as this system has been in our test pool for a long time.

Note that it's not guaranteed to crash with the HTML5 video in Firefox every time, but it will definitely crash with the gpu_lockup_after_suspend test.

It also crashes with Chrome on 3.2.0-99.

Po-Hsu Lin (cypressyew) wrote :

I will close this bug as the hardware is not in a stable state: sometimes the display flickers from the very beginning (even on the BIOS screen), which makes it hard to determine whether this is a hardware or a software issue.
Thanks.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in linux (Ubuntu Precise):
status: Confirmed → Invalid