1002:9480 IP: [<ffffffffa01b6c05>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]; RIP: 0010:[<ffffffffa01b6c05>] [<ffffffffa01b6c05>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]

Bug #1170917 reported by Matthieu Baerts
This bug affects 4 people
Affects          Status      Importance  Assigned to  Milestone
Linux            Confirmed   Medium
linux (Ubuntu)   Incomplete  Medium      Unassigned

Bug Description

With kernel 3.8 and newer, I sometimes get a kernel oops at startup, apparently because I switch off my ATI card. I added this line to my /etc/rc.local file:
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

The bug does not occur on every boot; I guess it depends on whether the previous command runs before or after X11 is launched. I didn't have this bug with Precise and Quantal. It seems that this bug has already been reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=49531
https://bugs.freedesktop.org/show_bug.cgi?id=61529

WORKAROUND for kernel versions < 3.12: Note that I no longer have this crash if I remove this line from /etc/init/lightdm.conf :
    echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

With kernel 3.12, the crash always occurs, even if the card is disabled before launching X11.
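
The power state of each GPU can be read back from the same debugfs file that the OFF command writes to. The following is a minimal sketch, not part of the original report: `dis_power_state` is a hypothetical helper, and it assumes each line of the switch file looks like `1:DIS: :Off:0000:01:00.0` (index, GPU id, active flag, power state, PCI address), which may differ between kernel versions.

```shell
# Hypothetical helper (not from the report): print the power state of the
# discrete GPU ("DIS") as listed in a vgaswitcheroo switch file.
# Assumed line format: <index>:<id>:<active flag>:<power state>:<PCI address>
dis_power_state() {
    # $1: file to read; defaults to the debugfs path used in the report
    # (reading the real file needs root and a mounted debugfs).
    switch_file="${1:-/sys/kernel/debug/vgaswitcheroo/switch}"
    # Print field 4 of the DIS line; exit non-zero if no such line exists.
    awk -F: '$2 == "DIS" { print $4; found = 1 } END { exit !found }' "$switch_file"
}
```

Called with no argument as root, the helper reads the real file; passing a path makes it possible to try it on a saved copy first.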

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.8.0-19-generic 3.8.0-19.29
ProcVersionSignature: Ubuntu 3.8.0-19.29-generic 3.8.8
Uname: Linux 3.8.0-19-generic x86_64
ApportVersion: 2.9.2-0ubuntu8
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: mbaerts 2074 F.... pulseaudio
 /dev/snd/pcmC0D0p: mbaerts 2074 F...m pulseaudio
Date: Sat Apr 20 11:41:12 2013
HibernationDevice: RESUME=UUID=06229bec-927d-4cfa-9a82-f9949d5151ca
InstallationDate: Installed on 2011-08-10 (618 days ago)
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Alpha amd64 (20110803.1)
Lsusb:
 Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 064e:a130 Suyin Corp.
MachineType: Medion P662X
MarkForUpload: True
ProcFB:
 0 inteldrmfb
 1 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-19-generic root=UUID=e7e166e5-3bef-4daf-88a1-5368a679c4a0 ro profile
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-19-generic N/A
 linux-backports-modules-3.8.0-19-generic N/A
 linux-firmware 1.106
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/13/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 202
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: H36QM
dmi.board.vendor: To be filled by O.E.M.
dmi.board.version: 1.0
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 10
dmi.chassis.vendor: PEGATRON CORPORATION
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr202:bd04/13/2010:svnMedion:pnP662X:pvr1.0:rvnTobefilledbyO.E.M.:rnH36QM:rvr1.0:cvnPEGATRONCORPORATION:ct10:cvr1.0:
dmi.product.name: P662X
dmi.product.version: 1.0
dmi.sys.vendor: Medion

Revision history for this message
In , Thaddaeus Tintenfisch (thad-fisch-deactivatedaccount) wrote :

Created attachment 75598
Xorg log file

I am not able to log out or shut down my system, a laptop with hybrid graphics, without triggering a hard lockup. However, this only happens if the dedicated AMD GPU is powered off by vgaswitcheroo. Moreover, it might be related to PRIME support being enabled.

The latest version of xserver-xorg-video-radeon for Ubuntu 13.04 is installed (1:7.1.0-0ubuntu1).

$ lshw -C display
  *-display
       description: VGA compatible controller
       product: RV710 [Mobility Radeon HD 4300 Series]
       vendor: Advanced Micro Devices [AMD] nee ATI
       physical id: 0
       bus info: pci@0000:01:00.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
       configuration: driver=radeon latency=0
       resources: irq:46 memory:d0000000-dfffffff ioport:3000(size=256) memory:f4400000-f440ffff memory:f4420000-f443ffff
  *-display
       description: Display controller
       product: Mobile 4 Series Chipset Integrated Graphics Controller
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 07
       width: 64 bits
       clock: 33MHz
       capabilities: msi pm bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:45 memory:f0000000-f03fffff memory:e0000000-efffffff ioport:4110(size=8)

Here is syslog output of the bug:
-------------------------------------------------------------------------
[ 142.230685] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 142.230819] IP: [<ffffffffa01f1ba5>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]
[ 142.230977] PGD 0
[ 142.231014] Oops: 0000 [#1] SMP
[ 142.231075] Modules linked in: dm_crypt(F) kvm_intel kvm acer_wmi sparse_keymap snd_hda_codec_realtek xt_hl(F) ip6t_rt(F) snd_hda_intel snd_hda_codec snd_hwdep(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) ipt_REJECT(F) microcode(F) xt_LOG(F) snd_pcm(F) xt_limit(F) xt_tcpudp(F) snd_page_alloc(F) xt_addrtype(F) snd_seq_midi(F) snd_seq_midi_event(F) snd_rawmidi(F) arc4(F) psmouse(F) nf_conntrack_ipv4(F) serio_raw(F) nf_defrag_ipv4(F) xt_state(F) iwldvm snd_seq(F) mac80211 ip6table_filter(F) snd_seq_device(F) ip6_tables(F) snd_timer(F) iwlwifi lpc_ich nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) nf_nat_ftp(F) nf_nat(F) nf_conntrack_ftp(F) nf_conntrack(F) iptable_filter(F) cfg80211 ip_tables(F) joydev(F) x_tables(F) snd(F) soundcore(F) mac_hid binfmt_misc(F) coretemp lp(F) parport(F) hid_generic usbhid hid radeon i915 i2c_algo_bit ttm drm_kms_helper wmi r8169 ahci(F) drm libahci(F) video(F)
[ 142.232175] CPU 0
[ 142.232175] Pid: 1135, comm: Xorg Tainted: GF 3.8.0-7-generic #15-Ubuntu Acer TravelMate 8471/TravelMate 8471
[ 142.232175] RIP: 0010:[<ffffffffa01f1ba5>] [<ffffffffa01f1ba5>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]
[ 142.232175] RSP: 0018:ffff88013752bc28 EFLAGS: 00010282
[ 142.232175] RAX: ffffc900047a2f34 RBX: 0000000000000000 RCX: 0000000000000000
[ 142.232175] RDX: 0000000000000000 RSI: 0000000000002f34 RDI: ffff8801359d6000
[ 142.232175] RBP: ffff88013752b...


Revision history for this message
Matthieu Baerts (matttbe) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
tags: added: kernel-bug-exists-upstream
Changed in dri:
importance: Unknown → Medium
status: Unknown → Confirmed
Revision history for this message
Matthieu Baerts (matttbe) wrote : Re: Powering down inactive GPU while running X causes NULL pointer dereference

I now have this crash all the time at startup (I'm using Ubuntu 13.10, kernel 3.11)

no longer affects: dri
Revision history for this message
Matthieu Baerts (matttbe) wrote :

I forgot to say this: for those who have this bug, simply add this line to a service that is launched before X starts (e.g. in /etc/init/lightdm.conf, just after the line with 'script'):

    echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
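
The ordering described in this comment can also be checked by the script itself. This is only a sketch under the assumptions of this comment (kernels before 3.12, and an X server process named Xorg as in the oops output); `power_off_discrete_gpu` and its arguments are hypothetical names, not an existing tool.

```shell
# Hypothetical guard (names are illustrative): write OFF to the vgaswitcheroo
# switch only while the X server is not running yet.
power_off_discrete_gpu() {
    # $1: switch file (defaults to the debugfs path from the report)
    # $2: X server process name (assumed "Xorg", as shown in the oops output)
    switch_file="${1:-/sys/kernel/debug/vgaswitcheroo/switch}"
    x_process="${2:-Xorg}"

    # Refuse to switch once X is up, since that is when the oops was seen.
    if pgrep -x "$x_process" >/dev/null 2>&1; then
        echo "$x_process is already running; leaving $switch_file alone" >&2
        return 1
    fi

    # The file is missing if debugfs is not mounted or vgaswitcheroo is unused.
    if [ ! -w "$switch_file" ]; then
        echo "cannot write to $switch_file" >&2
        return 2
    fi

    echo OFF > "$switch_file"
}
```

The function returns 1 when X is already running and 2 when the switch file is not writable, so a caller can log which case was hit instead of silently skipping the power-off.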

penalvch (penalvch)
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Revision history for this message
Matthieu Baerts (matttbe) wrote :

@Christopher: without the workaround given in comment #7, I had this problem with the development version of Saucy.

Now I'm using Trusty and I get a kernel oops (due to the Radeon module) with kernel 3.12 (I have to report a new bug...)

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Matthieu Baerts (matttbe) wrote :

@Christopher: I didn't have this bug with Precise and Raring (I haven't tested Precise with kernel 3.8, but I guess the bug is still there)

Revision history for this message
Matthieu Baerts (matttbe) wrote :

Sorry, I meant to say: 'I didn't have this bug with Precise and *Quantal*'

I have this bug with kernel 3.8 and newer. With Precise 12.04.0 or Quantal 12.10, I didn't have this bug because the kernel version is < 3.8. But with Precise 12.04.3 (kernel 3.8), I guess this bug is there :-)

penalvch (penalvch)
tags: added: needs-upstream-testing regression-release
Revision history for this message
penalvch (penalvch) wrote : Re: IP: [<ffffffffa01b6c05>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]; RIP: 0010:[<ffffffffa01b6c05>] [<ffffffffa01b6c05>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]

Matthieu Baerts, could you please test the latest mainline kernel via http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.12-saucy/ and advise on the results?

description: updated
tags: added: needs-bisect
summary: - Powering down inactive GPU while running X causes NULL pointer
- dereference
+ IP: [<ffffffffa01b6c05>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon];
+ RIP: 0010:[<ffffffffa01b6c05>] [<ffffffffa01b6c05>]
+ r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]
Revision history for this message
Matthieu Baerts (matttbe) wrote :

@Christopher: I'm now using Ubuntu Trusty, but I'm not able to boot kernel 3.12; it crashes immediately.
The backtrace is not available in /var/log/kern.log. What more can I do? Do you want a screenshot of this crash?

Revision history for this message
penalvch (penalvch) wrote :

Matthieu Baerts, so the default Trusty kernel crashes immediately, or the mainline kernel installed in Trusty crashes immediately?

Revision history for this message
Matthieu Baerts (matttbe) wrote :

@Christopher

> so the default Trusty kernel crashes immediately

Yes. According to the backtrace, it's due to the Radeon driver (there are some lines with 'r600_pcie(...)').
I can take a picture but should I open a new bug report?

Revision history for this message
penalvch (penalvch) wrote :

Matthieu Baerts, it would not be necessary to file a new report at this time.

Could you please test the latest upstream kernel available (not the daily folder, but the one all the way at the bottom) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.12

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.
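
The tag naming scheme above can be sketched as a small shell function. `upstream_tags` is a hypothetical helper written for illustration only; it is not a Launchpad tool, and the tags still have to be added by hand in the web interface.

```shell
# Hypothetical helper: print the Launchpad tags described above for a tested
# mainline kernel version.
# $1: "fixed" if the mainline kernel fixes the bug, "exists" otherwise
# $2: tested version, e.g. v3.12
upstream_tags() {
    case "$1" in
        fixed)  prefix="kernel-fixed-upstream" ;;
        exists) prefix="kernel-bug-exists-upstream" ;;
        *)      echo "usage: upstream_tags fixed|exists VERSION" >&2; return 1 ;;
    esac
    if [ -z "$2" ]; then
        echo "missing VERSION" >&2
        return 1
    fi
    # First the generic tag, then the version-specific one.
    printf '%s\n%s-%s\n' "$prefix" "$prefix" "$2"
}
```

For example, `upstream_tags exists v3.12` prints kernel-bug-exists-upstream followed by kernel-bug-exists-upstream-v3.12.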

tags: added: trusty
Revision history for this message
Matthieu Baerts (matttbe) wrote :

@Christopher: I still have this crash with the upstream kernel version 3.12.0-031200.
It seems that it still crashes in 'r600_pcie_gart_tlb_flush'.

Note that I no longer have this crash if I remove this line from /etc/init/lightdm.conf (please have a look at comment #7 for more details):

    echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

But then, my ATI/AMD GPU is still enabled and consumes a lot of resources...

tags: added: kernel-bug-exists-upstream-v3.12
removed: needs-upstream-testing
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch)
description: updated
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Matthieu Baerts (matttbe) wrote :

@Christopher: sorry, there is no option to disable this card in BIOS of my laptop.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
description: updated
penalvch (penalvch)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Matthieu Baerts (matttbe) wrote :

Hello Christopher,

Sorry for the delay.
I'm currently not able to bisect this bug simply because I don't have enough free space on my hard disk...
Maybe I can test a patch or a specific version and use my PPA to compile it, but I don't know if it will help.

Note that it seems I'm not the only one with this bug, here is what I found:
* https://bugs.freedesktop.org/show_bug.cgi?id=70687 (DRI DRM/Radeon)
* https://bugzilla.kernel.org/show_bug.cgi?id=61891 (DRI, non Intel)

I hope it will help :-)

penalvch (penalvch)
tags: added: needs-upstream-testing
removed: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch)
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Matthieu Baerts (matttbe) wrote :

Hello Christopher,

I confirm that I still have this crash when using the latest RC version of the kernel (v3.13 RC3) on Ubuntu Trusty.
Note that this bug has also been reported to DRI DRM/Radeon devs: https://bugs.freedesktop.org/show_bug.cgi?id=71930

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
removed: needs-upstream-testing
penalvch (penalvch)
tags: added: kernel-bug-exists-upstream-v3.13-rc3
removed: kernel-bug-exists-upstream-v3.12
penalvch (penalvch)
summary: - IP: [<ffffffffa01b6c05>] r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon];
- RIP: 0010:[<ffffffffa01b6c05>] [<ffffffffa01b6c05>]
+ 1002:9480 IP: [<ffffffffa01b6c05>] r600_pcie_gart_tlb_flush+0xf5/0x110
+ [radeon]; RIP: 0010:[<ffffffffa01b6c05>] [<ffffffffa01b6c05>]
r600_pcie_gart_tlb_flush+0xf5/0x110 [radeon]
Revision history for this message
penalvch (penalvch) wrote :

Matthieu Baerts, thank you for performing the requested test. Regarding your comments:

>"Note that this bug has also been reported to DRI DRM/Radeon devs: https://bugs.freedesktop.org/show_bug.cgi?id=71930"

The bug hasn't been reported to the DRI DRM/Radeon devs in https://bugs.freedesktop.org/show_bug.cgi?id=71930 as you didn't boot with kernel parameter radeon.runpm=1.

Despite this, would it be possible to either free up some space on your main drive, or do the bisect from your backup drive? This would really help out in getting your bug looked at by a developer and ultimately fixed.
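
Whether a boot actually used the radeon.runpm=1 parameter mentioned above can be checked from the running system. The following is a minimal sketch; `has_kernel_param` is a hypothetical helper name, and it only assumes that the kernel command line is available in /proc/cmdline.

```shell
# Hypothetical helper: succeed if a given parameter appears as a whole word
# on the kernel command line.
has_kernel_param() {
    # $1: exact parameter to look for, e.g. radeon.runpm=1
    # $2: file holding the command line (defaults to /proc/cmdline)
    wanted="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each whitespace-separated word on its own line, then require an
    # exact, fixed-string match of a full line.
    tr -s '[:space:]' '\n' < "$cmdline_file" | grep -Fxq -- "$wanted"
}
```

Running `has_kernel_param radeon.runpm=1 && echo present` after a reboot shows whether the parameter took effect for that boot.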

description: updated
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Matthieu Baerts (matttbe) wrote :

> Despite this, would it be possible to either free up some space on your main drive, or do the bisect from your backup drive?

Unfortunately it's currently not possible... I have to find some free time and buy a new drive.

I know it may not be the same bug, but a few bisects have also been made by other people here: https://bugs.freedesktop.org/show_bug.cgi?id=70687 and https://bugzilla.kernel.org/show_bug.cgi?id=61891

I'll try to do that asap but not before the next month :-/
