[Radeon HD 5650 and 5470] Kernel BUG during recovery boot and in normal boot (Hybrid graphics)

Bug #727620 reported by afoglia on 2011-03-02
This bug affects 59 people
Affects (status, importance, assigned to):
Linux: status Confirmed, importance Medium
xserver-xorg-driver-ati: status Fix Released, importance Critical
linux (Ubuntu): importance High, assigned to Seth Forshee
xserver-xorg-video-ati (Ubuntu): importance Wishlist, Unassigned

Bug Description

[Problem]
On hybrid graphics hardware with this ATI chip and another (e.g. Intel), a failure occurs resulting in a black screen and errors from the radeon kernel module, as shown below.

[Cause]
From upstream developer:

"The switcheroo code needs more work to switch properly on some systems it seems. There are a set acpi methods required to activate/deactivate the respective gpus. The drivers need to load and initialize active hw. If the hw is not active when the driver loads, then the hw is not set up properly and it won't work. Probably some ordering issues in how the switcheroo acpi methods are called."

[Workarounds]
Several options:

1. If your BIOS includes functionality to disable the Intel card, use BIOS settings to select which chip to load.

2. Disable KMS by adding `radeon.modeset=0` in the boot line. Note that the default radeon gallium driver only works with KMS, so YMMV.
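
For workaround 2, a minimal sketch of making the parameter permanent rather than typing it at every boot. It assumes the usual Ubuntu layout, where the default kernel command line lives in a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT="..."` line in /etc/default/grub; the function name `add_kernel_param` is made up for illustration, and it only transforms text so you can inspect the result before writing anything back.

```python
import re

def add_kernel_param(grub_text, param="radeon.modeset=0"):
    """Append param to GRUB_CMDLINE_LINUX_DEFAULT unless it is already present."""
    # Assumption: the value is wrapped in double quotes, as on a stock install.
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)

    def _append(match):
        current = match.group(2).split()
        if param not in current:
            current.append(param)
        return match.group(1) + " ".join(current) + match.group(3)

    return pattern.sub(_append, grub_text, count=1)

sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(sample))
```

After writing the changed text back to /etc/default/grub, `sudo update-grub` regenerates the boot menu. Removing the parameter again restores KMS.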

[Original Report]
I'm running natty, and ever since the upgrade to 6.14.0 I've been unable to boot consistently. After some discussion in the forums, I tried repeatedly to boot into recovery mode. In most cases, I got a black screen. One time though, when I was able to successfully increase the brightness, I saw some errors from the radeon module. I took a photo (available at http://i.imgur.com/P0bQ0.jpg), and here's the stack and call trace, as best as I can read it:

Stack:
 ffff880149eb8000 ffff880149eb8000 0000000000000011 0000000000000911
 00000000fffffff4 ffff88014b6c7800 ffff88014b0f7b58 ffffffffa022aba0
 ffff8801460f7b58 ffff880149eb8000 0000000000000000 0000000000410028
Call Trace:
 [<ffffffffa022aba0>] evergreen_cp_resume+0x3a0/0x630 [radeon]
 [<ffffffffa022c8b7>] evergreen_startup+0x157/0x260 [radeon]
 [<ffffffffa01fe8a0>] ? r600_pcie_gart_init+0x60/0x70 [radeon]
 [<ffffffffa022dbec>] evergreen_init+0x1ac/0x2d0 [radeon]
 [<ffffffffa01a5a69>] radeon_device_init+0x409/0x490 [radeon]
 [<ffffffffa01a7142>] radeon_driver_load_kms+0xb2/0x1a0 [radeon]
 [<ffffffffa007fb2e>] drm_get_pci_dev+0x18e/0x300 [drm]
 [<ffffffff8115426f>] ? kmem_cache_alloc_trace+0xff/0x120
 [<ffffffffa023790e>] radeon_pci_probe+0xb2/0xba [radeon]
 [<ffffffff812fea7f>] local_pci_probe+0x5f/0xd0
 [<ffffffff81300369>] pci_device_probe+0x119/0x120
 [<ffffffff813b8eca>] ? driver_sysfs_add+0x7a/0xb0
 [<ffffffff813b8ff8>] really_probe+0x68/0x190
 [<ffffffff813b9305>] driver_probe_device+0x45/0x70
 [<ffffffff813b93db>] __driver_attach+0xab/0xb0
 [<ffffffff813b9330>] ? __driver_attach+0x0/0xb0
 [<ffffffff813b817e>] bus_for_each_dev+0x5e/0x90
 [<ffffffff813b8e4e>] driver_attach+0x1e/0x20
 [<ffffffff813b89b5>] bus_add_driver+0xc5/0x280
 [<ffffffffa0013000>] ? radeon_init+0x0/0x1000 [radeon]
 [<ffffffff813b9676>] driver_register+0x76/0x140
 [<ffffffffa0013000>] ? radeon_init+0x0/0x1000 [radeon]
 [<ffffffff812ff126>] __pci_register_driver+0x56/0xd0
 [<ffffffffa0080044>] drm_pci_init+0xe4/0xf0 [drm]
 [<ffffffff815bf36e>] ? mutex_lock+0x1e/0x50
 [<ffffffffa0013000>] ? radeon_init+0x0/0x1000 [radeon]
 [<ffffffffa0077688>] drm_init+0x58/0x70 [drm]
 [<ffffffffa00130c4>] radeon_init+0xc4/0x1000 [radeon]
 [<ffffffff81002195>] do_one_initcall+0x45/0x190
 [<ffffffff810a4573>] sys_init_module+0x103/0x260
 [<ffffffff8100c002>] system_call_fastpath+0x16/0x1b
Code: 00 45 8b 84 24 e4 0a 00 00 45 85 c0 0f 8e c7 09 00 00 41 8b 84 24 d4 0a 00 00 89 c2 83 c0 01 40 c1 e2 02 49 03 94 24 c8 0a 00 00 <c7> 02 00 44 05 c0 41 8b 94 24 e4 0a 00 00 41 23 84 24 f4 0a 00
RIP [<ffffffffa0227ad7>] evergreen_cp_start+0x57/0xc80 [radeon]
 RSP <ffff88014b0f7af8>
CR2: ffffc90411ce1ffc
---[ end trace 37702c56f2e23247 ]---
udevd-work[94]: '/sbin/modprobe -bv pci:v00001002d000068C1sv0000103Csd00001436bc03sc00i00' unexpected exit with status 0x0009

There is also some register info dumped at the top of the screen visible in the photo, that I didn't bother to write, as I'd most certainly get something wrong.
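
When comparing traces from different machines, it can help to reduce each one to a list of functions and modules. The sketch below is one way to do that for the `[<address>] function+0xoffset/0xsize [module]` lines shown above; the regular expression and the name `parse_trace` are illustrative, not part of any kernel tooling. A leading `?` marks a frame the kernel's unwinder is not sure about.

```python
import re

TRACE_LINE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_trace(text):
    """Return one dict per stack frame found in an oops call trace."""
    frames = []
    for m in TRACE_LINE.finditer(text):
        frames.append({
            "func": m.group("func"),
            # Frames without a [module] suffix belong to the core kernel image.
            "module": m.group("module") or "kernel",
            "reliable": m.group("unsure") is None,
        })
    return frames

sample = """\
 [<ffffffffa022aba0>] evergreen_cp_resume+0x3a0/0x630 [radeon]
 [<ffffffffa01fe8a0>] ? r600_pcie_gart_init+0x60/0x70 [radeon]
 [<ffffffff812fea7f>] local_pci_probe+0x5f/0xd0
"""
for frame in parse_trace(sample):
    print(frame)
```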

afoglia (afoglia) on 2011-03-02
tags: added: natty
afoglia (afoglia) wrote :

I forgot to mention, my computer is an HP Envy 14, so I have the discrete ATI card, and also integrated graphics from the core i5 (which uses the i915 driver). Just in case it's some interaction between the two that causes the crash.

Vangel Ajanovski (ajanovski) wrote :

I also have the same problem. Sometimes it takes just 1-2 resets to be able to boot, but this time I reset the computer 8 times (2 of them with a full power off) before it finally booted. I think it fails right before showing the Ubuntu logo and progress bar, when switching from console to graphics mode.

My computer is an HP Pavilion dm4t-1100 with an ATI 5470HD and Intel graphics.

summary: - [Radeon HD 5650] Driver crash during recovery boot
+ [Radeon HD 5650 and 5470] Driver crash during recovery boot and in
+ normal boot

Hi afoglia,

Does it resolve if you downgrade to an older version of -ati?

You can get older .deb files of the driver from Launchpad here:

https://launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+publishinghistory

Click on the link under Version of the version you want to test, then under Builds click the link for your hardware architecture, then grab the -ati and -radeon .debs and install them.

If that doesn't do it, then next guess would be you are having a kernel issue - if you still have a prior kernel you can try booting it (hold down the left shift key during boot to bring up the menu.)

Changed in xserver-xorg-video-ati (Ubuntu):
status: New → Incomplete
Vangel Ajanovski (ajanovski) wrote :

In my situation this is something I found in the logs.
I compared the log from a failed boot with a log from a normal boot. Besides the similar stack dump, I see one significant difference. The problematic log has this:

[drm] radeon: 3584M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.

whereas in the normal log:
[drm] radeon: 512M of VRAM memory ready
[drm] radeon: 512M of GTT memory ready.

The laptop has only 4GB RAM and ATI is supposed to have only 512MB.

I have attached the relevant part of the log
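
A small sketch for spotting this symptom in a saved log automatically. It only looks for the `[drm] radeon: ...M of VRAM memory ready` line quoted above; the function names and the 512 MB default are illustrative assumptions for this particular card.

```python
import re

VRAM_LINE = re.compile(r"\[drm\] radeon: (\d+)M of VRAM memory ready")

def reported_vram_mb(log_text):
    """Return the VRAM size in MB that the radeon driver reported, or None."""
    m = VRAM_LINE.search(log_text)
    return int(m.group(1)) if m else None

def vram_looks_wrong(log_text, expected_mb=512):
    """True when the driver reported a VRAM size other than the expected one."""
    found = reported_vram_mb(log_text)
    return found is not None and found != expected_mb

bad_log = "[drm] radeon: 3584M of VRAM memory ready\n[drm] radeon: 512M of GTT memory ready.\n"
good_log = "[drm] radeon: 512M of VRAM memory ready\n[drm] radeon: 512M of GTT memory ready.\n"
print(vram_looks_wrong(bad_log), vram_looks_wrong(good_log))
```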

This is my complete kern.log file showing the crash being identical to that of afoglia and Vangel Ajanovski.

afoglia (afoglia) wrote :

How do I install the old versions? I tried installing 6.13.2+git20110124.fadee040-0ubuntu4 and ...ubuntu3 but I got dpkg dependency problems stating that they depend on xorg-video-abi-9.0. apt-get can't find that package. (It can find xorg-video-abi-9, but selects xserver-xorg-core instead, and that's at the newest version in natty.)

I also tried the maverick version on that page (6.13.1-1ubuntu5) and again, dpkg has dependency issues, this time the required package is xorg-video-abi-8.0, and that this version of xserver-xorg-video-(ati|radeon) provides xserver-xorg-video-8 which xserver-xorg-core breaks.

If this helps, I did not have these problems under maverick, and while I had minor problems in natty a few weeks ago, they got noticeably, drastically worse when the 6.14 drivers were released.

afoglia (afoglia) wrote :

I tried Bryce's second suggestion of using old kernels. I have two previous versions of 2.6.38 installed, 2.6.38-3-generic and 2.6.38-4-generic. I booted each into recovery and normal mode 4 times, for a total of 16 boots. Here's the number of times the boot was a success, where I either got to the recovery boot menu or gdm, (regardless of whether the screen brightness had to be manually increased from 0, or if the plymouth boot screen displayed).

2.6.38-4-generic, normal: 1 success, 3 failures
2.6.38-4-generic, recovery: 4 successes
2.6.38-3-generic, normal: 4 successes
2.6.38-3-generic, recovery: 3 successes, 1 failure

At no time did I see a stack trace like the one I posted, but I've only seen that in recovery mode. (Would it be written somewhere persistent between boots? It's not in /var/log/syslog.)

I took more notes on the failures. They're pretty vague and qualitative, but have slightly more detail of what each boot was like.
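
With only four boots per configuration, differences like these can arise by chance. As a rough check, the sketch below pools normal and recovery boots per kernel (5 of 8 successes for -4, 7 of 8 for -3) and computes a one-sided Fisher exact probability from the hypergeometric distribution. Pooling the two boot modes and treating boots as independent are simplifying assumptions, and `fisher_one_sided` is an illustrative name.

```python
from math import comb

def fisher_one_sided(success_a, total_a, success_b, total_b):
    """Probability that group A shows success_a or fewer successes
    if both groups really share the same success rate."""
    total = total_a + total_b
    successes = success_a + success_b
    # Fewest successes group A can have given the fixed margins.
    lowest = max(0, successes - total_b)
    denom = comb(total, total_a)
    tail = sum(
        comb(successes, k) * comb(total - successes, total_a - k)
        for k in range(lowest, success_a + 1)
    )
    return tail / denom

# 2.6.38-4: 5 successes in 8 boots; 2.6.38-3: 7 successes in 8 boots.
p = fisher_one_sided(5, 8, 7, 8)
print(round(p, 3))
```

A value well above the conventional 0.05 threshold would mean the counts alone cannot distinguish the two kernels, so more boots per kernel are needed before drawing a firm conclusion.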

Bryce Harrington (bryce) wrote :

Okay, thanks for the testing. That suggests a regression in the kernel between 2.6.38-3 and -4 (the one failure with -3 may be a random outlier).

Even though this seems to be pinpointed to the kernel, I'll leave the X task open for now so we can keep track of the bug's progress from the X end.

summary: [Radeon HD 5650 and 5470] Driver crash during recovery boot and in
- normal boot
+ normal boot (Regression from 2.6.38-3 to -4)
Changed in xserver-xorg-video-ati (Ubuntu):
importance: Undecided → High
status: Incomplete → Triaged

Hi guys,
I have the same problem. I like this OS, Ubuntu, very much, but why has such a good system not resolved this problem for such a long time? I see posts from the last 2 or 3 years, which seems strange to me. I don't want to experiment with my PC. I tried, but it always ends in a crash, a black screen, or a red screen. I'll wait for a new, better release, unless someone can help me with something that really works. Thank you very much.

bugbot (bugbot) on 2011-03-13
tags: added: crash
Bryce Harrington (bryce) wrote :

kolio, not sure what you're talking about. The Radeon HD 5650 came on the market on Jan 7, 2010, so it did not exist 2 or 3 years ago. Whatever posts you're looking at are unrelated to this problem.

Bryce Harrington (bryce) wrote :

afoglia, just to confirm - you still seeing this crash with the current kernel?

Bryce Harrington (bryce) on 2011-03-19
Changed in xserver-xorg-video-ati (Ubuntu):
status: Triaged → Confirmed
afoglia (afoglia) wrote :

Yes and no. I did six normal boots with 2.6.38-7.35, then realized there was an update, and booted that both normally and in recovery and here's what I saw

2.6.38-7.36 normal, 6 boots, 5 reached gdm login screen, 1 gdm started but hung before login window appeared (only one of the 5 successful boots showed the plymouth boot screen)
2.6.38-7.36 recovery mode, 5 boots, all hung with the monitor off, no plymouth, brightness key did nothing.
2.6.38-7.35 normal, 6 boots, 3 hung with monitor off, 3 reached gdm

Since I still can't boot in recovery (and I don't see anything in the changelog for -7.36 obviously related), I'd say the bug is still there.

I confirm that the bug is still there. I installed the latest updates this morning (I saw xserver-xorg-video-ati and -radeon updates, so I was expecting the bug to be solved).

When I restart, I need to reboot more than 5 times before I get a desktop. And now, when the boot crashes, I don't get a shell; the screen stays black (as if it were off).

I really hope this bug will be solved before the final release. If not, Ubuntu won't work on many of the latest HP Pavilion laptops.

Note : Here is the result of
lspci -v | grep -A 12 VGA :

guillaume@guillaume-HP-Notebook:~$ lspci -v | grep -A 12 VGA
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02) (prog-if 00 [VGA controller])
 Subsystem: Hewlett-Packard Company Device 163c
 Flags: bus master, fast devsel, latency 0, IRQ 44
 Memory at c0000000 (64-bit, non-prefetchable) [size=4M]
 Memory at b0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at 5050 [size=8]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: <access denied>
 Kernel driver in use: i915
 Kernel modules: i915

00:16.0 Communication controller: Intel Corporation 5 Series/3400 Series Chipset HECI Controller (rev 06)
 Subsystem: Hewlett-Packard Company Device 163c
--
01:00.0 VGA compatible controller: ATI Technologies Inc Robson CE [AMD Radeon HD 6300 Series] (prog-if 00 [VGA controller])
 Subsystem: Hewlett-Packard Company Device 163c
 Flags: bus master, fast devsel, latency 0, IRQ 43
 Memory at a0000000 (64-bit, prefetchable) [size=256M]
 Memory at c4400000 (64-bit, non-prefetchable) [size=128K]
 I/O ports at 4000 [size=256]
 Expansion ROM at c4440000 [disabled] [size=128K]
 Capabilities: <access denied>
 Kernel driver in use: radeon
 Kernel modules: radeon

01:00.1 Audio device: ATI Technologies Inc Manhattan HDMI Audio [Mobility Radeon HD 5000 Series]
 Subsystem: Hewlett-Packard Company Device 163c
guillaume@guillaume-HP-Notebook:~$
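
For anyone unsure whether their machine is a hybrid-graphics system, here is a rough sketch that reads `lspci -v` text like the output above and lists each VGA controller with the kernel driver bound to it. The function name `vga_controllers` is illustrative, and the sample is trimmed from the output in this comment.

```python
import re

def vga_controllers(lspci_text):
    """Return (pci_slot, description, driver_or_None) for each VGA device."""
    devices = []
    current = None
    for line in lspci_text.splitlines():
        header = re.match(r"^(\S+) VGA compatible controller: (.*)$", line)
        if header:
            current = [header.group(1), header.group(2), None]
            devices.append(current)
        elif re.match(r"^\S", line):
            # Another device's header line, or grep's "--" group separator.
            current = None
        elif current is not None:
            drv = re.match(r"^\s+Kernel driver in use: (\S+)", line)
            if drv:
                current[2] = drv.group(1)
    return [tuple(d) for d in devices]

sample = """\
00:02.0 VGA compatible controller: Intel Corporation Core Processor Integrated Graphics Controller (rev 02)
 Subsystem: Hewlett-Packard Company Device 163c
 Kernel driver in use: i915

00:16.0 Communication controller: Intel Corporation 5 Series/3400 Series Chipset HECI Controller (rev 06)
--
01:00.0 VGA compatible controller: ATI Technologies Inc Robson CE [AMD Radeon HD 6300 Series]
 Kernel driver in use: radeon
"""
found = vga_controllers(sample)
drivers = {drv for _slot, _desc, drv in found if drv}
print(found)
print("hybrid graphics" if len(drivers) > 1 else "single GPU driver")
```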

I have a theory... Maybe there is a race condition between the intel and ati driver involved here? My notebook starts up in two seemingly random configurations, or three including the radeon crash:

1. X server is on VT8 -> unable to unload radeon module because it is in use (by some framebuffer I guess). I am also unable to switch to consoles VT1-7. If I use vga_switcheroo to switch to integrated gpu in this mode then radeon crashes.

2. X server is on VT7 -> I can unload the radeon module and use the consoles VT1-6. vga_switcheroo works and I can also use acpi calls to turn off the gpu.

3. The radeon driver crashes. Forcing reboot through RSEIUB.

This makes it difficult to control the temperature, since I cannot know whether the radeon module is in use or not (i.e. I might or might not be able to use vga_switcheroo, or unload the module and use a specific ACPI call to shut off the GPU).
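
One way to find out which GPU is active before trying to switch is to read the vga_switcheroo status file (/sys/kernel/debug/vgaswitcheroo/switch, which needs root and a mounted debugfs). The sketch below parses that text. The line layout `index:IGD|DIS:+ or space:Pwr|Off:pci-address` is an assumption about how kernels of this era print it, and the sample lines are made up, so compare against your own system before relying on it.

```python
def parse_switch_state(text):
    """Parse vga_switcheroo status text into one dict per GPU.

    Assumed line layout (not verified here): 0:IGD:+:Pwr:0000:00:02.0
    """
    clients = []
    for line in text.splitlines():
        parts = line.split(":", 4)
        if len(parts) != 5:
            continue  # skip anything that does not fit the assumed layout
        _index, gpu, active, power, pci = parts
        clients.append({
            "gpu": gpu,                  # "IGD" (integrated) or "DIS" (discrete)
            "active": active == "+",     # "+" marks the GPU driving the display
            "powered": power == "Pwr",
            "pci": pci,
        })
    return clients

# Hypothetical sample, not taken from any machine in this report.
sample = "0:IGD:+:Pwr:0000:00:02.0\n1:DIS: :Off:0000:01:00.0\n"
for client in parse_switch_state(sample):
    print(client)
```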


When I have booted into an environment where both the X server and the framebuffer use intel, the radeon module sometimes crashes like this when I try to unload it:

[ 346.860598] radeon 0000:02:00.0: ffff88014b362000 unpin not necessary
[ 346.860619] BUG: unable to handle kernel paging request at ffffc90022680000
[ 346.861758] IP: [<ffffffffa01f00bc>] rs600_gart_set_page+0x3c/0x50 [radeon]
[ 346.863120] PGD 157818067 PUD 157819067 PMD 14959b067 PTE 0
[ 346.864707] Oops: 0002 [#1] SMP
[ 346.866297] last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-VGA-1/status
[ 346.867941] CPU 3
[ 346.867963] Modules linked in: cryptd aes_x86_64 aes_generic binfmt_misc parport_pc ppdev dm_crypt wl(P) lib80211 snd_hda_codec_hdmi snd_hda_codec_realtek arc4 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm joydev snd_seq_midi brcm80211(C) snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device sparse_keymap mac80211 cfg80211 uvcvideo videodev psmouse snd v4l2_compat_ioctl32 intel_ips serio_raw lp soundcore snd_page_alloc parport radeon(-) i915 ttm ahci atl1c libahci drm_kms_helper drm i2c_algo_bit video
[ 346.877408]
[ 346.879334] Pid: 2518, comm: rmmod Tainted: P C 2.6.38-7-generic #38-Ubuntu Acer Aspire 3820/JM31_CP
[ 346.881377] RIP: 0010:[<ffffffffa01f00bc>] [<ffffffffa01f00bc>] rs600_gart_set_page+0x3c/0x50 [radeon]
[ 346.883456] RSP: 0018:ffff8801259e7c68 EFLAGS: 00010286
[ 346.885512] RAX: 00000000ffffffea RBX: ffff880149e00000 RCX: ffffc90022680000
[ 346.887607] RDX: 0000000036822067 RSI: 0000000000000000 RDI: ffff880149e00000
[ 346.889724] RBP: ffff8801259e7c68 R08: 0000000000000000 R09: ffff88014adf7748
[ 346.891854] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000111
[ 346.893993] R13: 0000000000000111 R14: 0000000000000888 R15: 0000000000000001
[ 346.896139] FS: 00007f8e31ee8720(0000) GS:ffff880093180000(0000) knlGS:0000000000000000
[ 346.898319] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 346.900511] CR2: ffffc90022680000 CR3: 0000000125af6000 CR4: 00000000000006e0
[ 346.902754] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 346.905025] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 346.907314] Process rmmod (pid: 2518, threadinfo ffff8801259e6000, task ffff88013d67c440)
[ 346.909638] Stack:
[ 346.911920] ffff8801259e7cb8 ffffffffa01bf146 ffff880149e01338 0002000000000000
[ 346.914277] ffff8801259e7cb8 ffff880149e00000 ffff88014f0c6000 ffffffffa025a590
[ 346.916600] ffff88014f80e000 0000000000000001 ffff8801259e7cd8 ffffffffa01bf49d
[ 346.918952] Call Trace:
[ 346.921281] [<ffffffffa01bf146>] radeon_gart_unbind+0xb6/0x160 [radeon]
[ 346.923664] [<ffffffffa01bf49d>] radeon_gart_fini+0x7d/0x80 [radeon]
[ 346.926056] [<ffffffffa022a146>] evergreen_pcie_gart_fini+0x26/0x30 [radeon]
[ 346.928466] [<ffffffffa022dc8e>] evergreen_fini+0x3e/0x90 [radeon]
[ 346.930871] [<ffffffffa01a5b0b>] radeon_device_fini+0x3b/0xa0 [radeon]
[ 346.933291] [<ffffffffa01a7045>] radeon_driver_unload_kms+0x35/0x60 [radeon]
[ 346.935709] [<ffffffffa0020b16>] drm_put_dev+0xc6/0x1d0 [drm]
[ 346.938125] [<ffffffffa018b11d>] radeon_pci_remove...


Is anybody working on this bug?

Dweia (dweia) wrote :

Bryce Harrington wrote on 2011-03-04:[...] That suggests a regression in the kernel between 2.6.38-3 and -4

The error must have occurred a lot earlier. I tried a bunch of different kernels, each with the (at the moment) most recent version (highest number after the dash):

2.6.35-25 - works
2.6.36-1 - works
2.6.37-12 - crashes most of the time
2.6.38-7 - crashes most of the time

I also tried some other versions, including the mentioned 2.6.38-3, but no luck there for me.

There's another bug, which may or may not be related, and which got apparently fixed with kernel 2.6.37, maybe thereby introducing this problem with the crashes? This older problem causes entries like the following in the kernel log when shutting down or rebooting the system.

kernel: [ 36.068256] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
kernel: [ 36.068719] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CF42 (len 72, WS 0, PS 0) @ 0xCF71
kernel: [ 41.070113] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
kernel: [ 41.070654] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CB20 (len 62, WS 0, PS 0) @ 0xCB3C
kernel: [ 46.271694] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
kernel: [ 46.272333] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CB20 (len 62, WS 0, PS 0) @ 0xCB3C

These entries occur once per second in kernel 2.6.35, every 5 seconds in 2.6.36, and disappear altogether in 2.6.37.
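
To check the spacing of these messages in a saved kernel log, a sketch like the following pulls the timestamps off the `atom_op_jump` lines and prints the gaps between them. The function name `stuck_intervals` is illustrative; the sample lines are the ones quoted above.

```python
import re

STUCK = re.compile(
    r"\[\s*(\d+\.\d+)\] \[drm:atom_op_jump\] \*ERROR\* atombios stuck in loop"
)

def stuck_intervals(log_text):
    """Return the gaps in seconds between successive 'stuck in loop' errors."""
    stamps = [float(t) for t in STUCK.findall(log_text)]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]

sample = """\
kernel: [ 36.068256] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
kernel: [ 36.068719] [drm:atom_execute_table_locked] *ERROR* atombios stuck executing CF42 (len 72, WS 0, PS 0) @ 0xCF71
kernel: [ 41.070113] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
kernel: [ 46.271694] [drm:atom_op_jump] *ERROR* atombios stuck in loop for more than 5secs aborting
"""
print(stuck_intervals(sample))
```

For the excerpt quoted in this comment the gaps come out at roughly five seconds each.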

P.S. I don't think the bug has to do with xserver-xorg at all - it's most probably the radeon kernel-module, since the errors occur long before X is starting. Also the bugs go away when I blacklist the radeon module. Unfortunately I cannot switch off the Radeon-graphics card, when the module for it isn't loaded. :-(

Dweia (dweia) wrote :

Unfortunately I discovered yesterday that I was wrong. When used without the battery and with the power cord connected (this is an Aspire 3820TG laptop), the crash also occurs with kernel 2.6.36-1. Probably the BIOS does something to/with the graphics cards when external power is connected. All the previous tests had been run on battery power.

I couldn't test the 2.6.35 kernel yet, since I had already removed it; I need to reinstall it and see what happens.

Bryce Harrington (bryce) wrote :

It's starting to sound like this is due to confusion (maybe a regression) in the plumbing layer between X and the kernel, such as module-init-tools or one of the related packages.

I think either apw or cjwatson need to look into this issue. apw's on vacation though.

Chris Halse Rogers (raof) wrote :

This does look a lot like some bad interaction between i915/radeon(/vesafb?). Although we don't seem to have a full dmesg of a -7 kernel it seems like it's not vesafb-based.

It would be useful to have logs - both of good and bad boots - with the “drm.debug=0x0e” kernel argument added to the boot line.

I have taken logs from a set of ubuntu kernels starting up using the requested kernel boot argument "drm.debug=0x0e":

2.6.38-7 ---> fb0: radeondrmfb frame buffer device
2.6.38-8 ---> fb0: inteldrmfb frame buffer device
2.6.39-rc1--> kernel oops (pointing to evergreen something), not caught in the logs. It seems to be at a different offset than the one reported above

Please see the attached files containing dmesg and kern.log for each kernel. There are some other interesting things in the logs, like an invalid DSDT, that I will look into further.

Bryce Harrington (bryce) wrote :

afoglia - I've forwarded this bug upstream to http://bugs.freedesktop.org/show_bug.cgi?id=36003 - please subscribe yourself to this bug, in case they need further information or wish you to test something. Thanks ahead of time!

Johan, I attached your logs to the upstream bug report. Generally upstream prefers that the logs come from the original reporter, so I'm not sure if they will accept the bug report.

Changed in xserver-xorg-video-ati (Ubuntu):
status: Confirmed → Triaged
Bryce Harrington (bryce) wrote :

Upstream would like to see if setting the video to the radeon/discrete setting in the BIOS configuration makes it function properly.

If so, this may be a known issue in the new vga_switcheroo functionality, ala bug https://bugzilla.kernel.org/show_bug.cgi?id=30052

Changed in xserver-xorg-video-ati (Ubuntu):
status: Triaged → Incomplete
Dweia (dweia) wrote :

Chris Halse Rogers wrote in #22: "This does look a lot like some bad interaction between i915/radeon"

I agree - I did some more testing and placed an entry for the radeon module into /etc/initramfs-tools/modules, and voilà - no crash! (After booting into X I can't get back to the text console, but that's possibly another issue.)

What proves the "bad interaction" even more is: when also placing "i915" into /etc/initramfs-tools/modules BEFORE "radeon", booting isn't possible at all any more, but when placing i915 AFTER radeon, booting is possible, but I got a black screen (as in totally black, that is: no backlight) until X starts up.
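
The ordering experiments above can be summarised in a short sketch. It reads text in the format of /etc/initramfs-tools/modules (one module name per line, `#` for comments) and reports which of the configurations described in this comment it matches. The outcomes are one reporter's observations on one laptop, not guaranteed behaviour, and the function names are illustrative.

```python
def listed_modules(modules_text):
    """Return module names in file order, ignoring comments and blank lines."""
    names = []
    for line in modules_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            names.append(line.split()[0])  # anything after the name is module options
    return names

def describe_config(modules_text):
    """Map a modules file onto the outcomes reported in this thread."""
    mods = listed_modules(modules_text)
    has_radeon, has_i915 = "radeon" in mods, "i915" in mods
    if has_radeon and not has_i915:
        return "radeon only: reported to boot without the crash"
    if has_radeon and has_i915:
        if mods.index("i915") < mods.index("radeon"):
            return "i915 before radeon: reported not to boot at all"
        return "radeon before i915: reported to boot, black screen until X starts"
    if has_i915:
        return "i915 only: not covered by the report"
    return "neither listed: the default setup in which the crash was reported"

print(describe_config("# early KMS\nradeon\ni915\n"))
```

Changes to that file only take effect after the initramfs is regenerated (`sudo update-initramfs -u` on Ubuntu).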

I'll try to get logs of four different initrd-configurations (though I doubt I'll be able to record the one where the crash occurs already during running of initrd...)

P.S. off-topic, I knew a cjwatson once - hi Kamion ;-)

Dweia (dweia) wrote :

Bryce Harrington wrote in #25 "Upstream would like to see if setting the video to the radeon/discrete setting in the BIOS configuration makes it function properly."

Answer: Yes, it does. I tried that a while ago already, but can't (don't want to) use that for regular running, because the radeon card makes the laptop battery run out too fast. I also read that there's a patched/hacked BIOS somewhere that allows switching off the radeon card via the BIOS, but if it can be solved with software I'd prefer that ;)

Changed in xserver-xorg-driver-ati:
importance: Unknown → High
status: Unknown → Confirmed

@Bryce: Yes, booting with only discrete or integrated enabled does solve the problem for me as well.

@Dweia: I patched the CMOS for my 3820TG and unlocked the Intel menu. Now I can choose to only have the IGD activated and PEG (radeon) completely shut off drawing zero power. Same thing as going through vga_switcheroo or using the acpi calls but less hassle.

Btw, should we work on the bug here on Launchpad, or keep the discussion on freedesktop and work directly with the AMD devs?

Dweia (dweia) wrote :

Sorry, I got sidetracked while getting a set of logs. However, some (yet slightly vague) findings may be useful - even if debugging gets maybe even harder:

Firstly: the computer (BIOS or whatever) behaves differently when external power is connected or only battery used, and secondly: it behaves differently depending on the last state of the vgaswitcheroo BEFORE the reboot. I need to do more testing regarding the former (probably frequency of crashes higher with external power), but the latter seemed to me pretty consistently only crashing after "echo OFF > /sys/kernel/debug/vgaswitcheroo/switch".

I did yesterday a kernel-update to 2.6.38-8, I'll try to reproduce the findings and will try to see if anything changed in the behaviour.

Vangel Ajanovski (ajanovski) wrote :

If I add
radeon.modeset=0
in the boot line when starting, the crash does not happen and the system continues using the integrated Intel graphics.

Timo Aaltonen (tjaalton) wrote :

marking the bug as confirmed

Changed in xserver-xorg-video-ati (Ubuntu):
status: Incomplete → Confirmed

I've updated the title and description based on recent findings.

Hybrid graphics switching support is still fairly embryonic upstream and I don't feel it is stable or reliable enough yet for us to support in Ubuntu, so I am setting the importance of the X task here to Wishlist.

However, even aside from switching graphics, the kernel should not be failing with this particular hardware configuration, even if it is not able to properly switch; it should pick one driver or the other and not load both, even if it just has to pick at random. So I'm leaving the kernel task here open, in hopes that some fix can at least paper over the crash.

description: updated
summary: [Radeon HD 5650 and 5470] Driver crash during recovery boot and in
- normal boot (Regression from 2.6.38-3 to -4)
+ normal boot (Hybrid graphics)
Changed in xserver-xorg-video-ati (Ubuntu):
importance: High → Wishlist
status: Confirmed → Triaged
summary: - [Radeon HD 5650 and 5470] Driver crash during recovery boot and in
- normal boot (Hybrid graphics)
+ [Radeon HD 5650 and 5470] Kernel BUG during recovery boot and in normal
+ boot (Hybrid graphics)
Bryce Harrington (bryce) wrote :

@JFo, this hardware results in the kernel triggering a BUG. Please add this to the kernel team's list of bugs to investigate.

Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → High
status: Triaged → New
assignee: nobody → Jeremy Foshee (jeremyfoshee)
wedens (frigid20) wrote :

I have the same problem on a Radeon 5650.
Log attached.

Jeremy Foshee (jeremyfoshee) wrote :

added to the hot bugs listing for team review.

~JFo

tags: added: kernel-key
Changed in linux (Ubuntu):
assignee: Jeremy Foshee (jeremyfoshee) → nobody
status: New → Triaged
Bryce Harrington (bryce) on 2011-05-03
tags: added: oneiric
Bryce Harrington (bryce) wrote :

[I've marked this bug for inclusion in our oneiric bug queue. While technically this bug has not been re-confirmed against oneiric, I feel it is worth continued development attention. We will need to ask that it be re-confirmed once oneiric is further along, perhaps once we get closer to alpha.]

Cabalbl4 (i-vohmin) wrote :

can confirm this bug on hybrid radeon 6550 and intel card on acer aspire 3820TG
natty kernel 2.6.38-8-generic.

Cabalbl4 (i-vohmin) wrote :

issue seems to be gone with kernel 2.6.38-9-generic from proposed.

Cabalbl4 (i-vohmin) wrote :

No, it still exists on 2.6.38-9-generic when the power cable is unplugged.

Seth Forshee (sforshee) on 2011-05-18
Changed in linux (Ubuntu):
assignee: nobody → Seth Forshee (sforshee)
status: Triaged → Incomplete
Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Seth Forshee (sforshee) on 2011-05-27
Changed in linux (Ubuntu):
status: Incomplete → In Progress
status: In Progress → Incomplete
Seth Forshee (sforshee) on 2011-06-03
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Seth Forshee (sforshee) on 2011-06-20
Changed in linux (Ubuntu):
status: In Progress → Incomplete
Seth Forshee (sforshee) on 2011-06-21
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Seth Forshee (sforshee) on 2011-06-21
Changed in linux (Ubuntu):
status: In Progress → Incomplete
Bryce Harrington (bryce) on 2011-07-18
Changed in xserver-xorg-video-ati (Ubuntu):
status: Triaged → Invalid
106 comments hidden view all 186 comments
Martin Stjernholm (msub) wrote :

Bisect 013 booted ok 24 times out of 30. 3 of the remaining times failed with an afaics unrelated oops in azx_interrupt in the snd_hda_intel driver. The last 3 times there were oops'es which didn't manage to get sufficiently logged on the tty to see where they were (the computer froze either before logging the relevant line, or after having scrolled it off the screen). I think it's most likely that those were in snd_hda_intel as well, since some other boots with that oops froze before logging the oops entirely.

So my conclusion is that bisect 013 is not affected by the evergreen_cp_resume bug. I therefore didn't test 012 since it should work by implication under the bisect assumption.

Seth Forshee (sforshee) wrote :

bisect014 is now available.

http://people.canonical.com/~sforshee/lp727620/bisect/

# bad: [2703c21a82301f5c31ba5679e2d56422bd4cd404] drm/nv50/gr: move to exec engine interfaces
git bisect bad 2703c21a82301f5c31ba5679e2d56422bd4cd404

Now testing commit 000703f44c77b152cd966eaf06f4ab043274ff46.

Martin Stjernholm (msub) wrote :

After 30 boots with bisect 014 I got 3 oopses but none mentioning evergreen_cp_resume, so I deem it not affected.

josejuan05 (josejuan05) wrote :

I have had 20 boots with one oops, but not evergreen_cp_resume on bisect 014. I second Martin.

Seth Forshee (sforshee) wrote :

bisect015 is now available.

http://people.canonical.com/~sforshee/lp727620/bisect/

# bad: [000703f44c77b152cd966eaf06f4ab043274ff46] mxm/wmi: add MXMX interface entry point.
git bisect bad 000703f44c77b152cd966eaf06f4ab043274ff46

Now testing commit 63f7d9828bf55cc8ee6f460830c5285fe06bef3e.

josejuan05 (josejuan05) wrote :

I have booted bisect015 twice, and both times got evergreen_cp_restart oops.

This commit is affected by the bug.

Seth Forshee (sforshee) wrote :

bisect016 is now available.

http://people.canonical.com/~sforshee/lp727620/bisect/

# good: [63f7d9828bf55cc8ee6f460830c5285fe06bef3e] drm/radeon/kms: add support for thermal chips on combios asics
git bisect good 63f7d9828bf55cc8ee6f460830c5285fe06bef3e

Now testing commit 99b38b4acc0d7dbbab443273577cff60080fcfad.

josejuan05 (josejuan05) wrote :

On five boots of bisect016 I have two evergreen_cp_start errors.

Seth Forshee (sforshee) wrote :

bisect017 is now available.

http://people.canonical.com/~sforshee/lp727620/bisect/

# good: [99b38b4acc0d7dbbab443273577cff60080fcfad] platform/x86: add MXM WMI driver.
git bisect good 99b38b4acc0d7dbbab443273577cff60080fcfad

Now testing commit 3448a19da479b6bd1e28e2a2be9fa16c6a6feb39.

josejuan05 (josejuan05) wrote :

On bisect017 I've had 19 successful boots, and one that ended with the gart_unbind error (not evergreen_cp_...)

Seth Forshee (sforshee) wrote :

bisect018 is now available. This should be the last bisect build, then we'll need to apply the commit identified as fixing the issue onto natty and see if that fixes the problem there.

http://people.canonical.com/~sforshee/lp727620/bisect/

# bad: [3448a19da479b6bd1e28e2a2be9fa16c6a6feb39] vgaarb: use bridges to control VGA routing where possible.
git bisect bad 3448a19da479b6bd1e28e2a2be9fa16c6a6feb39

Now testing commit 8116188fdef5946bcbb2d73e41d7412a57ffb034.

josejuan05 (josejuan05) wrote :

Out of three boots, the first two failed on an evergreen_cp_ error.

josejuan05 (josejuan05) wrote :

Oh. Clarification: Out of three boots of bisect018, the first two failed on an evergreen_cp_ error.

Seth Forshee (sforshee) wrote :

The bisect identified this as the commit that fixes the problem:

3448a19 vgaarb: use bridges to control VGA routing where possible.

A natty build with this patch applied is available at the link below. Please test to see whether or not the bug is reproducible in this build. Thanks!

http://people.canonical.com/~sforshee/lp727620/linux-2.6.38-11.48~lp727620v201108261554/
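
A note on the bisect logs in the comments above: this was a reverse bisect, hunting for the commit that fixed the bug rather than the one that broke it, so `git bisect bad` marks builds where the crash no longer appears and `git bisect good` marks builds that still crash. The idea can be sketched as a binary search over a linear history. Real `git bisect` walks a commit graph, and the commit names and `first_fixed` helper below are hypothetical.

```python
def first_fixed(commits, is_fixed):
    """Binary search for the first commit where is_fixed(commit) is True.

    Assumes commits[0] still shows the bug, commits[-1] is fixed, and the
    fix stays in place once it lands.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid   # fix already present: search earlier history
        else:
            lo = mid   # still crashing: search later history
    return commits[hi]

# Hypothetical history; "c5" stands in for the fixing commit.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
print(first_fixed(history, lambda c: history.index(c) >= 5))
```

Because the crash is intermittent, each `is_fixed` answer in practice came from many boots of a test build rather than a single run.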

josejuan05 (josejuan05) wrote :

I have 20 consecutive successful boots on this commit. I have no boot failures yet.
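
How convincing is a run of clean boots for an intermittent crash? A back-of-the-envelope sketch: if the bug were still present and each boot failed independently with probability `failure_rate`, the chance of seeing only clean boots is `(1 - failure_rate) ** boots`. Independence between boots is an assumption (other comments suggest power source and the previous switcheroo state matter), and the failure rates tried here are illustrative.

```python
def chance_all_boots_clean(failure_rate, boots):
    """Probability of `boots` clean boots in a row if the bug is still there."""
    return (1.0 - failure_rate) ** boots

for rate in (0.1, 0.25, 0.5):
    print(rate, chance_all_boots_clean(rate, 20))
```

At the failure rates reported earlier in this bug (often around half of all boots), 20 clean boots in a row would be very unlikely if the patched kernel were still affected.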


Cool! it seems to be working for me. I even got a little taste of
the Plymouth boot screen which I haven't seen in a long time :)
I'll continue testing and let you know if I experience the B[lack]SOD.

On Fri, Aug 26, 2011 at 8:19 PM, josejuan05 <email address hidden> wrote:

> I have 20 consecutive successful boots on this commit. I have no boot
> failures yet.
> [<ffffffffa01fe8a0>] ? r600_pcie_gart_init+0x60/0x70 [radeon]
> [<ffffffffa022dbec>] evergreen_init+0x1ac/0x2d0 [radeon]
> [<ffffffffa01a5a69>] radeon_device_init+0x409/0x490 [radeon]
> [<ffffffffa01a7142>] radeon_driver_load_kms+0xb2/0x1a0 [radeon]
> [<ffffffffa007fb2e>] drm_get_pci_dev+0x18e/0x300 [drm]
> [<ffffffff8115426f>] ? kmem_cache_alloc_trace+0xff/0x120
> [<ffffffffa023790e>] radeon_pci_probe+0xb2/0xba [radeon]
> [<ffffffff812fea7f>] local_pci_probe+0x5f/0xd0
> [<fffff...

Read more...

Martin Stjernholm (msub) wrote :

I've tested the patched natty kernel and have over 30 boots without the evergreen_cp_resume oops, so the bisected patch indeed appears to be the right one. Does it explain the race?
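Because the oops is intermittent, a run of clean boots is only probabilistic evidence. As a rough illustration, assume each boot of an affected kernel fails independently with some fixed probability (the rates below are hypothetical, not measured in this thread). The following Python sketch computes how likely a streak of clean boots is on a still-broken kernel, and how many clean boots are needed before that chance drops below 5%:

```python
import math


def p_all_clean(p_fail: float, boots: int) -> float:
    """Chance that a broken kernel still boots cleanly `boots` times in a row."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return (1.0 - p_fail) ** boots


def boots_needed(p_fail: float, alpha: float = 0.05) -> int:
    """Smallest streak length whose chance on a broken kernel is at most alpha."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1.0 - p_fail))


# Hypothetical per-boot failure rates: frequent, occasional, rare.
for p in (0.5, 0.1, 0.01):
    print(f"p_fail={p}: P(30 clean boots)={p_all_clean(p, 30):.4f}, "
          f"boots needed={boots_needed(p)}")
```

With a hypothetical 1% failure rate, 30 clean boots would still be seen roughly three times out of four on a broken kernel, so rare failures need much longer runs to rule out.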

Seth Forshee (sforshee) wrote :

I've been looking at the patch we identified as fixing the problem, but I can't work out any causal relationship between what it does and the GPU being on when radeon probes. I've inquired about it on the upstream bugzilla to see if I'm missing something, but I'm beginning to suspect that the patch alters the timing of things enough to prevent the problem from being triggered.

Changed in linux (Ubuntu):
status: Incomplete → In Progress

aproposnix (rimez) wrote :

Hi Seth,

I have had one bad experience since installing the patched kernel. After countless good boots I decided to add the following lines to /etc/rc.local in order to use switcheroo:

chown "username" /sys/kernel/debug/vgaswitcheroo/switch # replace "username" with your user name
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

Everything seemed to work for the first 5 or so boots, but then I started getting the black screens again. The message was similar but not the same. Unfortunately I didn't manage to capture it.

Once I got back into my desktop I removed the lines from rc.local and then the issue disappeared.

I've gone ahead and added the lines again to see if I can recreate the issue but so far nothing.

Not sure if this info is of any use to you at all but I thought I would share just in case it would.
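
For anyone repeating this from a boot script, here is a slightly more defensive version of the second command, sketched in Python. It only writes OFF if the switch file exists (which assumes debugfs is mounted and vga_switcheroo is active) and reports failure instead of raising. The function name is made up for illustration; the path and the OFF command are the ones quoted above. The demo runs against a temporary file, so nothing is actually powered off:

```python
import os
import tempfile

# Path quoted in the comment above; writing to it normally requires root.
SWITCH = "/sys/kernel/debug/vgaswitcheroo/switch"


def power_off_unused_gpu(path: str = SWITCH) -> bool:
    """Write OFF to the switcheroo control file.

    Returns False (instead of raising) when the file is missing or
    cannot be written, so a boot script can carry on regardless.
    """
    if not os.path.exists(path):
        return False
    try:
        with open(path, "w") as f:
            f.write("OFF\n")  # same bytes as `echo OFF > switch`
    except OSError:
        return False
    return True


# Demo against a stand-in file rather than the real debugfs node.
with tempfile.TemporaryDirectory() as d:
    fake = os.path.join(d, "switch")
    open(fake, "w").close()
    print(power_off_unused_gpu(fake))
    print(power_off_unused_gpu(os.path.join(d, "missing")))
```

This does nothing about the ordering problem discussed in this bug; it only avoids errors when the switch file is not there yet.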

AceLan Kao (acelankao) wrote :

Seth,

I can confirm the commit
   3448a19 vgaarb: use bridges to control VGA routing where possible.
fixed this issue.
This issue also blocks hardware certification (hwcert), so is there anything I can do to help speed up the SRU process?

tags: added: blocks-hwcert

josejuan05 (josejuan05) wrote :

@harry
I don't know if it means anything, but in newer kernels you may be unable to use that command, since /sys/kernel/debug may not be owned by you. A quick and dirty (if dangerous) solution would be to change the first line to
chown "username(:group)" /sys/kernel/debug/ -R
where username is again your username and :group is the optional argument for the group ownership of the folder

Back to the bug, though: looking through my logs, I noted one related oops in 50 boots. It did cause a boot failure, but it did not give the evergreen error; I got the gart_set_page error instead. However, it didn't cause a kernel dump like in the bisects. I believe the gart_set_page error is related only because it does not show up in kernels which were susceptible to the evergreen_cp_start oops. This still does not explain the race condition.

FWIW I did some plotting out of the bisects on paper and found that if there was a commit that fixed the gart_set error it was between bisects 014 and 017.

Seth Forshee (sforshee) wrote :

AceLan: The problem right now is that I suspect that the patch doesn't actually do anything to directly fix the problem. I.e., that the patch fixes this oops is just a side-effect of burning more time before the driver tries to access the hardware or something like that. I'm not sure though so I'd like to get confirmation from upstream whether or not the patch is a real fix for the problem, but so far I haven't received a response.

AceLan Kao (acelankao) wrote :

Seth,

To test your assumption, I reverted that commit and replaced the vga_arbiter_check_bridge_sharing() function call with some delays, but I still encountered the issue.
The error message is the same.
Do you have any suggestions for how to test this?

==================
diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c
index ace2b16..5b93935 100644
--- a/drivers/gpu/vga/vgaarb.c
+++ b/drivers/gpu/vga/vgaarb.c
@@ -500,6 +500,8 @@ static bool vga_arbiter_add_pci_device(struct pci_dev *pdev)
                vga_default = pci_dev_get(pdev);
 #endif

+ msleep(1000);
+
        /* Add to the list */
        list_add(&vgadev->list, &vga_list);
        vga_count++;
===================

josejuan05 (josejuan05) wrote :

If gart_ and evergreen_ are related errors (my presumption), I can confirm an actual failure (complete with debug dump) on the patched kernel.

It took about 100 or so boots for this to show up.

Changed in xserver-xorg-driver-ati:
importance: High → Critical
Changed in xserver-xorg-driver-ati:
status: Confirmed → Fix Released

Seth Forshee (sforshee) wrote :

Has anyone tested this yet since oneiric released? I'd like to get confirmation that the problem is fixed there. Thanks!

dsainty (dsainty) wrote :

In Oneiric it seems to do better (with respect to default handling of the hardware), in that at least it boots to the integrated card rather than to a black screen.

I haven't had any success switching to the discrete card via vgaswitcheroo though.

Seth Forshee (sforshee) wrote :

Moving status to Fix Released based on positive test results with Oneiric noted in comment #172.

Changed in linux (Ubuntu):
status: In Progress → Fix Released

Elia (elia-baragiola) wrote :

One question: I have Ubuntu 10.04 LTS.
Will the fix be released for this version or not?

What should I do to fix the bug? :D
Thank you!

Ihorko (ihorchyhin) wrote :

My experience with Oneiric on my HP G62-a35er has not been as good. First, the default startup brightness is 0% (see bug #873191). Second, I tried to turn power on the discrete card by adding the corresponding command to /etc/rc.local, and once a kernel oops occurred at startup (I have since moved this command to run at startup with a 10-second delay, partly because of some problems with snd_hda_intel too). I can attach part of that failure log next week, because my laptop only has an intermittent mobile broadband connection.

Cabalbl4 (i-vohmin) wrote :

The bug is back for me now (( What I did before to make it go away was to pass "nosplash" instead of "quiet splash" in grub and to enable the options GRUB_TERMINAL=console and GRUB_GFXMODE=1024x768.
Now I have reverted these options, and the bug is back for me on Natty 2.6.38-10-generic.
So it seems to be related to graphical mode. (Maybe Plymouth-dependent?)

Now reverting back to working options.

Cabalbl4 (i-vohmin) wrote :

No, the later fix is not working anymore. I am now blacklisting radeon and loading it manually with modprobe when switching.

Daniel Buchner (danieljb2) wrote :

I just ran into this bug after installing the latest 11.10 release on a new ENVY, looks like the fix here didn't work :(

Elia (elia-baragiola) wrote :

The bug persists :(
Has anyone updated the BIOS? Can that fix the graphics card now?

tags: added: hybrid-graphics

Vangel Ajanovski (ajanovski) wrote :

HP dm4t (ATI 5470): after several installs last weekend, the bug is no longer present in stock Xubuntu 11.10, and it is also not present after upgrading to kernel 3.2.0-10. It was still present with stock Linux Mint Debian Edition (at that moment a 2.6 kernel), but after upgrading to the latest kernel it was fixed.
In fact I have not seen this for some months now, but I did a clean reinstall just in order to check it.

LinkedIn
------------

Bug,

I'd like to add you to my professional network on LinkedIn.

- Klaus

Klaus Reichl
Technology Expert at Thales
Austria

Zentai Andras (andras-zentai) wrote :

Really clever way to invite all bug subscribers to your LinkedIn network... ;)

Cabalbl4 (i-vohmin) wrote :

Switched to 12.04. Bug seems to be gone.

kolwas (kolwas) wrote :

<Switched to 12.04. Bug seems to be gone.>
On my machine this is not true. I don't know whether there were some updates, but Ubuntu now starts only about 50% of the time.

Hi all,

I really apologize for that; I still don't know what was going on.

Sorry,
Klaus

Cabalbl4 (i-vohmin) wrote :

As far as I understand, there is a timing race condition between the intel and radeon modules, and it may be related to vgaswitcheroo too. On earlier versions of Ubuntu, changing the waiting time and the gfx mode in grub seemed to affect the bug somehow. Some combinations even made it disappear :)
