I was pointed to this by a friend of mine (Mayank Rungta), whom I helped crack a radeon driver Oops on suspend (bug 820746):

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/820746/

For some reason, he wanted me to take a look at this issue as well, which has been inactive for some time :)

Below is a detailed RCA and some action items for the person who reproduced this. It's evident that he had 3 connectors (or displays) attached to the radeon driver at boot time (LVDS laptop display + VGA connector display + HDMI connector) and might also have been suspending with them attached. I need to know how this was reproduced, and whether the laptop lid was closed or the machine was suspended with all 3 connectors attached, so that I can ask my friend with similar hardware and the radeon driver (the same guy who reproduced 820746) to reproduce this.

Read ahead for the full story:
=====================

Again using the objdump disassembly of the radeon driver (radeon.ko.out) attached to bug 820746 (same 2.6.38 kernel), I managed to crack the place that's causing the Oops.

Reverse engineering the Oops back to the assembly and mapping the assembly to C code, the panic was triggered by this loop, which runs on radeon driver suspend in radeon_suspend_kms():

  /* turn off display hw */
  list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
          drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
  }

In the above list_head iteration over connector_list for the radeon drm_device on suspend, dev->mode_config.connector_list.next is NULL.

In other words, the DRM device connector list is _corrupted_. It's almost certain that a connector was detached or destroyed while suspend was trying to switch off the display on all your connectors.

dev->mode_config.connector_list.next is NULL
  The faulting instruction at EIP was triggered by a bad pointer in register EBX, derived from that NULL
  EBX value is 0xfffffea8
  which is nothing but: NULL pointer - 0x158 (in 32-bit arithmetic),
  which is the same as: ~0U - 0x157.
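
The arithmetic above can be checked with a small user-space sketch. It assumes 32-bit pointers, as on the i386 kernel in this report. The names CONNECTOR_HEAD_OFFSET and entry_from_list_ptr are made up for illustration and are not kernel identifiers:

```c
#include <stdint.h>

/* Offset of the list_head inside struct drm_connector, taken from the
 * disassembly (sub $0x158,%ebx). Hypothetical name, for illustration. */
#define CONNECTOR_HEAD_OFFSET 0x158u

/* Model of what list_entry does to a 32-bit list pointer: subtract the
 * member offset, wrapping modulo 2^32 just like the CPU register does. */
static uint32_t entry_from_list_ptr(uint32_t list_ptr)
{
    return list_ptr - CONNECTOR_HEAD_OFFSET;
}
```

Calling entry_from_list_ptr(0) yields 0xfffffea8, which is the EBX value reported in the Oops.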

  The Oops EIP is at:
  radeon_suspend_kms+0x78

which in the objdump disassembly maps to address 19888:
  19888: 8b 83 58 01 00 00 mov 0x158(%ebx),%eax
  Bingo:
  that's the list_entry macro used to iterate over "struct drm_connector", i.e. the drm connector list. The list head field of drm_connector is at offset 0x158, which has to be subtracted from the list_head pointer to arrive at the drm_connector. The mov then adds 0x158 back to read the next pointer of that connector's list head.
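
For readers not familiar with the kernel list macros, here is a rough user-space sketch of what list_entry does. It is a simplified stand-in, not the kernel source: struct mock_connector is hypothetical, and its padding only exists so that the head member lands at offset 0x158 as seen in the disassembly.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Simplified list_entry: step back from the embedded member to the
 * start of the enclosing structure (the kernel builds this on
 * container_of). */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical mock of a connector: the padding stands in for the real
 * fields so that "head" sits at offset 0x158. */
struct mock_connector {
    char other_fields[0x158];
    struct list_head head;
};

/* Recover the connector that embeds the given list node. */
static struct mock_connector *connector_of(struct list_head *node)
{
    return list_entry(node, struct mock_connector, head);
}
```

With a valid node this returns the enclosing connector. With a NULL node the same subtraction produces the bogus "NULL - 0x158" pointer, and the next read of head.next through it faults.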

So at panic time, the radeon driver Oopses while trying to suspend the display on each of the connected devices, because the connector list was corrupted.

Also, the Oops hexdump exactly matches the objdump disassembly bytes at the time of the panic:
81 eb 58 01 00 00 <8b> 83 58 01 00 00 0f
The byte in angle brackets, <8b>, is the start of the faulting instruction, the "mov".

This matches the list_head walk for the drm connector in the objdump disassembly of the radeon_suspend_kms function:
  19882:   81 eb 58 01 00 00       sub    $0x158,%ebx
  19888:   8b 83 58 01 00 00       mov    0x158(%ebx),%eax   -> PANIC EIP is here.
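
If you want to double-check the byte-level decoding by hand, a tiny sketch like the following helps. It only splits an x86 ModRM byte into its fields and reads a little-endian 32-bit value. The helper names are my own, and this is not a full disassembler:

```c
#include <stdint.h>

/* The three fields of an x86 ModRM byte. */
struct modrm {
    unsigned mod; /* addressing mode */
    unsigned reg; /* register number, or opcode extension */
    unsigned rm;  /* base register or register operand */
};

/* Split a ModRM byte into mod (bits 7-6), reg (5-3) and rm (2-0). */
static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m;
    m.mod = (byte >> 6) & 3u;
    m.reg = (byte >> 3) & 7u;
    m.rm  = byte & 7u;
    return m;
}

/* Read a little-endian 32-bit displacement or immediate. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For the faulting bytes 8b 83 58 01 00 00, the ModRM byte 0x83 decodes to mod=2 (base register plus 32-bit displacement), reg=0 (EAX) and rm=3 (EBX), and the displacement bytes read as 0x158, i.e. mov 0x158(%ebx),%eax. For 81 eb 58 01 00 00, the ModRM byte 0xeb decodes to mod=3, reg=5 (the opcode extension that selects sub) and rm=3 (EBX), with immediate 0x158.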


Now that we know the C code and the reason for the Oops (the NULL pointer field), we have to trace backwards in the code and see how the drm connector list could get corrupted, i.e. end up with NULL as a list element or with a corrupted connector element.

I cross-checked that there is only one place where the connectors are destroyed, which is drm_mode_config_cleanup, and that is called only on radeon unload. This kind of corruption typically happens if code uses list_for_each_entry instead of list_for_each_entry_safe while detaching each of the entries in the list (in this case the radeon device drm connectors).
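
To illustrate why the _safe variant matters, here is a minimal user-space sketch of the pattern, not the kernel code. The list helpers are simplified mocks; in particular list_del_mock clears the removed node's pointers to NULL for simplicity, whereas the kernel's list_del writes poison values instead.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Unlink a node and invalidate its pointers (mock: NULL, not poison). */
static void list_del_mock(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = NULL;
    n->prev = NULL;
}

/* The "_safe" pattern: remember the successor before unlinking the
 * current node, so the walk never reads through a removed node. */
static int remove_all_safe(struct list_head *head)
{
    int removed = 0;
    struct list_head *pos = head->next, *next;

    while (pos != head) {
        next = pos->next;
        list_del_mock(pos);
        removed++;
        pos = next;
    }
    return removed;
}
```

A plain walk that advances with pos = pos->next after list_del_mock(pos) would pick up the invalidated pointer of the removed node. A second walker running concurrently over the same list can hit the same thing, which is the kind of failure seen in this Oops.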

But the connector destroy or detach callbacks for radeon, radeon_connector_destroy and radeon_dp_connector_destroy, invoke drm_connector_cleanup, which correctly removes the connector from the list with list_del before freeing it. So the cause isn't obvious, since the code does seem to be safe w.r.t. removing or detaching each of the displays/connectors attached to the device (radeon in this case).

So I am not sure whether we are hitting a race condition, with suspend trying to switch off the display on each of your connectors while the radeon driver is getting unloaded in parallel. (It would be a race since the switch-off code in suspend doesn't take the dev->mode_config mutex for the walk.) So it's possible that it's a race.

I found from your boot time dmesg that you had 3 displays attached (laptop LVDS + VGA + HDMI).
So please let us know the reproduction scenario: whether you tried to suspend with all 3 connections, or pulled (disabled) one of the connectors and then suspended/hibernated your laptop, or were shutting down (not sure).

The panic is caused by a corruption in the management of the display connectors in the drm/radeon driver code. So the reproduction details would help us narrow down the culprit, or help us re-create the problem and go further.

Just to let you know, I am not a video driver expert by any means, but I do have a good ability to debug and hence volunteered to take a stab at this issue at my friend's request. So even if the true cause behind the connector corruption in the display driver isn't found, don't take it to heart :)

[   11.572380] [drm] Radeon Display Connectors
[   11.572384] [drm] Connector 0:
[   11.572387] [drm]   LVDS
[   11.572390] [drm]   DDC: 0x7f68 0x7f68 0x7f6c 0x7f6c 0x7f70 0x7f70 0x7f74 0x7f74
[   11.572393] [drm]   Encoders:
[   11.572395] [drm]     LCD1: INTERNAL_UNIPHY2
[   11.572397] [drm] Connector 1:
[   11.572399] [drm]   VGA
[   11.572402] [drm]   DDC: 0x7e40 0x7e40 0x7e44 0x7e44 0x7e48 0x7e48 0x7e4c 0x7e4c
[   11.572405] [drm]   Encoders:
[   11.572407] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[   11.572409] [drm] Connector 2:
[   11.572411] [drm]   HDMI-A
[   11.572412] [drm]   HPD1
[   11.572415] [drm]   DDC: 0x7e50 0x7e50 0x7e54 0x7e54 0x7e58 0x7e58 0x7e5c 0x7e5c
[   11.572418] [drm]   Encoders:
[   11.572420] [drm]     DFP1: INTERNAL_UNIPHY