Occasional ATI Radeon 9800 Pro and Intel D815EEA2 mainboard ACPI S3 State Resume Freeze

Bug #1019863 reported by fpgahardwareengineer
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Hi,

I would like to report occasional instability of an ATI Radeon 9800 Pro and an Intel D815EEA2 mainboard when resuming from the ACPI S3 state.
Roughly 9 out of 10 times, the computer resumes from ACPI S3 State fine, but once in a while, it will freeze.
This problem is pretty difficult to reproduce (happens randomly), but again, it happens about roughly 9 out of 10 tries.

System Configuration:

- Intel Pentium III 1 GHz
- Intel D815EEA2 mainboard
  * BIOS Version P21-0039
  * Power -> ACPI -> ACPI: S3 State
  * Power -> ACPI -> Video RePOST: Off

- 512 MB PC133 SDRAM
  * 256MB module
  * 128MB module
  * 128MB module
- ATI Radeon 9800 Pro
  * 128MB
- Seagate ST320414A 20GB PATA hard drive
- Pioneer DVR-105 PATA DVD-RW drive
- Hitachi CDR-7930 PATA CD-ROM drive
- USB keyboard
- USB mouse

Note that the graphics performance of the ATI Radeon 9800 Pro display driver is very good, and I don't have any complaints (very good compared to older-generation graphics cards, that is).
I am not aware of any graphics drawing issues either.
It is only that ACPI S3 State resume doesn't work 100% reliably.
Please fix this problem.

Regards,

fpgahardwareengineer

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: xorg 1:7.6+12ubuntu1
ProcVersionSignature: Ubuntu 3.2.0-26.41-generic-pae 3.2.19
Uname: Linux 3.2.0-26-generic-pae i686
.tmp.unity.support.test.0:

ApportVersion: 2.0.1-0ubuntu8
Architecture: i386
CompizPlugins: [core,composite,opengl,compiztoolbox,decor,vpswitch,snap,mousepoll,resize,place,move,wall,grid,regex,imgpng,session,gnomecompat,animation,fade,unitymtgrabhandles,workarounds,scale,expo,ezoom,unityshell]
CompositorRunning: compiz
Date: Sun Jul 1 12:41:55 2012
DistUpgraded: Fresh install
DistroCodename: precise
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, whatever it takes to get this fixed in Ubuntu
GpuHangFrequency: Very infrequently
GpuHangReproducibility: Seems to happen randomly
GpuHangStarted: Today
GraphicsCard:
 Advanced Micro Devices [AMD] nee ATI Radeon R350 [Radeon 9800 Pro] [1002:4e48] (prog-if 00 [VGA controller])
   Subsystem: Advanced Micro Devices [AMD] nee ATI Device [1002:0002]
   Subsystem: Advanced Micro Devices [AMD] nee ATI Device [1002:0003]
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release i386 (20120423)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 002: ID 413c:1002 Dell Computer Corp. Keyboard Hub
 Bus 002 Device 002: ID 093a:2510 Pixart Imaging, Inc. Optical Mouse
 Bus 001 Device 003: ID 413c:2002 Dell Computer Corp. SK-8125 Keyboard
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-26-generic-pae root=UUID=553e796c-d5d8-409a-99ab-d58ccd562ee9 ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/06/2002
dmi.bios.vendor: Intel Corp.
dmi.bios.version: EA81520A.86A.0039.P21.0211061753
dmi.board.name: D815EEA2
dmi.board.vendor: Intel Corporation
dmi.board.version: AAA45156-206
dmi.chassis.type: 2
dmi.modalias: dmi:bvnIntelCorp.:bvrEA81520A.86A.0039.P21.0211061753:bd11/06/2002:svn:pn:pvr:rvnIntelCorporation:rnD815EEA2:rvrAAA45156-206:cvn:ct2:cvr:
version.compiz: compiz 1:0.9.7.8-0ubuntu1
version.libdrm2: libdrm2 2.4.32-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 8.0.2-0ubuntu3.1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 8.0.2-0ubuntu3.1
version.xserver-xorg-core: xserver-xorg-core 2:1.11.4-0ubuntu10.2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.0-0ubuntu1.2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20111219.aacbd629-0ubuntu2
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.17.0-1ubuntu4
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20111201+b5534a1-1build2

Revision history for this message
fpgahardwareengineer (mypersonalmailbox1) wrote :

Let me correct the following sentence.

"This problem is pretty difficult to reproduce (happens randomly), but again, it happens about roughly 9 out of 10 tries."

It should read, "This problem is pretty difficult to reproduce (happens randomly), but again, the freeze happens about roughly 1 out of 10 tries."

Regards,

fpgahardwareengineer

bugbot (bugbot)
affects: xorg (Ubuntu) → xserver-xorg-video-intel (Ubuntu)
bugbot (bugbot)
tags: added: resume
Revision history for this message
fpgahardwareengineer (mypersonalmailbox1) wrote :

bugbot,

This posting belongs to xserver-xorg-video-ati, not xserver-xorg-video-intel because I am using an ATI graphics card.

Regards,

fpgahardwareengineer

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Invalid
penalvch (penalvch)
no longer affects: xorg (Ubuntu)
no longer affects: xserver-xorg-video-ati (Ubuntu)
tags: added: needs-upstream-testing
affects: xserver-xorg-video-intel (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Invalid → Incomplete
Revision history for this message
penalvch (penalvch) wrote :

fpgahardwareengineer, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please provide the information following https://wiki.ubuntu.com/DebuggingKernelSuspend ?

As well, could you please test for this with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.7-rc1-quantal

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

If you are unable to test the mainline kernel, please comment as to why specifically you were unable to test it and add the following tags:
kernel-unable-to-test-upstream
kernel-unable-to-test-upstream-VERSION-NUMBER

Please let us know your results. Thank you for your understanding.
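
The tagging rules above can be summarized in a short sketch. This is only an illustration of the naming scheme described in this comment; the function name `tags_for_result` and the result labels "fixed", "exists", and "unable" are hypothetical, and the tags still have to be edited by hand on the bug page.

```python
def tags_for_result(result: str, version: str) -> dict:
    """Return the tags to add and remove after testing a mainline kernel.

    result  -- "fixed", "exists", or "unable" (hypothetical labels)
    version -- the tested kernel version, e.g. "v3.7-rc1-quantal"
    """
    prefixes = {
        "fixed": "kernel-fixed-upstream",
        "exists": "kernel-bug-exists-upstream",
        "unable": "kernel-unable-to-test-upstream",
    }
    if result not in prefixes:
        raise ValueError(f"unknown result: {result!r}")
    base = prefixes[result]
    # The instructions only ask for needs-upstream-testing to be removed
    # once the mainline kernel has actually been tested.
    remove = [] if result == "unable" else ["needs-upstream-testing"]
    return {"add": [base, f"{base}-{version}"], "remove": remove}
```

For example, `tags_for_result("fixed", "v3.7-rc1-quantal")` lists `kernel-fixed-upstream` and `kernel-fixed-upstream-v3.7-rc1-quantal` to add, and `needs-upstream-testing` to remove.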

Helpful Bug Reporting Tips:
https://help.ubuntu.com/community/ReportingBugs

tags: added: suspend
removed: freeze
Revision history for this message
fpgahardwareengineer (mypersonalmailbox1) wrote :

(In reply to #5)

Hi Christopher,

For Ubuntu 12.04 (Precise Pangolin), which version of the latest upstream kernel should I test with?
Should I try,

 ...
[DIR] v3.4-rc7-precise/ 13-May-2012 02:54 -
...
[DIR] v3.7-rc2-quantal/ 20-Oct-2012 19:49 -
[DIR] v3.7-rc2-raring/ 22-Oct-2012 08:44 -

This information is from http://kernel.ubuntu.com/~kernel-ppa/mainline, dated 10/23/2012.
Will my Ubuntu 12.04 work with Linux kernel 3.7?

Regards,

fpgahardwareengineer

Revision history for this message
penalvch (penalvch) wrote :
tags: added: kernel-fixed-upstream-v3.7-rc7-raring
removed: needs-upstream-testing
Revision history for this message
fpgahardwareengineer (mypersonalmailbox1) wrote :

Hi,

I would like to report that I tested the Intel D815EEA2 mainboard with the ATI Radeon 9800 Pro AGP graphics card using the v3.7-rc7-raring version of the Linux kernel on Ubuntu 12.04 LTS 32-bit, and after performing 7 resumes with Firefox running in the background, I did not observe a single freeze related to ACPI S3 State resume.
However, if I went back to the standard kernel of Ubuntu 12.04 LTS 32-bit, then I saw several freezes when performing an ACPI S3 State resume.
It seems like "something" was fixed between Linux 3.2 and 3.7 kernels.
It is possible that I was just "lucky" with the Linux 3.7 kernel, but resume typically fails on the third or fourth attempt with the Linux 3.2 kernel.
Please change the status of this bug so that it doesn't turn into "Invalid" status.
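
As a rough check on the "lucky" possibility above, the arithmetic can be sketched as follows. This is a minimal illustration, assuming each resume fails independently with a fixed probability; the function names are hypothetical, and the 1-in-10 rate is the estimate given earlier in this report, not a measured value.

```python
import math


def prob_all_succeed(fail_rate: float, resumes: int) -> float:
    """Probability that `resumes` consecutive resumes all succeed,
    assuming independent failures at rate `fail_rate`."""
    return (1.0 - fail_rate) ** resumes


def resumes_needed(fail_rate: float, confidence: float) -> int:
    """Smallest number of consecutive clean resumes after which a bug
    with the given failure rate would have shown up with probability
    at least `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fail_rate))


if __name__ == "__main__":
    print(f"7 clean resumes at a 1-in-10 rate: {prob_all_succeed(0.1, 7):.3f}")
    print(f"clean resumes needed for 95%: {resumes_needed(0.1, 0.95)}")
```

Under these assumptions, 7 clean resumes in a row would occur about 48% of the time even if the failure rate were still 1 in 10, and roughly 29 consecutive clean resumes would be needed to rule out that rate with 95% confidence.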

Regards,

fpgahardwareengineer

summary: - Occasional ATI Radeon 9800 Pro and Intel D815EEA2 ACPI S3 State Resume
- Freeze
+ Occasional ATI Radeon 9800 Pro and Intel D815EEA2 mainboard ACPI S3
+ State Resume Freeze
tags: added: kernel-bug-exists-upstream-v3.7-rc7-raring
removed: kernel-fixed-upstream-v3.7-rc7-raring
Revision history for this message
fpgahardwareengineer (mypersonalmailbox1) wrote :

(In reply to comment #8)

Hi,

Okay, I just had a crash with Linux 3.7 kernel.
Whether it is the Linux 3.2 or 3.7 kernel, it seems that a page fault is one of the causes of the crash after ACPI S3 State resume.
Ubuntu is a lot more stable with the Linux 3.7 kernel, but when it did crash, the crash pattern was similar to the crashes with the Linux 3.2 kernel.
Also with Linux 3.2 kernel, even after resuming from ACPI S3 State once, the system tends to get unstable (i.e., I get random "Ubuntu has experienced internal error" type errors. The module that crashes is pretty much random.).
Is this bug specific to Pentium III or Intel 815 chipset?
I will rule out ICH2 southbridge because it was also used with early Pentium 4 systems, and those are fairly stable when it comes to ACPI S3 State resume.
I have a couple of Intel and ASUS mainboards with the Intel 850 chipset, and I have never seen these types of issues with ACPI S3 State resume.
I will change the tag to reflect the latest development.
I also have one other Intel 815E chipset mainboard from Intel (Intel D815EEA mainboard), so I will take a look at that one to see if I can reproduce similar crash patterns.
Intel did have many, many updates related to ACPI S3 State resume issues with D815EEA2 mainboard in the past.

Regards,

fpgahardwareengineer

Revision history for this message
fpgahardwareengineer (mypersonalmailbox1) wrote :

Hi,

I uploaded kern.log to freedesktop.org's bugzilla.

https://bugs.freedesktop.org/show_bug.cgi?id=54583

If someone could link the above bug report to this Launchpad bug report, I would appreciate it.

Regards,

fpgahardwareengineer

Revision history for this message
penalvch (penalvch) wrote :

fpgahardwareengineer, could you please test the newest mainline kernel http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.8-rc1-raring/ ?

tags: added: kernel-bug-exists-upstream-v3.7-rc7
removed: kernel-bug-exists-upstream-v3.7-rc7-raring
tags: added: needs-upstream-testing
bugbot (bugbot)
tags: added: freeze
penalvch (penalvch)
no longer affects: xserver-xorg-video-ati (Ubuntu)
tags: removed: freeze
Revision history for this message
fpgahardwareengineer (mypersonalmailbox1) wrote :

Hi Christopher,

I don't have access to this mainboard for 2 more weeks due to travel, but before I departed, I tested the board with a different set of PC133 SDRAM DIMMs.
Interestingly, it did not cause a freeze with these DIMM modules.
I put my computer into standby about 10 times, and the computer came out of standby 10 times in a row.
I ran Firefox in the background to put more "stress" on the system.
Based on my judgement, it seems like with the previous DIMM modules, the DRAM is getting corrupted after coming out of ACPI S3 State.
It seems like the slow refresh mode is not working properly on at least one of the previous DIMM modules.
By the way, with the previous DIMM modules, they do pass Memtest86+ without any errors, but I have learned from my experience troubleshooting PC related problems so often that some bad DIMM modules do cause memory cell corruption if the DIMM modules enter slow refresh mode.
The memory cell corruption issue manifests itself when the computer is woken up after the contents of the memory have been corrupted.
For the previous DIMM modules used, it was something like this,

- PNY PC133 256MB module (DS)
- generic PC133 128MB module (SS, Micron Technology DRAM chip)
- generic PC133 128MB module (SS)

"generic" DIMM module means it is not obvious from the DIMM who manufactured it.
For the configuration that didn't cause any issues, this was the configuration.

- Toshiba PC133 256MB module (DS, Toshiba DRAM chip)
- Toshiba PC133 256MB module (DS, Toshiba DRAM chip)

SS means "Single Sided."
DS means "Double Sided."
I do have more PC133 SDRAM DIMMs so I will test the mainboard with more DIMMs to see if this bug is specific to one bad module.
I will update the above list when I get back.
At this point, it is increasingly likely that a bad DIMM this mysterious bug.
If this is the case, I am very sorry for wasting the resources of the Linux developers, but unfortunately, it is often difficult to catch slow refresh related memory cell corruption hardware failures.
Memtest86+ does not have the means to catch this kind of problem, and I have seen this type of bug with a DDR SDRAM DIMM a few months ago.

Regards,

fpgahardwareengineer

Revision history for this message
fpgahardwareengineer (mypersonalmailbox1) wrote :

Correction:

At this point, it is increasingly likely that a bad DIMM this mysterious bug. -> At this point, it is increasingly likely that a bad DIMM caused this mysterious bug.

Revision history for this message
penalvch (penalvch) wrote :

fpgahardwareengineer, this bug report is being closed due to your last comment https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1019863/comments/13 regarding this being due to hardware failure. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
fpgahardwareengineer (mypersonalmailbox1) wrote :

Hi,

I have not commented on this bug for some time, but I would like to provide the final analysis so that it can be closed for good.
After not seeing any crashes/freezes when I used Toshiba 256 MB SDRAM modules (Toshiba module with Toshiba DRAM devices), I tried playing around with different combinations of the original 2 (two) 128 MB SDRAM modules and 1 (one) 256 MB SDRAM module.
It turns out that one of the no-brand 128 MB SDRAM modules, with Mosel Vitelic DRAM devices, seems to cause crashes/freezes when I put the computer into ACPI S3 State and wake it up.
It appears that the self refresh mode of this 128 MB SDRAM module with Mosel Vitelic DRAM device causes memory content corruption, and this leads Ubuntu to freeze/crash at some point (i.e., After putting the computer in and out of ACPI S3 State so many times.).
Just for the record, the marking of the DRAM device says V54C3128804VAT7 with a date code 0213 (13th week of 2002).
I assume I just got a bad memory module.
I didn't notice this until now, and I insisted that memtest86+ didn't catch any memory read/write errors, but on reflection, memtest86+ is powerless against this type of failure because it has no means of putting the memory into self refresh mode and bringing it back out to check the contents of the memory array.
While this case is specific to SDRAM, this type of failure can occur on any DRAM type since SDRAM because all of them now have self refresh mode.
So if there is anyone reading this analysis, remember that the memory array can get corrupted if for some reason self refresh mode no longer works correctly (i.e., due to aging), and this can lead to seemingly random crashes/freezes after waking the computer up from ACPI S3 State.
I know that this case is already closed and I am now happy to finally close this case as well.
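
The kind of check that memtest86+ cannot perform can be sketched in user space as follows. This is only a rough illustration, not a reliable diagnostic: the names `make_pattern`, `find_corrupt_blocks`, and `run_check` are hypothetical, a user-space buffer cannot be aimed at a particular DIMM, the kernel may page part of it out, and a clean result therefore proves nothing. The idea is simply to write a known pattern, put the machine through ACPI S3 State and back, and then compare.

```python
BLOCK = bytes(range(256))  # known 256-byte test pattern


def make_pattern(size_bytes: int) -> bytearray:
    """Allocate a buffer filled with repeated copies of BLOCK
    (the size is rounded down to a multiple of 256 bytes)."""
    return bytearray(BLOCK * (size_bytes // len(BLOCK)))


def find_corrupt_blocks(buf: bytearray) -> list:
    """Return the offsets of every 256-byte block that no longer matches BLOCK."""
    step = len(BLOCK)
    return [off for off in range(0, len(buf), step)
            if buf[off:off + step] != BLOCK]


def run_check(size_bytes: int = 64 * 1024 * 1024, wait=input) -> list:
    """Fill memory, pause while the user suspends and resumes, then verify."""
    buf = make_pattern(size_bytes)
    wait("Pattern written. Suspend to S3, resume, then press Enter... ")
    return find_corrupt_blocks(buf)
```

Calling `run_check()` from an interactive Python session and suspending the machine at the prompt returns the list of corrupted block offsets; an empty list means no mismatch was seen in that buffer during that one cycle.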

Regards,

fpgahardwareengineer

Changed in linux (Ubuntu):
status: Invalid → Fix Released