system halted while idled for a long time.

Bug #1002170 reported by TienFu Chen
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
Linux            Fix Released   Medium
linux (Ubuntu)   Fix Released   Medium       Jesse Sung
  Precise        Fix Released   Undecided    Jesse Sung
  Quantal        Fix Released   Medium       Jesse Sung

Bug Description

system: Lenovo ThinkCentre S510, 201108-8941
The display does not function normally; see the attached image (Photo 12-5-18 17 38 38.jpg).
The problem is hard to reproduce; still trying to find the root cause.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-24-generic 3.2.0-24.37
ProcVersionSignature: Ubuntu 3.2.0-24.37-generic 3.2.14
Uname: Linux 3.2.0-24-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu5
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: ubuntu 1425 F.... pulseaudio
 /dev/snd/controlC1: ubuntu 1425 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'PCH'/'HDA Intel PCH at 0xfe500000 irq 46'
   Mixer name : 'Intel CougarPoint HDMI'
   Components : 'HDA:10ec0269,17aa307b,00100100 HDA:80862805,80862805,00100000'
   Controls : 21
   Simple ctrls : 9
Card1.Amixer.info:
 Card hw:1 'Camera'/'Vimicro Corp. Integrated Camera at usb-0000:00:1a.0-1.1, high speed'
   Mixer name : 'USB Mixer'
   Components : 'USB0ac8:c448'
   Controls : 2
   Simple ctrls : 1
Card1.Amixer.values:
 Simple mixer control 'Mic',0
   Capabilities: cvolume cvolume-joined cswitch cswitch-joined penum
   Capture channels: Mono
   Limits: Capture 0 - 48
   Mono: Capture 38 [79%] [11.00dB] [on]
CurrentDmesg:
 [ 14.048567] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
 [ 24.820147] eth0: no IPv6 routers present
 [ 28.780122] audit_printk_skb: 36 callbacks suppressed
 [ 28.780124] type=1400 audit(1337581825.065:24): apparmor="DENIED" operation="open" parent=1 profile="/usr/lib/telepathy/mission-control-5" name="/usr/share/gvfs/remote-volume-monitors/" pid=1679 comm="mission-control" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
Date: Mon May 21 02:38:11 2012
HibernationDevice: RESUME=UUID=206c9433-fac6-4ed9-a0d7-40b6490f07db
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: LENOVO 09876543211234567890
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.2.0-24-generic root=UUID=21d42de8-29a1-4b75-bdec-7821cb02e610 ro quiet splash initcall_debug vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-24-generic N/A
 linux-backports-modules-3.2.0-24-generic N/A
 linux-firmware 1.79
RfKill:

SourcePackage: linux
StagingDrivers: mei rts5139
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/22/2011
dmi.bios.vendor: LENOVO
dmi.bios.version: 9PKT20AUS
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: To be filled by O.E.M.
dmi.board.vendor: LENOVO
dmi.board.version: To be filled by O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnLENOVO:bvr9PKT20AUS:bd06/22/2011:svnLENOVO:pn09876543211234567890:pvrLenovoProduct:rvnLENOVO:rnTobefilledbyO.E.M.:rvrTobefilledbyO.E.M.:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: 09876543211234567890
dmi.product.version: Lenovo Product
dmi.sys.vendor: LENOVO

Revision history for this message
TienFu Chen (ctf) wrote :
description: updated
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
TienFu Chen (ctf)
description: updated
summary: - system halted whilen resuming from suspend
+ system halted while resuming from suspend
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu):
assignee: Anthony Wong (anthonywong) → Jesse Sung (wenchien)
TienFu Chen (ctf)
summary: - system halted while resuming from suspend
+ system halted while idled for a long time.
description: updated
Jesse Sung (wenchien)
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Jesse Sung (wenchien) wrote :

S510 with 12.04 may halt in less than 24 hours, but with RC6 disabled it works quite well. ctf has now enabled RC6 again; let's see how it goes this time.

Jesse Sung (wenchien) wrote :

After enabling RC6, S510 halts after a 30-minute idle.

In , Jesse Sung (wenchien) wrote :

Ref: https://bugs.launchpad.net/bugs/1002170

Hi,

Lenovo ThinkCentre S510 (SandyBridge i5-2500S) may hang after idling for some period of time. This may be related to rc6, since adding i915.i915_enable_rc6=0 to the boot parameters makes the issue go away.

This issue also happens on kernel 3.4.

A screenshot when system hangs:
https://launchpadlibrarian.net/105661463/Photo%2012-5-18%2017%2038%2038.jpg

If there's any other information needed, please kindly let me know.

Thank you.
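
To confirm which RC6 setting a given boot actually used, the kernel command line can be inspected. The following is a minimal Python sketch, not part of the original report: the helper name `get_kernel_param` is hypothetical, and the sample string is the ProcKernelCmdLine value from the bug description above.

```python
import shlex


def get_kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns None if the parameter is absent and "" if it appears
    as a bare flag without "=value".
    """
    value = None
    for token in shlex.split(cmdline):
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""  # this helper keeps the last occurrence
    return value


# Sample taken from the ProcKernelCmdLine field of this report.
SAMPLE = ("BOOT_IMAGE=/boot/vmlinuz-3.2.0-24-generic "
          "root=UUID=21d42de8-29a1-4b75-bdec-7821cb02e610 "
          "ro quiet splash initcall_debug vt.handoff=7")

if __name__ == "__main__":
    # On a live system the same check could read /proc/cmdline instead.
    print(get_kernel_param(SAMPLE, "i915.i915_enable_rc6"))
    print(get_kernel_param(SAMPLE + " i915.i915_enable_rc6=0",
                           "i915.i915_enable_rc6"))
```

With the command line from the report, the parameter is absent, so RC6 was left at the driver default for that boot.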

Jesse Sung (wenchien) wrote :

Tested with mainline build kernel 3.4 in kernel-ppa, this issue still exists.

Jesse Sung (wenchien) wrote :
In , Daniel-ffwll (daniel-ffwll) wrote :

Please attach dmesg with drm.debug=0xe added to your kernel commandline. Also, how dead is the system? I.e. does network/ssh still work, does the magic SysRq to reboot still work, or is it a true hard-hang? And can you try to wire up netconsole so that we could have a peek at the last breaths of the system before it goes down?
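
The drm.debug value is a bitmask of debug categories. The sketch below decodes 0xe under the assumption that the bits are assigned as core (0x1), driver (0x2), KMS (0x4) and PRIME (0x8); this mapping is recalled from the kernel's DRM debug categories and should be checked against the source of the kernel under test. The function name `decode_drm_debug` is hypothetical.

```python
# Assumed bit assignments for drm.debug; verify against the running
# kernel's source. Bit 0x8 is only meaningful on kernels that have
# PRIME support.
DRM_DEBUG_BITS = {
    0x1: "core",
    0x2: "driver",
    0x4: "kms",
    0x8: "prime",
}


def decode_drm_debug(mask):
    """List the debug categories enabled by a drm.debug bitmask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0xe))
```

Under that assumption, 0xe enables every category except the verbose core messages.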

In , Jesse Sung (wenchien) wrote :

Created attachment 63062
netconsole output

Hi Daniel,

Please find the attached file for netconsole output.

When it hangs, neither SysRq magic nor network/ssh works. From ssh terminal I can tell that it died after 57 minutes, but the last entry in the log is at 1787.315652, so there's no log when the system goes down.
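
As a rough worked example of the gap described here, the sketch below assumes that the 57 minutes seen in the ssh session were counted from about the same moment as the kernel timestamps (boot). The two clocks may not share a starting point, so the result is only an estimate; the function name is hypothetical.

```python
def unlogged_gap_seconds(alive_minutes, last_log_timestamp):
    """Seconds between the last kernel log entry and the observed hang.

    Assumes both values are measured from roughly the same start time.
    """
    return alive_minutes * 60 - last_log_timestamp


# Values quoted in the comment above.
GAP = unlogged_gap_seconds(57, 1787.315652)

if __name__ == "__main__":
    print(f"{GAP:.0f} s (~{GAP / 60:.0f} min) without kernel messages")
```

Under that assumption the system ran for roughly 27 minutes after its last logged message, which matches the observation that nothing was printed at the moment of the hang.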

In , Daniel-ffwll (daniel-ffwll) wrote :

Can you also attach dmesg so that we have all the interesting lines from boot-up with drm.debug=0xe, too?

In , Jesse Sung (wenchien) wrote :

Created attachment 63064
dmesg

dmesg is attached.

Thank you.

In , Jesse Sung (wenchien) wrote :

Hi Daniel,

Is there anything I can do to get more info about this issue?

In , Daniel-ffwll (daniel-ffwll) wrote :

I'm running a bit low on ideas, but one thing would be to stop all drm clients (i.e. X) and check whether it still hangs. We still need to load the drm/i915.ko driver, because only when we load and enable rc6 can the cpu die actually reach the lowest power state, i.e. I want to check whether this might be an issue outside of the gpu, only brought to light due to the low power state.

In , Jesse Sung (wenchien) wrote :

Daniel,

Tried with a normal boot, and stopped all X related processes. System hangs after 15 hours.

In , Chris Wilson (ickle) wrote :

Ok, what happens if the i915 is never loaded at all? Try something like adding blacklist i915 to modprobe.conf, or append i915.noload to your kernel commandline.

In , Jesse Sung (wenchien) wrote :

Hi Chris,

By adding i915 into blacklist and using text mode, system runs without any problem and has "2 days, 19:15" uptime so far.
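
When running a blacklist test like this one, it helps to verify that the module really stayed unloaded for the whole run. The sketch below parses text in the /proc/modules format, where the first column is the module name. The helper name and the sample listing are hypothetical and not taken from the S510.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules content."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first column is the module name
    return names


# Hypothetical sample in /proc/modules format, for illustration only.
SAMPLE = """\
snd_hda_intel 33773 3 - Live 0xffffffffa0123000
e1000e 150000 0 - Live 0xffffffffa0456000
"""

if __name__ == "__main__":
    # On a live system, pass open("/proc/modules").read() instead.
    print("i915 loaded:", "i915" in loaded_modules(SAMPLE))
```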

TienFu Chen (ctf)
tags: added: quantal
Jesse Sung (wenchien) wrote :

Please try http://people.canonical.com/~jesse/lp1002170/ and see if this issue still exists.
Thank you.

In , Jesse Sung (wenchien) wrote :

Created attachment 65942
disable rc6 for some models

Hi Daniel and Chris,

Since there's another snb machine that does not work well when rc6 is enabled ( https://launchpad.net/bugs/1008867 ), maybe we can just disable rc6 for these machines to make them at least work?

In , bwidawsk (bwidawsk) wrote :

By any chance, does this patch help?
https://patchwork.kernel.org/patch/1363021/

Changed in linux:
importance: Unknown → Medium
status: Unknown → In Progress
Ara Pulido (ara)
tags: added: blocks-hwcert
removed: blocks-hwcerts
In , Jesse Sung (wenchien) wrote :

Hi Ben,

No, this patch does not help. System hangs after 2 days and 6 hours.

In , Jesse Sung (wenchien) wrote :

Hi,

What do you think of the patch in #c10 ? Should I send it to mailing list also?

In , Daniel-ffwll (daniel-ffwll) wrote :

I think it'd be much better to figure out the root cause and fix it - since likely these rc6 issues don't have anything to do with these models specifically, we just haven't figured out yet what the real problem is.

In , Jesse Sung (wenchien) wrote :

Hi Daniel,

Then I guess it's better to have a new bug entry for lp1008867. :)
https://launchpad.net/bugs/1008867
I'll create one later.

Also, please could you suggest what I can do to get useful info for finding out the root cause?

Thank you.

Changed in linux:
status: In Progress → Confirmed
Jesse Sung (wenchien)
Changed in linux (Ubuntu):
status: In Progress → Confirmed
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Quantal):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Precise):
status: New → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for Precise in -proposed solves the problem (3.2.0-32.51). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-precise' to 'verification-done-precise'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-precise
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.5.0-16.24

---------------
linux (3.5.0-16.24) quantal-proposed; urgency=low

  [ Andy Whitcroft ]

  * SAUCE: ata_piix: add a disable_driver option
    - LP: #994870

  [ Christian König ]

  * (pre-stable) drm/radeon: make 64bit fences more robust v3 (3.5 stable)
    - LP: #1029582

  [ David Henningsson ]

  * SAUCE: ALSA: hda - use both input paths on Conexant auto parser
    - LP: #1037642
  * SAUCE: ALSA: hda - fix control names for multiple speaker out on
    IDT/STAC
    - LP: #1046734

  [ Herton Ronaldo Krzesinski ]

  * SAUCE: ALSA: hda/via - don't report presence on HPs with no presence
    support
    - LP: #1052499
  * SAUCE: ext4: fix crash when accessing /proc/mounts concurrently
    - LP: #1053019
  * SAUCE: ALSA: hda/realtek - Fix detection of ALC271X codec
    - LP: #1006690

  [ Kyle Fazzari ]

  * SAUCE: input: Cypress PS/2 Trackpad fix disabling tap-to-click
    - LP: #1048816

  [ Leann Ogasawara ]

  * [Config] Disable CONFIG_DRM_AST
    - LP: #1053290

  [ Stefan Bader ]

  * [Config] Disable the Cirrus QEMU drm driver
    - LP: #1038055

  [ Upstream Kernel Changes ]

  * Revert "KVM: VMX: Fix KVM_SET_SREGS with big real mode segments"
    - LP: #1045027
  * x86, efi: Handover Protocol
  * drm/i915: HDMI - Clear Audio Enable bit for Hot Plug
    - LP: #1056729
  * UBUNTU SAUCE: apparmor: fix IRQ stack overflow
    - LP: #1056078
  * drm/nouveau: fix booting with plymouth + dumb support
    - LP: #1043518
  * ALSA: hda - Add DeviceID for Haswell HDA
    - LP: #1057698
  * ALSA: hda - add Haswell HDMI codec id
    - LP: #1057698
  * ALSA: hda - Fix driver type of Haswell controller to AZX_DRIVER_SCH
    - LP: #1057698
  * ALSA: hda_intel: Add Device IDs for Intel Lynx Point-LP PCH
    - LP: #1011438, #1057698

  [ Wang Xingchao ]

  * SAUCE: ALSA: hda - Add another pci id for Haswell board
    - LP: #1057698

  [ Wen-chien Jesse Sung ]

  * SAUCE: drm/i915: Explicitly disable RC6 for certain models
    - LP: #1002170, #1008867
 -- Leann Ogasawara <email address hidden> Thu, 27 Sep 2012 13:55:52 -0700

Changed in linux (Ubuntu Quantal):
status: Fix Committed → Fix Released
Jesse Sung (wenchien)
Changed in linux (Ubuntu Precise):
assignee: nobody → Jesse Sung (wenchien)
TienFu Chen (ctf) wrote :

Bug is fixed with kernel 3.5.0-16.25 on Quantal and 3.2.0-32.51 on Precise.

TienFu Chen (ctf) wrote :

Continuing from comment #25: the test has been running for over 12 hours.

Luis Henriques (henrix) wrote :

As per comments #25 and #26 (and IRC chat), I'm tagging this bug as verified in Precise.

tags: added: verification-done-precise
removed: verification-needed-precise
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.2.0-32.51

---------------
linux (3.2.0-32.51) precise-proposed; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1056036

  [ Keng-Yu Lin ]

  * SAUCE: Intel xhci: Only switch the switchable ports
    - LP: #1034814

  [ Kyle Fazzari ]

  * SAUCE: input: Cypress PS/2 Trackpad fix disabling tap-to-click
    - LP: #1048816

  [ Seth Forshee ]

  * SAUCE: Input: synaptics - Adjust threshold for treating position values
    as negative
    - LP: #1046512

  [ Stefan Bader ]

  * Revert "SAUCE: Force xsave off on older Xen hypervisors"
    - LP: #1044550

  [ Upstream Kernel Changes ]

  * Revert "HID: wiimote: fix invalid power_supply_powers call"
    - LP: #1048605
  * Revert "drm/radeon: fix bo creation retry path"
    - LP: #1049899
  * HID: wiimote: fix invalid power_supply_powers call
    - LP: #1048605
  * HID: add ASUS AIO keyboard model AK1D
    - LP: #1027789, #1049899
  * nfs: tear down caches in nfs_init_writepagecache when allocation fails
    - LP: #1049899
  * NFS: Use kcalloc() when allocating arrays
    - LP: #1049899
  * NFSv4.1 fix page number calculation bug for filelayout decode buffers
    - LP: #1049899
  * fix page number calculation bug for block layout decode buffer
    - LP: #1049899
  * pnfs: defer release of pages in layoutget
    - LP: #1049899
  * ext4: avoid kmemcheck complaint from reading uninitialized memory
    - LP: #1049899
  * fuse: verify all ioctl retry iov elements
    - LP: #1049899
  * Bluetooth: Fix legacy pairing with some devices
    - LP: #1049899
  * xhci: Increase reset timeout for Renesas 720201 host.
    - LP: #1049899
  * xhci: Add Etron XHCI_TRUST_TX_LENGTH quirk.
    - LP: #1049899
  * USB: ftdi_sio: Add VID/PID for Kondo Serial USB
    - LP: #1049899
  * USB: option: Add Vodafone/Huawei K5005 support
    - LP: #1049899
  * USB: add USB_VENDOR_AND_INTERFACE_INFO() macro
    - LP: #1049899
  * USB: support the new interfaces of Huawei Data Card devices in option
    driver
    - LP: #1049899
  * usb: serial: mos7840: Fixup mos7840_chars_in_buffer()
    - LP: #1049899
  * usb: gadget: u_ether: fix kworker 100% CPU issue with still used
    interfaces in eth_stop
    - LP: #1049899
  * ARM: 7483/1: vfp: only advertise VFPv4 in hwcaps if CONFIG_VFPv3 is
    enabled
    - LP: #1049899
  * ARM: 7488/1: mm: use 5 bits for swapfile type encoding
    - LP: #1049899
  * ARM: 7489/1: errata: fix workaround for erratum #720789 on UP systems
    - LP: #1049899
  * drm/i915: ignore eDP bpc settings from vbt
    - LP: #1049899
  * ALSA: hda - fix Copyright debug message
    - LP: #1049899
  * sched: fix divide by zero at {thread_group,task}_times
    - LP: #1049899
  * ath9k: fix decrypt_error initialization in ath_rx_tasklet()
    - LP: #1049899
  * drm/nvd0/disp: mask off high 16 bit of negative cursor x-coordinate
    - LP: #1049899
  * drm/i915: reorder edp disabling to fix ivb MacBook Air
    - LP: #1049899
  * audit: don't free_chunk() after fsnotify_add_mark()
    - LP: #1049899
  * audit: fix refcounting in audit-tree
    - LP: #1049899
  * vfs: canonicalize create mode in build_open_flags()
    - LP: #1049899
  * PCI: EHCI: Fix crash d...

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
In , Chris Wilson (ickle) wrote :

*** Bug 53626 has been marked as a duplicate of this bug. ***

In , Daniel-ffwll (daniel-ffwll) wrote :

Created attachment 71524
implement Hiz w/a for msaa

Kernel patch, please test.

In , Daniel-ffwll (daniel-ffwll) wrote :

Also: Is this an SNB GT1? Please spec the exact model and pci id of the VGA device.

In , Jani-nikula (jani-nikula) wrote :

Please test and provide the requested info.

In , Limoto (limoto94) wrote :

Hello, I have a similar issue. My system occasionally hangs with the same screen corruption, plays audio in a loop for about a second, and then the laptop fan revs up.

It happens more often when playing some flash videos. Sometimes the system hangs twice a day, sometimes only after a week. I'm going to try disabling rc6 after the next crash.

It's a MSI CR640 Sandy Bridge laptop with i3-2310M and:
00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0116] (rev 09)

Thank you

Changed in linux:
status: Confirmed → Incomplete
In , Jesse Sung (wenchien) wrote :

It is an Intel(R) Core(TM) i5-2500S CPU @ 2.70GHz, and the VGA device is
00:02.0 VGA compatible controller [0300]: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0102] (rev 09) (prog-if 00 [VGA controller])
 Subsystem: Lenovo Device [17aa:307b]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 45
 Region 0: Memory at fe000000 (64-bit, non-prefetchable) [size=4M]
 Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M]
 Region 4: I/O ports at f000 [size=64]
 Expansion ROM at <unassigned> [disabled]
 Capabilities: <access denied>
 Kernel driver in use: i915
 Kernel modules: i915

Output of lspci and the content of cpuinfo can be found at
https://launchpadlibrarian.net/105660635/ProcCpuinfo.txt
https://launchpadlibrarian.net/105660632/Lspci.txt

I'll test the patch next week and report the result.

Thank you.
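
The vendor:device pair requested above can be pulled out of `lspci -nn` style lines with a small regular expression. This is a minimal Python sketch with a hypothetical helper name; the two sample lines are the VGA controller lines quoted in this thread for the S510 and the MSI CR640.

```python
import re

# Matches a bracketed vendor:device pair such as [8086:0102]. The class
# code, e.g. [0300], has no colon and is therefore skipped.
_PCI_ID = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")


def pci_vendor_device(lspci_line):
    """Return (vendor, device) as lowercase hex strings, or None."""
    match = _PCI_ID.search(lspci_line)
    if match is None:
        return None
    return match.group(1).lower(), match.group(2).lower()


S510_LINE = ("00:02.0 VGA compatible controller [0300]: Intel Corporation "
             "2nd Generation Core Processor Family Integrated Graphics "
             "Controller [8086:0102] (rev 09) (prog-if 00 [VGA controller])")
CR640_LINE = ("00:02.0 VGA compatible controller [0300]: Intel Corporation "
              "2nd Generation Core Processor Family Integrated Graphics "
              "Controller [8086:0116] (rev 09)")

if __name__ == "__main__":
    print(pci_vendor_device(S510_LINE))
    print(pci_vendor_device(CR640_LINE))
```

The two machines report different device IDs, which is the detail needed to tell which Sandy Bridge graphics variant each one has.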

In , Jesse Sung (wenchien) wrote :

The patch in comment 17 should be the right one. With it applied, system stays alive after two days.

Thanks!

In , Daniel-ffwll (daniel-ffwll) wrote :

Awesome that this works out. Patch is merged into 3.8-rc2 as

commit 4283908ef7f11a72c3b80dd4cf026f1a86429f82
Author: Daniel Vetter <email address hidden>
Date: Fri Dec 14 23:38:28 2012 +0100

    drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled

I'm writing the mail to the stable kernel team right now so that it gets applied to older kernels. Thanks for reporting this issue.

Changed in linux:
status: Incomplete → Fix Released