kernel hard freezes unless processor.max_cstate=1 is used

Bug #1159065 reported by Mkchan
This bug affects 4 people
Affects          Status        Importance  Assigned to  Milestone
linux (Ubuntu)   Fix Released  Medium      Unassigned

Bug Description

My system hard freezes. There is no response to the Magic SysRq REISUB sequence, no error messages appear in /var/log/syslog or /var/log/kern.log, and the machine cannot be pinged or reached over ssh.
This happens consistently after running for 24-48 hours. When I boot with processor.max_cstate=1 I have no freezes, but I would like my CPUs to enter deeper C-states for power saving.
I've tried kernel versions 3.2, 3.5, and 3.6.9.
Hardware:
  CPU: Intel Core i7-3930K, Motherboard: DX79TO
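To see which C-states the kernel actually exposes and how often each one is entered (for example, to compare a default boot with one using processor.max_cstate=1), one option is to read the cpuidle entries in sysfs. This is a minimal sketch, not something from the original report; it assumes the usual /sys/devices/system/cpu/cpuN/cpuidle/stateM layout with `name`, `latency` (exit latency in microseconds) and `usage` (entry count) files, and takes the CPU directory as a parameter so it can also be pointed at a copied tree.

```python
from pathlib import Path
from typing import List, Tuple


def read_cpuidle_states(cpu_dir: Path) -> List[Tuple[str, int, int]]:
    """Return (name, exit latency in us, usage count) for each cpuidle state.

    Assumes the sysfs layout <cpu_dir>/cpuidle/stateN/{name,latency,usage}.
    Returns an empty list if the cpuidle directory is absent (for example,
    when no cpuidle driver is registered).
    """
    idle_dir = cpu_dir / "cpuidle"
    if not idle_dir.is_dir():
        return []
    states = []
    # Sort by the numeric suffix so "state10" does not sort before "state2".
    for state in sorted(idle_dir.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        latency = int((state / "latency").read_text())
        usage = int((state / "usage").read_text())
        states.append((name, latency, usage))
    return states


if __name__ == "__main__":
    cpu0 = Path("/sys/devices/system/cpu/cpu0")
    for name, latency, usage in read_cpuidle_states(cpu0):
        print(f"{name:10} exit latency {latency:>5} us, entered {usage} times")
```

If the deeper states show a usage count of zero after booting with a C-state limit, the limit is taking effect.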

Partial output of dmesg:

[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.6.9-030609-generic root=UUID=61abb01f-8268-421b-a83f-e0fcea737f56 ro crashkernel=384M-2G:64M,2G-:128M quiet splash notsc acpi_enforce_resources=lax clocksource=hpet hpet=force acpi=hpet intel_idle.max_cstate=0 processor.max_cstate=1 idle=mwait
[ 0.000000] tsc: Kernel compiled with CONFIG_X86_TSC, cannot disable TSC completely
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.000000] __ex_table already sorted, skipping sort
[ 0.000000] xsave: enabled xstate_bv 0x7, cntxt size 0x340
[ 0.000000] Checking aperture...
[ 0.000000] No AGP bridge found
[ 0.000000] Calgary: detecting Calgary via BIOS EBDA area
[ 0.000000] Calgary: Unable to locate Rio Grande table in EBDA - bailing!
[ 0.000000] Memory: 16209188k/17301504k available (6894k kernel code, 601880k absent, 490436k reserved, 6299k data, 956k init)
[ 0.000000] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0, CPUs=16, Nodes=1
[ 0.000000] Hierarchical RCU implementation.
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.000000] RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=16.
[ 0.000000] NR_IRQS:16640 nr_irqs:1216 16
[ 0.000000] Extended CMOS year: 2000
[ 0.000000] Console: colour VGA+ 80x25
[ 0.000000] console [tty0] enabled
[ 0.000000] allocated 67108864 bytes of page_cgroup
[ 0.000000] please try 'cgroup_disable=memory' option if you don't want memory cgroups
[ 0.000000] hpet clockevent registered
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.004000] tsc: Detected 3199.972 MHz processor
[ 0.004000] Calibrating delay loop... 6361.08 BogoMIPS (lpj=12722176)
[ 0.028000] pid_max: default: 32768 minimum: 301
[ 0.028000] Security Framework initialized
[ 0.028000] AppArmor: AppArmor initialized
[ 0.028000] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[ 0.028000] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[ 0.032000] Mount-cache hash table entries: 256
[ 0.032000] Initializing cgroup subsys cpuacct
[ 0.032000] Initializing cgroup subsys memory
[ 0.032000] Initializing cgroup subsys devices
[ 0.032000] Initializing cgroup subsys freezer
[ 0.032000] Initializing cgroup subsys blkio
[ 0.032000] Initializing cgroup subsys perf_event
[ 0.032000] Initializing cgroup subsys hugetlb
[ 0.032000] CPU: Physical Processor ID: 0
[ 0.032000] CPU: Processor Core ID: 0
[ 0.032000] mce: CPU supports 18 MCE banks
[ 0.032000] CPU0: Thermal monitoring enabled (TM1)
[ 0.032000] process: using mwait in idle threads
[ 0.032000] Last level iTLB entries: 4KB 512, 2MB 0, 4MB 0
[ 0.032000] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32
[ 0.032000] tlb_flushall_shift is 0x5
[ 0.032000] ACPI: Core revision 20120711
[ 0.040000] ftrace: allocating 28053 entries in 110 pages
[ 0.052000] dmar: Host address width 40
[ 0.052000] dmar: DRHD base: 0x000000fe901000 flags: 0x0
[ 0.052000] dmar: IOMMU 0: reg_base_addr fe901000 ver 1:0 cap d2008c10ef0462 ecap f0207a
[ 0.052000] dmar: DRHD base: 0x000000fe900000 flags: 0x1
[ 0.052000] dmar: IOMMU 1: reg_base_addr fe900000 ver 1:0 cap d2078c106f0462 ecap f020fe
[ 0.052000] dmar: RMRR base: 0x000000d8dd9000 end: 0x000000d8dd9fff
[ 0.052000] dmar: RMRR base: 0x000000d8ddb000 end: 0x000000d8ddbfff
[ 0.052000] Switched APIC routing to physical flat.
[ 0.052000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.092000] smpboot: CPU0: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz stepping 07
[ 0.092000] Performance Events: PEBS fmt1+, 16-deep LBR, SandyBridge events, Intel PMU driver.
[ 0.092000] ... version: 3
[ 0.092000] ... bit width: 48
[ 0.092000] ... generic registers: 4
[ 0.092000] ... value mask: 0000ffffffffffff
[ 0.092000] ... max period: 000000007fffffff
[ 0.092000] ... fixed-purpose events: 3
[ 0.092000] ... event mask: 000000070000000f
[ 0.092000] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[ 0.092000] smpboot: Booting Node 0, Processors #1
[ 0.104000] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.104000] Measured 487730108 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.008000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[ 0.104006] #2 #3 #4 #5 #6 #7 #8 #9 #10 #11
[ 0.216013] Brought up 12 CPUs
[ 0.216013] smpboot: Total of 12 processors activated (76333.05 BogoMIPS)
[ 0.220013] devtmpfs: initialized
[ 0.224014] EVM: security.selinux
[ 0.224014] EVM: security.SMACK64
[ 0.224014] EVM: security.capability
[ 0.224014] PM: Registering ACPI NVS region [mem 0xd84a9000-0xd858cfff] (933888 bytes)
[ 0.224014] PM: Registering ACPI NVS region [mem 0xd89c7000-0xd8ac6fff] (1048576 bytes)
[ 0.224014] PM: Registering ACPI NVS region [mem 0xdb704000-0xdb7befff] (765952 bytes)
[ 0.224014] dummy:
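The kernel command line in the dmesg output above mixes several clock and idle options (notsc, clocksource=hpet, hpet=force, intel_idle.max_cstate=0, processor.max_cstate=1, idle=mwait). When comparing boots, it can help to pull just those options out of /proc/cmdline. A minimal sketch, assuming a whitespace-separated command line with no quoted values; the set of names in the hypothetical IDLE_CLOCK_KEYS is an illustrative selection, not an exhaustive list.

```python
from pathlib import Path
from typing import Dict

# Illustrative selection of idle/clock-related options; not exhaustive.
IDLE_CLOCK_KEYS = {
    "intel_idle.max_cstate",
    "processor.max_cstate",
    "idle",
    "clocksource",
    "hpet",
    "notsc",
}


def idle_clock_options(cmdline: str) -> Dict[str, str]:
    """Return idle/clock options from a kernel command line.

    Assumes whitespace-separated tokens without quoting. Flags that take
    no value (e.g. "notsc") map to an empty string.
    """
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        key = key.replace("-", "_")
        if key in IDLE_CLOCK_KEYS:
            found[key] = value
    return found


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        for key, value in idle_clock_options(cmdline_file.read_text()).items():
            print(f"{key}={value}" if value else key)
```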

2)
Description: Ubuntu 12.04.2 LTS
Release: 12.04

3) Not to crash after 24 hours
4) Hard freezes
---
ApportVersion: 2.0.1-0ubuntu17.1
Architecture: amd64
DistroRelease: Ubuntu 12.04
InstallationMedia: Ubuntu 12.04.1 LTS "Precise Pangolin" - Release amd64 (20120823.1)
MarkForUpload: True
NonfreeKernelModules: nvidia
Package: linux (not installed)
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Tags: precise running-unity
Uname: Linux 3.6.9-030609-generic x86_64
UnreportableReason: This is not an official Ubuntu package. Please remove any third party package and try again.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo vboxusers

Mkchan (mkchan) wrote :
Brad Figg (brad-figg)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1159065

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: precise
Mkchan (mkchan)
tags: added: apport-collected running-unity
description: updated
Mkchan (mkchan)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.9 kernel[0] (not a kernel from the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc4-raring/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Mkchan (mkchan)
description: updated
Mkchan (mkchan)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Mkchan (mkchan) wrote :

tag kernel-bug-exists-upstream

Mkchan (mkchan)
tags: added: kernel-bug-exists-upstream
David Bartley (dtbartle) wrote :

I seem to have the same problem with an i7-4770K and an ASUS Z87-PLUS. It also happens with the 3.10 kernel. What information do you need to further diagnose this?

Aikawa Takuya (riumutu) wrote :

I have the same problem on Acer Aspire 5750 (Intel Core i5 2410M / Intel HD Graphics 3000).
There are no signs of overwork beforehand. It happens about once a month for me (perhaps equivalent to 24-48 hours of use).
The system version is 12.04.2 LTS and I tried normal 3.2 kernel and 3.5 Quantal HWE kernel; the same happened with both of them.

Interestingly, hard freezes used to happen on another Acer Aspire V3 (Intel Core i5 3210M / Intel HD Graphics 4000) too,
and after upgrading it to 12.10 the problem disappeared (at least for several months).

David Bartley (dtbartle) wrote :

In my case this ended up being a bad power supply.

penalvch (penalvch) wrote :

Mkchan, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available (not the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.13-rc2

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.
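The tagging rules in the comment above can be summarized as a small helper. This is only a hypothetical sketch (the function name `upstream_test_tags` is made up, and it does not talk to Launchpad); it returns the tags to add and the tag to remove for a given mainline test result.

```python
from typing import List, Tuple


def upstream_test_tags(version: str, fixed: bool) -> Tuple[List[str], List[str]]:
    """Return (tags_to_add, tags_to_remove) after testing a mainline kernel.

    `version` is the tested kernel version as written in tag names,
    e.g. "v3.13-rc2".
    """
    base = "kernel-fixed-upstream" if fixed else "kernel-bug-exists-upstream"
    return [base, f"{base}-{version}"], ["needs-upstream-testing"]


if __name__ == "__main__":
    add, remove = upstream_test_tags("v3.13-rc2", fixed=True)
    print("add:", ", ".join(add))
    print("remove:", ", ".join(remove))
```

With fixed=True and "v3.13-rc2" this yields the same two tags as the example in the comment.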

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: needs-kernel-logs
Mkchan (mkchan) wrote :

This is fixed as of kernel version 3.12. Thanks.
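Since the reporter saw the freezes stop as of 3.12, a quick way to check whether a machine runs at least that version is to compare the numeric part of the release string, as printed by uname -r (e.g. 3.6.9-030609-generic in the apport data above). A minimal sketch, assuming release strings start with MAJOR.MINOR[.PATCH]; the 3.12 threshold reflects only the reporter's observation, not a confirmed fix boundary, and both helper names are hypothetical.

```python
import platform
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int, int]:
    """Parse the leading MAJOR.MINOR[.PATCH] of a kernel release string."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def at_least(release: str, minimum: Tuple[int, int] = (3, 12)) -> bool:
    # Compare (major, minor) as integer tuples, so 3.6 < 3.12
    # (a plain string comparison would get this wrong).
    return kernel_version(release)[:2] >= minimum


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    print(release, "->", "3.12 or newer" if at_least(release) else "older than 3.12")
```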

Changed in linux (Ubuntu):
status: Incomplete → Fix Released