Bug #1453298 “8086:0f31 Xubuntu freeze once a day” : Bugs : xserver-xorg-video-intel package : Ubuntu

Revision history for this message

In freedesktop.org Bugzilla #88012, Fritsch-b (fritsch-b) wrote on 2015-01-04:

#35

We experienced strange full system freezes on Asrock Q1900 hardware with our OpenELEC 5.0 release. No errors were visible via netconsole, the whole system just fully hung.

We then started to bisect between kernel 3.13 and 3.18 stable. It was verified before that 3.19-rc2 is also affected.

Commit: 31685c258e0b0ad6aa486c5ec001382cf8a64212 drm/i915/vlv: WA for Turbo and RC6 to work together

was found to be the first bad commit in that bisect.

A manual workaround was to limit the maximum C-state to C1 in the BIOS, which avoided the freeze.

We currently have > 10 users that are affected by this bug (mostly Asrock Q1900 users).

You can see the complete bisecting steps here: https://github.com/OpenELEC/OpenELEC.tv/issues/3726#issuecomment-68626603

I will ask that user to subscribe to this tracker. As we freeze very hard, it's not possible to add logfiles as the netconsole stays empty for us.
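To check which idle states the kernel actually exposes after a BIOS C-state limit like the one described above, the cpuidle sysfs tree can be listed. This is only a minimal sketch: `list_cstates` is a hypothetical helper name, the optional directory argument exists so it can be pointed at a test tree, and whether a BIOS limit is reflected there depends on the idle driver in use.

```shell
# list_cstates [CPUIDLE_DIR]: print "stateN: NAME" for each idle state found.
# Defaults to cpu0's cpuidle directory; pass another directory to inspect a
# different CPU (or a test tree).
list_cstates() {
    base=${1:-/sys/devices/system/cpu/cpu0/cpuidle}
    for s in "$base"/state[0-9]*; do
        # Skip entries without a readable name file (also covers an unmatched glob).
        [ -r "$s/name" ] || continue
        echo "${s##*/}: $(cat "$s/name")"
    done
}
```

Usage: `list_cstates` with no argument reads the live system.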

In freedesktop.org Bugzilla #88012, Dnv (dnv) wrote on 2015-01-04:

#36

Created attachment 111723
dmesg output from boot till crash (drm.debug=0xe debug ignore_loglevel)

ASRock Q2900-ITX is affected, too.
Log is created by using netconsole.

In freedesktop.org Bugzilla #88012, bwidawsk (bwidawsk) wrote on 2015-01-04:

#37

Created attachment 111734
Be more careful with punit reads

It's a bit of a long shot, but let's see what happens.
I have only compile tested this patch.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-04:

#38

Created attachment 111739
111723: dmesg output from boot to hung (drm.debug=0xe debug ignore_loglevel)

Good day, I did the bisect, see attached my dmesg.

System: Zotac CI320 Nano, FW Version 2K141128, Intel HD Graphics, Intel Celeron N2930 (quad-core, 1.83 GHz)

In freedesktop.org Bugzilla #88012, Chris Wilson (ickle) wrote on 2015-01-05:

#39

I had some patches to improve the vlv rps: http://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=bug88012

They incorporated the change Ben suggested and reduce the number of interrupts required by the manual RPS tuning, as well as making it much more responsive to gfx workload (not that byt has that great a range). It doesn't explain a system hang though...

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-05:

#40

Created attachment 111767
dmesg output from boot to hung (drm.debug=0xe debug ignore_loglevel)

I built the kernel (3.18.1-bw1+) from Peter's git with Ben Widawsky's experimental patch. Unfortunately I had the freeze/hang again after ~10 minutes of running a movie. Attached is the dmesg log via netconsole up to the system freeze.
If you need more information or logs, I will of course support as much as possible.

In freedesktop.org Bugzilla #88012, Fritsch-b (fritsch-b) wrote on 2015-01-05:

#41

@Juergen Froehler:

Please give ickle's branch a try, I forked it on my github (as freedesktop's git was really slow in the past):

git clone https://github.com/fritsch/linux.git
git checkout bug88012
make localmodconfig
make-kpkg --append-to-version "-ickle1" --initrd linux-headers linux-image

And give it a good test.

Btw. As your base OS is Ubuntu 14.04, you might need to upgrade the linux-firmware (or ignore warnings about it a bit).

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-06:

#42

Created attachment 111800
ickle1 - dmesg output from boot to hung (drm.debug=0xe debug ignore_loglevel)

Hello,
First I updated the Ubuntu firmware to linux-firmware_1.140 and built the new kernel based on ~ickle (3.19.0-rc2-ickle1+). The system hung after 5 min of runtime. The logfile was created via netconsole from boot to freeze.

In freedesktop.org Bugzilla #88012, Chris Wilson (ickle) wrote on 2015-01-06:

#43

Have you tried i915.enable_rc6=0? Or maybe using intel_pstate?

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-06:

#44

Created attachment 111836
ickle1 - dmesg output with trace on the end (drm.debug=0xe debug ignore_loglevel)

This morning I had this nice one, but it happened before I ran any movie. I should say that I wanted to do a reference test (to see if the workaround still works) and had limited the C-state to C3 in the BIOS.

Kernel: 3.19.0-rc2-ickle1+ (the one I build last night from ickle git)

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-06:

#45

(In reply to Chris Wilson from comment #8)
> Have you tried i915.enable_rc6=0? Or maybe using intel_pstate?

Not yet, but I will test it now and give feedback soon.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-06:

#46

Created attachment 111842
dmesg with i915.enable_rc6=0 (3.19.0-rc2-ickle1+)

Ok, here the result of the first test with i915.enable_rc6=0
I checked twice to be sure it was disabled
once in the dmesg:
[ 2.626518] [drm] RC6 disabled, disabling runtime PM support
[ 3.799990] [drm:intel_print_rc6_info] Enabling RC6 states: RC6 off

and once in the parameters:
/sys/module/i915/parameters/enable_rc6=0

Well, it looks the same as when I limit the C-state to C3 in the BIOS. There is a trace at the end of the attached logfile, and if I run an mkv it freezes after some minutes.

The next test will be with intel_pstate=disable.
Currently the settings look like this:
for i in /sys/devices/system/cpu/intel_pstate/*; do echo $i=$(cat $i); done
/sys/devices/system/cpu/intel_pstate/max_perf_pct=100
/sys/devices/system/cpu/intel_pstate/min_perf_pct=100
/sys/devices/system/cpu/intel_pstate/no_turbo=0
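The two checks above (the i915 module parameter and the intel_pstate knobs) follow the same pattern, so they could be wrapped in one small helper. A minimal sketch, assuming a POSIX shell; `dump_settings` is a hypothetical name, not something used in this thread.

```shell
# dump_settings DIR: print "path=value" for every readable regular file in DIR.
dump_settings() {
    dir=$1
    for f in "$dir"/*; do
        # Skip subdirectories, unreadable entries and an unmatched glob.
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "$f" "$(cat "$f")"
    done
}
```

Usage: `dump_settings /sys/devices/system/cpu/intel_pstate` reproduces the listing above, and `dump_settings /sys/module/i915/parameters` shows enable_rc6 along with the other i915 parameters.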

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-06:

#47

Created attachment 111844
no freeze - dmesg with intel_pstate=disable (3.19.0-rc2-ickle1+)

This time I did a test with intel_pstate=disable. I had no freeze during a 45-minute run of a file which usually freezes. Anyway, I attached the dmesg in case you would like to verify.

In freedesktop.org Bugzilla #88012, bwidawsk (bwidawsk) wrote on 2015-01-07:

#48

Do other governors also cause a hang? For instance:

i=0; for g in /sys/devices/system/cpu/cpu[0-9]/cpufreq/scaling_governor; do echo powersave > "$g"; echo "cpu$i: $(cat "$g")"; ((i++)); done
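For repeated testing with different governors, the loop above could be turned into a function. This is only a sketch: `set_governor` is a hypothetical name, the optional second argument is there so the function can be tried against a test directory, and writing to the real sysfs files needs root.

```shell
# set_governor GOVERNOR [CPU_ROOT]: write GOVERNOR to every CPU's
# scaling_governor file and print what each CPU reports afterwards.
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for g in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without a writable governor file (or an unmatched glob).
        [ -w "$g" ] || continue
        echo "$gov" > "$g"
        cpu=${g#"$root"/}   # drop the root prefix -> cpuN/cpufreq/scaling_governor
        cpu=${cpu%%/*}      # keep only the cpuN part
        echo "$cpu: $(cat "$g")"
    done
}
```

Usage: `set_governor powersave`, then rerun the workload that normally hangs.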

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-07:

#49

(In reply to Ben Widawsky from comment #13)
> Do other governors also cause a hang? For instance:
>
> for g in /sys/devices/system/cpu/cpu[0-9]/cpufreq/scaling_governor; do
> echo powersave > $g; echo cpu$i: $(cat $g); ((i++)); done

Hello Ben, I will do the test tonight and then give feedback.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-07:

#50

Created attachment 111932
no freeze - dmesg with governor=powersave (3.19.0-rc2-ickle1+)

Hello Ben,
Here is the result of my test with governor=powersave.
First I set the governor to powersave and checked it after reboot:

for g in /sys/devices/system/cpu/cpu[0-9]/cpufreq/scaling_governor; do echo $g=$(cat $g); done
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor=powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor=powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor=powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor=powersave

Booted the kernel normally (without i915.enable_rc6=0 and without intel_pstate=disable)

Kernel build from ickle git: 3.19.0-rc2-ickle1+ #1 SMP Tue Jan 6 00:57:18 CET 2015 x86_64 x86_64 x86_64 GNU/Linux

I have now run some files for almost 2 hours without a freeze. In the attached logfile there is just one kernel trace (perhaps interesting for ickle), but it seems to have had no impact during the test. During the run the CPU was mostly at:
cat /proc/cpuinfo | grep "cpu MHz"
cpu MHz : 499.741
cpu MHz : 499.741
cpu MHz : 499.741
cpu MHz : 499.741

I will now do another test with the "regular" kernel 3.17.7 and governor=powersave, just to see if it freezes or also runs more "stably".
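The frequency check above can be reduced to just the numbers, which is easier to log while a test is running. A minimal sketch; `cpu_mhz` is a hypothetical helper name, and the optional file argument only exists so a saved copy of /proc/cpuinfo can be parsed too.

```shell
# cpu_mhz [FILE]: print the "cpu MHz" value of each core, one per line.
# Reads /proc/cpuinfo unless another file is given.
cpu_mhz() {
    # Fields are split on ":" plus any following spaces, so $2 is the value.
    awk -F': *' '/^cpu MHz/ { print $2 }' "${1:-/proc/cpuinfo}"
}
```

Usage: `cpu_mhz`, or in a loop with `sleep` to watch whether the cores stay near their minimum under the powersave governor.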

In freedesktop.org Bugzilla #88012, bwidawsk (bwidawsk) wrote on 2015-01-07:

#51

(In reply to Juergen Froehler from comment #15)

> Well I will do another test now with the "regular" Kernel 3.17.7 and
> governor=powersave just to see if it freeze or also run more "stable".

Can you also confirm you are unable to hit this without GPU, and just CPU stress tests (I do not have any recommendations for which test)? I see a similar-sounding problem, but it is very intermittent for me.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-07:

#52

(In reply to Ben Widawsky from comment #16)
> (In reply to Juergen Froehler from comment #15)
>
> > Well I will do another test now with the "regular" Kernel 3.17.7 and
> > governor=powersave just to see if it freeze or also run more "stable".
>
> Can you also confirm you are unable to hit this without GPU, and just CPU
> stress tests (I do not have any recommendations for which test)? I see a
> similar sounding problem, but it is very intermittent for me.

What I can say, and what I have tested most intensively on the generic kernels (3.13.0 to 3.17.7) and also on the ickle kernel 3.19-rc2, is disabling HW acceleration (VAAPI) in Kodi and running movies for several hours without a freeze/hang. The hang happens only when HW acceleration in Kodi is enabled.

In the meantime I have been running 3.17.7-generic with governor=powersave for over an hour without a freeze. The logfile fills up quickly with the aggressive drm:valleyview_set_rps messages, but no freeze yet... I will let it run some more.

Some days ago I also did a long memtest run, 6 passes with 0 errors, which took ~8 hours, to be sure there is no HW issue.

As for a CPU stress test, I will look around for something I can use without killing it.

In freedesktop.org Bugzilla #88012, DDD (3ddd) wrote on 2015-01-08:

#53

Maybe Prime95 can be used for CPU stress tests?
http://www.mersenne.org/download/#stresstest

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-08:

#54

(In reply to DDD from comment #18)
> Maybe Prime95 can be used for CPU stress tests?
> http://www.mersenne.org/download/#stresstest

I plan the CPU stress test for tonight and will give feedback. I found some good information on stress testing in the Ubuntu Wiki.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-08:

#55

CPU stress test: using stress with a runtime of 1200 seconds, which should be a nice CPU burn test if our intent is to figure out whether there is a CPU issue. Under normal circumstances, running Kodi 14 on this box, I have never seen CPU usage this high for this long.

---
stress --cpu 4 --timeout 1200
stress: info: [4387] dispatching hogs: 4 cpu, 0 io, 0 vm, 0 hdd
stress: info: [4387] successful run completed in 1200s

CPU > max speed (switched off Turbo mode in Bios to avoid killing my CPU)
cat /proc/cpuinfo | grep "cpu MHz"
cpu MHz : 1832.600
cpu MHz : 1832.600
cpu MHz : 1832.600
cpu MHz : 1832.600

top during test run
%Cpu(s):100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
4388 root 20 0 7316 100 0 R 100.0 0.0 19:49.05 stress
4389 root 20 0 7316 100 0 R 100.0 0.0 19:48.16 stress
4390 root 20 0 7316 100 0 R 100.0 0.0 19:52.30 stress
4391 root 20 0 7316 100 0 R 97.4 0.0 19:48.88 stress

sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +58.0°C (high = +105.0°C, crit = +105.0°C)
Core 1: +58.0°C (high = +105.0°C, crit = +105.0°C)
Core 2: +62.0°C (high = +105.0°C, crit = +105.0°C)
Core 3: +61.0°C (high = +105.0°C, crit = +105.0°C)
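When the stress run is repeated, the hottest core can be pulled out of the `sensors` output above with a small filter instead of reading it by eye. This is only a sketch: `max_core_temp` is a hypothetical helper name, and it assumes the "Core N: +TT.T°C" layout shown above with positive temperatures.

```shell
# max_core_temp: read `sensors` output on stdin and print the highest
# "Core N" reading with one decimal place. Prints nothing if no core
# lines are found.
max_core_temp() {
    LC_ALL=C awk '/^Core [0-9]+:/ {
        t = $3
        gsub(/[^0-9.]/, "", t)          # "+58.0°C" -> "58.0"
        if (t + 0 > max) max = t + 0
        seen = 1
    } END { if (seen) printf "%.1f\n", max }'
}
```

Usage: `sensors | max_core_temp`, run from a second terminal while `stress --cpu 4 --timeout 1200` is active.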

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-01-12:

#56

good day together,

over the weekend I had some time to do several tests and I like to share my findings.

1. I did several different CPU & memory stress tests and all went fine, therefore I think the CPU & memory themselves are fine.

2. Kernels tested between 3.13 and 3.16 run stably, no freeze.

3. I tested several mainline generic kernels between 3.17 and 3.19-rc2, plus the ickle 3.19-rc2, with governor=powersave & C6/7 enabled in the BIOS.

The findings are: it runs more stably, and freezes happen only sporadically. I was not able to figure out under which circumstances they happen. Sometimes the test files run for over 2 hours without a freeze, sometimes there are 4 freezes in 1 hour. The logfiles give no hint about the freeze.

I am very sorry that I was not able to get more Logfile information out of the System, but if you have more test Scenarios - I am glad to support.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-02-04:

#57

good day together,

I kindly ask everyone whether there is something we can do to push this topic forward a bit. As I already wrote, I will support as far as possible.

kind regards

In freedesktop.org Bugzilla #88012, Jani-nikula (jani-nikula) wrote on 2015-02-05:

#58

So the regressing commit is

commit 31685c258e0b0ad6aa486c5ec001382cf8a64212
Author: Deepak S <email address hidden>
Date: Thu Jul 3 17:33:01 2014 -0400

drm/i915/vlv: WA for Turbo and RC6 to work together.

Deepak, Ville, do you have any ideas?

In freedesktop.org Bugzilla #88012, Chris Wilson (ickle) wrote on 2015-02-05:

#59

That bisect appears to be a red herring though.

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-02-05:

#60

(In reply to Chris Wilson from comment #24)
> That bisect appears to be a red herring though.

As someone who is also affected by this I can well believe that.

Though I wasn't bisecting Kernel at the time (and still haven't) I've had runs of 12 hours without a lock - the same setup locked < 2H next day.

Having tested quite hard with released kernels since this bug was filed, I am 99.9% sure the issue is between 3.16.7 and 3.17.0.

I also think that gpu load is needed - I use LFS and have compiled plenty on bad kernels + mprime torture test and never locked.

Do you have any guesses of what the bad commit could be between those - if you have then people could test that and someone will hopefully call bad quickly then extended test on the one before.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-02-05:

#61

(In reply to Chris Wilson from comment #24)
> That bisect appears to be a red herring though.

Well, to be honest, yes, it can be a red herring, because deciding whether a bisect step was good or bad wasn't easy. I tested each step for at least 2 hours, though mostly the freeze came much earlier. However, what I can say with 100% certainty is that I have been running my device with Ubuntu 14.04.1 and kernel 3.16.7-031607-generic for 20 days now as my daily beast without any freeze/hang, with VAAPI HW acceleration enabled in Kodi and the C6/7 idle states enabled in the BIOS. Therefore I believe it is not a hardware issue. With a 3.17.x kernel, no way... it starts freezing with the same settings in under 1 hour.

If someone has a suggestion or Idea how we can narrow down this issue - I am glad to support and test.

In freedesktop.org Bugzilla #88012, Dnv (dnv) wrote on 2015-02-06:

#62

(In reply to Chris Wilson from comment #24)
> That bisect appears to be a red herring though.

With the help of Peter, I built 2 kernels about 30 days ago.
The first one with git reset --hard 31685c258e0b0ad6aa486c5ec001382cf8a64212
The second one with a subsequent git revert 31685c258e0b0ad6aa486c5ec001382cf8a64212

The first one crashed every time I tested it; the second one ran fine for a few hours and didn't crash at all. If you want, I can run the second one as my standard kernel to be more certain that this commit is the right one.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-02-10:

#63

I just tested the latest mainline kernel 3.19.0 to confirm the freeze still exists. Unfortunately this issue has now existed since 3.17.x.

kind regards Juergen

In freedesktop.org Bugzilla #88012, Devilstrike (devilstrike) wrote on 2015-02-14:

#64

It is not only the Q1900/J1900; the J1800 has the same problem from time to time: it just hangs without any errors.

In freedesktop.org Bugzilla #88012, 1dreambox (1dreambox) wrote on 2015-03-05:

#65

same shit j1900 shit intel never again

In freedesktop.org Bugzilla #88012, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2015-03-05:

#66

Juergen sounds certain that this commit affects this issue, and I can believe it.

The punit provides several services, including CPU and GPU power management, and the code in question changes how we interact with the Punit to a degree.

So it's possible a BIOS upgrade (which would include a new Punit firmware) might help.

It's also possible that we're not validating the result of Deepak's code enough and end up feeding some bad values to the Punit as a result of the new calculations.

Or the simple fact that we're reading a new Punit reg fairly frequently is enough to cause trouble. In that case, throttling the vlv_c0_residency reads of the CZ timestamp may be enough to avoid this. (I don't think the C0 count reads should cause trouble, but it's possible they trigger additional punit activity as well, just by being enabled for read out in the control reg.)

Deepak, Ben, or Chris, any other ideas?

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-03-05:

#67

I don't know about others but for me using Asrock Q1900dc-itx I put the latest bios (1.20) on as soon as I got it - there is nothing newer as of today.

I hadn't tried a new kernel since mid-January (a nightly) but did today, and today's nightly and fixes don't boot, getting

ahci failed to stop engine then oops.

Haven't had time to see when it changed. Can boot with pci=nocrs.

In freedesktop.org Bugzilla #88012, Deepak-s-8 (deepak-s-8) wrote on 2015-03-06:

#68

Hi Jesse,

I suspect the voltage change after the GPU frequency request.

Can we try the options below?
1. Keep the frequency at min (RPn) & run the workload. This will ensure we run at constant GPU voltage.
a) cat /sys/class/drm/card0/gt_RPn_freq_mhz
b) echo "value from above cmd" >/sys/class/drm/card0/gt_max_freq_mhz

2) Switch back to legacy turbo.
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9baecb7..0dac413 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -4292,12 +4292,7 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 	INIT_WORK(&dev_priv->rps.work, gen6_pm_rps_work);
 	INIT_WORK(&dev_priv->l3_parity.error_work, ivybridge_parity_work);
 
-	/* Let's track the enabled rps events */
-	if (IS_VALLEYVIEW(dev_priv) && !IS_CHERRYVIEW(dev_priv))
-		/* WaGsvRC0ResidencyMethod:vlv */
-		dev_priv->pm_rps_events = GEN6_PM_RP_UP_EI_EXPIRED;
-	else
-		dev_priv->pm_rps_events = GEN6_PM_RPS_EVENTS;
+	dev_priv->pm_rps_events = GEN6_PM_RPS_EVENTS;
 
 	INIT_DELAYED_WORK(&dev_priv->gpu_error.hangcheck_work,
 			  i915_hangcheck_elapsed);
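Option 1 above (steps a and b) can be done in one go with a small helper. A minimal sketch, assuming the sysfs files named in the comment; `pin_gpu_to_min` is a hypothetical name, the optional argument is only there so it can be tried against a test directory, and the write needs root on a real system.

```shell
# pin_gpu_to_min [CARD_DIR]: copy the minimum GPU frequency (RPn) into
# gt_max_freq_mhz so the GPU cannot clock up, then print the new cap.
pin_gpu_to_min() {
    card=${1:-/sys/class/drm/card0}
    rpn=$(cat "$card/gt_RPn_freq_mhz") || return 1
    echo "$rpn" > "$card/gt_max_freq_mhz" || return 1
    echo "gt_max_freq_mhz=$(cat "$card/gt_max_freq_mhz")"
}
```

Usage: run `pin_gpu_to_min` before starting the workload. The change is not persistent, so it has to be repeated after a reboot.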

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-03-06:

#69

(In reply to Deepak S from comment #33)
> Hi Jesse,
>
> I suspect the voltage change after the GPU frequency request.
>
> Can we try the options below?
> 1. Keep the frequency at min (RPn) & run the workload. This will ensure we
> run at constant GPU voltage.
> a) cat /sys/class/drm/card0/gt_RPn_freq_mhz
> b) echo "value from above cmd" >/sys/class/drm/card0/gt_max_freq_mhz

This alone does not fix for me - if anything it locked sooner, but then I only did 2 runs.

Will try patch alone soon.

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-03-06:

#70

(In reply to Andy Furniss from comment #32)

> ahci failed to stop engine then oops.
>
> Haven't had time to see when it changed. Can boot with pci=nocrs.

In case anyone else testing on ASrock Qxxxx hits this, I bisected and there is already a bug filed -

https://bugzilla.kernel.org/show_bug.cgi?id=94221

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-03-06:

#71

(In reply to Deepak S from comment #33)
> Hi Jesse,
>
> I suspect the voltage change after the GPU frequency request.
>
> Can we try the options below?
> 1. Keep the frequency at min (RPn) & run the workload. This will ensure we
> run at constant GPU voltage.
> a) cat /sys/class/drm/card0/gt_RPn_freq_mhz
> b) echo "value from above cmd" >/sys/class/drm/card0/gt_max_freq_mhz

Hi all,

I tested option 1, but the system still freezes, with no chance to get any relevant output in the log.

~# cat /sys/class/drm/card0/gt_RPn_freq_mhz
167
~# cat /sys/class/drm/card0/gt_max_freq_mhz
854
~# echo "167" >/sys/class/drm/card0/gt_max_freq_mhz
~# cat /sys/class/drm/card0/gt_max_freq_mhz
167

kind regards
Juergen

In freedesktop.org Bugzilla #88012, Fritsch-b (fritsch-b) wrote on 2015-03-06:

#72

Here is an OpenELEC build with option 2) integrated: https://dl.dropboxusercontent.com/u/55728161/OpenELEC-Generic.x86_64-devel-20150306172724-r20368-gb822824.tar

Kernel 3.19 is used

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-03-06:

#73

(In reply to Deepak S from comment #33)

> Can we try below options.

> 2) Switch back to legacy turbo.

2 is good for me so far, been running almost 12 hrs.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-03-06:

#74

(In reply to Peter Frühberger from comment #37)
> Here is an OpenELEC build with option 2) integrated:
> https://dl.dropboxusercontent.com/u/55728161/OpenELEC-Generic.x86_64-devel-
> 20150306172724-r20368-gb822824.tar
>
> Kernel 3.19 is used

This version has now been running >2 hours without a freeze. I will let it run overnight and give feedback tomorrow.

kind regards
Juergen

In freedesktop.org Bugzilla #88012, Fritsch-b (fritsch-b) wrote on 2015-03-07:

#75

@Deepak S:

What are the disadvantages for other Intel processors? Can we safely include this patch in our 3.17.x backports without introducing regressions for other non-BYT Intel hardware?

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-03-07:

#76

(In reply to Juergen Froehler from comment #39)
> (In reply to Peter Frühberger from comment #37)
> > Here is an OpenELEC build with option 2) integrated:
> > https://dl.dropboxusercontent.com/u/55728161/OpenELEC-Generic.x86_64-devel-
> > 20150306172724-r20368-gb822824.tar
> >
> > Kernel 3.19 is used
>
> This Version runs now >2 hour without an freeze. I let it run now over night
> and give feedback tomorrow.
>
> kind regards
> Juergen

OK, it has now run for over 9 hours continuously without a freeze.
@Deepak S - it looks like you hit the bull's eye

In freedesktop.org Bugzilla #88012, Deepak-s-8 (deepak-s-8) wrote on 2015-03-07:

#77

@Peter Frühberger, the change is specific to BYT. It should not impact any other platform.

@Jesse, shall we enable legacy turbo on BYT until we have root-caused the BYT WA?
Also, Chris has submitted a cleaned-up patch for "WA for Turbo and RC6 to work together" for review.

Thanks
Deepak

In freedesktop.org Bugzilla #88012, AlexN (dark5) wrote on 2015-03-14:

#78

(In reply to Juergen Froehler from comment #41)
> (In reply to Juergen Froehler from comment #39)
> > (In reply to Peter Frühberger from comment #37)
> > > Here is an OpenELEC build with option 2) integrated:
> > > https://dl.dropboxusercontent.com/u/55728161/OpenELEC-Generic.x86_64-devel-
> > > 20150306172724-r20368-gb822824.tar
> > >
> > > Kernel 3.19 is used
> >
> > This Version runs now >2 hour without an freeze. I let it run now over night
> > and give feedback tomorrow.
> >
> > kind regards
> > Juergen
>
> Ok it runs now over 9 hours continuously without a freeze
> @Deepak S - it looks like you hit the bull's eye

Unfortunately I had a freeze after about 4 hours :-(

In freedesktop.org Bugzilla #88012, AlexN (dark5) wrote on 2015-03-14:

#79

(In reply to Alex N from comment #43)
> (In reply to Juergen Froehler from comment #41)
> > (In reply to Juergen Froehler from comment #39)
> > > (In reply to Peter Frühberger from comment #37)
> > > > Here is an OpenELEC build with option 2) integrated:
> > > > https://dl.dropboxusercontent.com/u/55728161/OpenELEC-Generic.x86_64-devel-
> > > > 20150306172724-r20368-gb822824.tar
> > > >
> > > > Kernel 3.19 is used
> > >
> > > This Version runs now >2 hour without an freeze. I let it run now over night
> > > and give feedback tomorrow.
> > >
> > > kind regards
> > > Juergen
> >
> > Ok it runs now over 9 hours continuously without a freeze
> > @Deepak S - it looks like you hit the bull's eye
>
> Unfortunately I had a freeze after about 4 hours :-(

Sorry, I just realized that my system hadn't been updated correctly!

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-03-15:

#80

OK, I have now used the patched kernel from Peter (3.19.1-legacy-turbo+) for 1 week and had no freeze. Therefore I would say this works as an interim fix until the root cause is found.

If you have new findings or upcoming patches to test out I am glad to support as much as I can.

kind regards
Juergen

In freedesktop.org Bugzilla #88012, Daniel-ffwll (daniel-ffwll) wrote on 2015-03-18:

#81

Deepak, can you pls submit a proper patch for option 2), maybe restricted to just vlv to intel-gfx? Hanging machines are a pretty serious regression, I'd like to see this resolved.

We'd need to make sure that this isn't an issue on chv ofc, but that can happen after the functional revert.

In freedesktop.org Bugzilla #88012, Deepak-s-8 (deepak-s-8) wrote on 2015-03-18:

#82

@Daniel, I will submit the patch. Also, the WA is not enabled for CHV, so there shouldn't be any problem.

Btw, Chris has cleaned-up patches for "WA for Turbo and RC6 to work"; should we try those?

In freedesktop.org Bugzilla #88012, Chris Wilson (ickle) wrote on 2015-03-18:

#83

They are worth trying again afterwards. I don't think they avoid the fundamental issue here which appears to be the PCU itself.

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-03-19:

#84

(In reply to Deepak S from comment #47)
> @Daniel, I will submit the patch & Also, WA not enabled for CHV so there
> shouldn't be any problem.
>
> Btw, Chris has cleaned up patches for "WA for Turbo and RC6 to work" should
> we try that?

I noticed some new patches went into a nightly (18th).

c4d390d drm/i915: Use down ei for manual Baytrail RPS calculations
168ebd7 drm/i915: Improved w/a for rps on Baytrail

It's a bit early to say anything conclusive, but I have so far not locked running that, but then I only did a few hours yesterday + currently up to 7 today.

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-03-25:

#85

(In reply to Andy Furniss from comment #49)
> (In reply to Deepak S from comment #47)
> > @Daniel, I will submit the patch & Also, WA not enabled for CHV so there
> > shouldn't be any problem.
> >
> > Btw, Chris has cleaned up patches for "WA for Turbo and RC6 to work" should
> > we try that?
>
> I noticed some new patches went into a nightly (18th).
>
> c4d390d drm/i915: Use down ei for manual Baytrail RPS calculations
> 168ebd7 drm/i915: Improved w/a for rps on Baytrail
>
> It's a bit early to say anything conclusive, but I have so far not locked
> running that, but then I only did a few hours yesterday + currently up to 7
> today.

I've done many hours of running since and I am still stable.

In freedesktop.org Bugzilla #88012, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2015-03-25:

#86

Ok, looks like we worked around this one then with the commits mentioned. Thanks a lot for testing Juergen.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-03-29:

#87

Thank you all for your support.
Here is my personal summary after a long test period:

Mainline kernels 3.13 -> 3.16: no freeze issue
Mainline kernels 3.17.x -> 3.19.2: freezes appear fast and frequently
Mainline kernel 3.19.3 (without legacy turbo fix): rare random freezes (I had just one in 4 days; still too early to say more), but fewer than before
Patched kernel 3.19.x + legacy turbo fix: rock solid, no freeze over a long period

Therefore the kernel with the legacy turbo fix is currently the best result for me for daily usage.

I have not tested any of the 4.x kernels yet; if needed I will.

kind regards
Juergen

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-03-31:

#88

A short update and feedback from my side, which might be worth knowing: I ran the latest mainline kernel 4.0.0-040000rc5.201503230035 during the last 2 days, and my finding is that the freeze still exists.

kind regards
Juergen

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-03-31:

#89

(In reply to Juergen Froehler from comment #53)
> a short update & feedback from my side, perhaps it might be worth knowing. I
> had time to run the latest mainline Kernel 4.0.0-040000rc5.201503230035
> during the last 2 days and my findings are that the freeze still exist.

From what I can see the fixes above that I am still running aren't in drm-intel-fixes so I guess not anything mainline? They are in drm-intel-next-fixes.

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-04-08:

#90

Today's nightly (2015-04-08) locks again.

I've been running nightly from 03-18 without issue till now - tested new kernel as I noticed that some more Baytrail changes went in eg.

Agressive downclocking on Baytrail

I'll try reverting it and running later.

FWIW when I hard lock the picture is always still on screen - just thought I'd mention it.

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-04-08:

#91

(In reply to Andy Furniss from comment #55)
> Todays nightly 2015-04-08 locks again.

> Agressive downclocking on Baytrail
>
> I'll try reverting it and running later.

Still locks with that reverted.

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-04-08:

#92

@ Andy
I still heavily use the patched 3.19.1 kernel from Fritsch as my daily beast without any freeze.
And to confirm: it is the same on my device. When the freeze happens with the unpatched kernels, the last picture stays visible; it looks just "frozen".

Revision history for this message

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-04-08:

#93

(In reply to Juergen Froehler from comment #57)
> @ Andy
> I still use heavily the patched 3.19.1 kernel from Fritsch as my daily beast
> without any freeze.
> And to confirm - same on my device when the freeze happens within the
> unpatched Kernels the last pictures is visible - it looks like just "frozen"

Yea, I was stable with the patch from this bug, or with the nightly that didn't have the patch but did have the commits I mentioned above.

Something regressed. It seems that trying to bisect the nightly tree isn't going to work: the first try was bad and I got "the merge base xxxx is bad this means the bug was fixed between xxxx and yyyy" :-(

Revision history for this message

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-04-10:

#94

(In reply to Andy Furniss from comment #56)
> (In reply to Andy Furniss from comment #55)
> > Todays nightly 2015-04-08 locks again.
>
> > Agressive downclocking on Baytrail
> >
> > I'll try reverting it and running later.
>
> Still locks with that reverted.

I tried another bisect on a different branch, drm-intel-next-queued.

I managed to avoid hitting any merges, and the bisect pointed to

8fb55197e64d5988ec57b54e973daeea72c3f2ff
drm/i915: Agressive downclocking on Baytrail

In fact, while sitting on that commit, I locked without using kodi for the first time ever, just by fast scrolling in a maximised xterm during a make modules_install.

Generally the locks came much quicker than I am used to: 5-10 mins with kodi.

Just to confuse things, on the older nightly, as I said above, I still locked with this reverted. On the new branch (which has gained more commits since I tested the nightly) I so far haven't locked with it reverted.

Revision history for this message

In freedesktop.org Bugzilla #88012, Mazout360 (mazout360) wrote on 2015-04-13:

#95

Q1900DC-ITX here.
I have been having GPU hangs since 3.19 in kodi/chrome, but they stopped right after I moved to a self-compiled 4.0.0-rc6 kernel from drm-intel-nightly (right before the 70 patch set by Chris Wilson). I can confirm that newer kernels (>RC7) regressed, and the GPU hangs "seem" to happen quicker. I also noticed serious intermittent stuttering on some videos (e.g. CBS.com online) every ~1-2 minutes with the patchset. I can provide logs if required.

Revision history for this message

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-04-13:

#96

(In reply to Andy Furniss from comment #59)

> Just to confuse things, on the older nightly, as I said above, I still
> locked with this reverted.

I recreated the test on nightly where I thought I still locked with

8fb55197e64d5988ec57b54e973daeea72c3f2ff
drm/i915: Agressive downclocking on Baytrail

reverted and I didn't lock, so it seems I messed up somewhere in that initial test.

So reverting the above commit alone does make me stable on both the nightly I first tested with and drm-intel-next-queued (tested as it was over the weekend).

Revision history for this message

In freedesktop.org Bugzilla #88012, Mazout360 (mazout360) wrote on 2015-04-15:

#97

(In reply to Andy Furniss from comment #61)
> 8fb55197e64d5988ec57b54e973daeea72c3f2ff
> drm/i915: Agressive downclocking on Baytrail
>
> reverted and I didn't lock, so it seems I messed up somewhere for that test
> initially.
>
> So reverting above alone does make me stable on both the nightly I first
> tested with and drm-intel-next-queued (tested as it was over the weekend).

Maybe not. I just tried this (latest drm-intel-next-queued with the commit reverted) and I locked after ~2 hours of uptime (I couldn't get logs, everything hung, including ssh). It is definitely more stable without the commit (less stutter in 1080p video playback), but I had over 50 hours of uptime with 4.0.0-RC6 without any issue. Maybe there's something wrong elsewhere?

Revision history for this message

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-04-16:

#98

(In reply to Maxime Bergeron from comment #62)
> (In reply to Andy Furniss from comment #61)
> > 8fb55197e64d5988ec57b54e973daeea72c3f2ff
> > drm/i915: Agressive downclocking on Baytrail
> >
> > reverted and I didn't lock, so it seems I messed up somewhere for that test
> > initially.
> >
> > So reverting above alone does make me stable on both the nightly I first
> > tested with and drm-intel-next-queued (tested as it was over the weekend).
>
> Maybe not. I just tried this (latest drm-intel-next-queued with the commit
> reverted) and I locked after ~2 hours uptime (couldn't get logs, everything
> hung up including ssh). Definitely more stable without the commit (less
> stutter in 1080p video playback), but I had over 50 hours of uptime with
> 4.0.0-RC6 without any issue. Maybe there's something wrong elsewhere ?

Yea, I updated yesterday after seeing this and did manage to lock next-queued.

The cause is possibly not anything recent, though, as it seems that whether I lock or not now depends on how I test: 1080i30 (+deint) with some 1080p60 on a 60Hz display = lock. I had been testing before with 1080p24 or 1080i25 and retried like this today, and it's still running after 9 hours.

Given the above, the next commit I will try reverting, in addition to aggressive downclock, is

6ad790c0f5ac55fd13f322c23519f0d6f0721864
drm/i915: Boost GPU frequency if we detect outstanding pageflips

and I will run samples where frame/field rate = refresh.
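
A small sketch of the rate arithmetic behind that test plan, in Python. It assumes the deinterlacer outputs one frame per field, so 1080i30 is presented at 60 frames/s; that is an assumption about the playback setup, not something stated above. The helper names `presented_rate` and `matches_refresh` are made up for illustration and are not part of kodi or the driver.

```python
import re


def presented_rate(mode):
    """Return the rate at which frames reach the display for a mode string.

    `mode` looks like "1080i30" or "1080p24". For interlaced material the
    number is taken as frames per second and each frame carries two fields,
    so (assuming one output frame per field) the presented rate doubles.
    """
    m = re.fullmatch(r"(\d+)([ip])(\d+(?:\.\d+)?)", mode)
    if m is None:
        raise ValueError(f"unrecognised mode string: {mode!r}")
    rate = float(m.group(3))
    return rate * 2 if m.group(2) == "i" else rate


def matches_refresh(mode, refresh_hz=60.0, tol=0.01):
    """True if the presented rate equals the display refresh rate."""
    return abs(presented_rate(mode) - refresh_hz) <= tol


# The four sample types mentioned in the comment, on a 60Hz display.
for sample in ("1080i30", "1080p60", "1080p24", "1080i25"):
    print(sample, presented_rate(sample), matches_refresh(sample))
```

Under that assumption, the two sample types that locked (1080i30 deinterlaced and 1080p60) are the ones whose presented rate equals the 60Hz refresh, while 1080p24 and 1080i25 do not match it.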

Revision history for this message

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-04-28:

#99

(In reply to Andy Furniss from comment #63)
> (In reply to Maxime Bergeron from comment #62)
> > (In reply to Andy Furniss from comment #61)
> > > 8fb55197e64d5988ec57b54e973daeea72c3f2ff
> > > drm/i915: Agressive downclocking on Baytrail
> > >
> > > reverted and I didn't lock, so it seems I messed up somewhere for that test
> > > initially.
> > >
> > > So reverting above alone does make me stable on both the nightly I first
> > > tested with and drm-intel-next-queued (tested as it was over the weekend).
> >
> > Maybe not. I just tried this (latest drm-intel-next-queued with the commit
> > reverted) and I locked after ~2 hours uptime (couldn't get logs, everything
> > hung up including ssh). Definitely more stable without the commit (less
> > stutter in 1080p video playback), but I had over 50 hours of uptime with
> > 4.0.0-RC6 without any issue. Maybe there's something wrong elsewhere ?
>
> Yea, I updated yesterday after seeing this and did manage to lock
> next-queued.
>
> Possibly not anything recent, though, as it seems whether I lock or not now
> depends on how I test - 1080i30 (+deint) with some 1080p60 on 60Hz display =
> lock. I had been testing before with 1080p24 or 1080i25 and retried like
> this today - it's still running after 9 Hours.
>
> Given the above the next commit I will try reverting in addition to
> aggressive downclock =
>
> 6ad790c0f5ac55fd13f322c23519f0d6f0721864
> drm/i915: Boost GPU frequency if we detect outstanding pageflips
>
> and I will run samples where frame/field rate = refresh.

Time passes. I had been slowly trying to find a guilty commit, but I gave up because the history of drm-intel-next-queued looks totally different depending on where I am, so it's hard to find anything.

I can lock on the commit before Agressive downclocking on Baytrail, but not with kodi. The only way I found was "make modules_install", which is quite strange. I made a prog that scrolls at variable rates, but that didn't work.

Testing with make while going back in the history didn't get very far, as I soon found that the history is inconsistent due to the merges. I would test a commit (git reset --hard), fail, look at the history and choose an earlier commit, and then find that after resetting to it the history was totally different and I was testing without the commits that "fixed" the issue in the first place

c4d390d drm/i915: Use down ei for manual Baytrail RPS calculations
168ebd7 drm/i915: Improved w/a for rps on Baytrail

even though the previous history/log had them way down after the new place I wanted to try.
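
One way to guard against that is to check, after each reset, whether the two fix commits are still ancestors of the checked-out commit. The following is only a sketch in Python: it relies on `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second and 1 when it is not. The function names and the injectable `run` parameter are illustrative assumptions, not anything used in this thread.

```python
import subprocess

# Abbreviated hashes of the two fix commits, as quoted in the comment above.
FIX_COMMITS = ("c4d390d", "168ebd7")


def is_ancestor(commit, tip="HEAD", run=subprocess.run):
    """Return True if `commit` is reachable from `tip`.

    Exit status 0 means "is an ancestor", 1 means "is not"; anything else
    is treated as a git error (for example an unknown commit name).
    """
    result = run(["git", "merge-base", "--is-ancestor", commit, tip])
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git failed for {commit!r} (exit {result.returncode})")
    return result.returncode == 0


def missing_fixes(tip="HEAD", fixes=FIX_COMMITS, run=subprocess.run):
    """List the fix commits that the history of `tip` does not contain."""
    return [c for c in fixes if not is_ancestor(c, tip, run)]
```

Calling `missing_fixes()` inside the kernel tree after a `git reset --hard` would return an empty list only if both fixes are part of the history being tested.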

Revision history for this message

In freedesktop.org Bugzilla #88012, Mazout360 (mazout360) wrote on 2015-04-28:

#100

(In reply to Andy Furniss from comment #64)
> Trying to test with make going back in the history didn't get very far as I
> soon found that history is inconsistent due to the merges so I would test a
> commit (git reset --hard) fail, look at the history and choose an earlier
> commit then find that when reset on that the history was totally different
> and I was testing without the commits that "fixed" the issue in the first
> place
>
> c4d390d drm/i915: Use down ei for manual Baytrail RPS calculations
> 168ebd7 drm/i915: Improved w/a for rps on Baytrail
>
> even though the previous history/log had them way down after the new place I
> wanted to try.

Yes, it does indeed get complicated with merges.
Personally, if I compile virgin/testing/drm-intel as of today, I get a GPU hang on kodi boot (attached dmesg-4.0.0 and crashlog-4.0.0) with a segmentation fault.
Otherwise, if I revert to before the patchset that includes:

8fb55197e64d5988ec57b54e973daeea72c3f2ff
drm/i915: Agressive downclocking on Baytrail

it does work. It also works with the patchset, but then it ends up hanging with >=1080p videos. That's weird, as this commit doesn't seem to be linked to the original problem, so it's as if it were simply exacerbating another underlying, older issue that might have been missed. For now I'm running 4.1 from Linus's GitHub tree and it works fine... for now.

Revision history for this message

In freedesktop.org Bugzilla #88012, Mazout360 (mazout360) wrote on 2015-04-28:

#101

Created attachment 115414
Crashlog GPU Hang on drm-intel-nightly 4.0.0

Revision history for this message

In freedesktop.org Bugzilla #88012, Mazout360 (mazout360) wrote on 2015-04-28:

#102

Created attachment 115415
Dmesg - drm-intel-nightly 4.0.0

Revision history for this message

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-05-04:

#103

I tried a kernel.org 4.1-rc1 tarball over the weekend, and though I didn't lock with kodi, I could quite easily lock with a few "make modules_install" runs in a row. I do this after kodi has been running for some time. Of course my ddx and mesa are new, so likely different to other people's, but I have so far still failed to lock 3.16.7 using the same test method with the same currentish ddx/mesa.

Revision history for this message

In freedesktop.org Bugzilla #88012, Openelec (openelec) wrote on 2015-05-05:

#104

Well, I have some free time over the next few days, so I am able to do some tests. Which tree should I use for kernel testing on my device so that I can give qualified feedback to the devs?

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-08:

#1

Xorg.0.log (20.9 KiB, text/plain)
Dependencies.txt (3.6 KiB, text/plain; charset="utf-8")
JournalErrors.txt (9.4 KiB, text/plain; charset="utf-8")
ProcEnviron.txt (103 bytes, text/plain; charset="utf-8")

Ubuntu1988 (ubuntu1988) on 2015-05-08

description:	updated

Revision history for this message

Brad Figg (brad-figg) wrote on 2015-05-09: Missing required logs.

#2

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1453298

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: AlsaInfo.txt

#3

AlsaInfo.txt (38.6 KiB, text/plain)

apport information

tags:	added: apport-collected
description:	updated

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: CRDA.txt

#4

CRDA.txt (238 bytes, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: CurrentDmesg.txt

#5

CurrentDmesg.txt (64.4 KiB, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: Dependencies.txt

#6

Dependencies.txt (3.6 KiB, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: JournalErrors.txt

#7

JournalErrors.txt (7.2 KiB, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: Lspci.txt

#8

Lspci.txt (7.5 KiB, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: Lsusb.txt

#9

Lsusb.txt (364 bytes, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: ProcCpuinfo.txt

#10

ProcCpuinfo.txt (3.6 KiB, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: ProcEnviron.txt

#11

ProcEnviron.txt (103 bytes, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: ProcInterrupts.txt

#12

ProcInterrupts.txt (1.8 KiB, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: ProcModules.txt

#13

ProcModules.txt (4.8 KiB, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: PulseList.txt

#14

PulseList.txt (21.0 KiB, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: UdevDb.txt

#15

UdevDb.txt (132.4 KiB, text/plain)

apport information

Revision history for this message

Ubuntu1988 (ubuntu1988) wrote on 2015-05-09: WifiSyslog.txt

#16

WifiSyslog.txt (83.3 KiB, text/plain)

apport information

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed
no longer affects:	linux (Ubuntu)

Revision history for this message

In freedesktop.org Bugzilla #88012, Jesse Barnes (jbarnes-virtuousgeek) wrote on 2015-07-29:

#105

Deepak, any update here?

Revision history for this message

In freedesktop.org Bugzilla #88012, Deepak-s-8 (deepak-s-8) wrote on 2015-07-29:

#106

Hi Jesse,

I thought the improved RPS patches from Chris had resolved the issue.

Can you re-enable legacy turbo and see if it helps?

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9baecb7..0dac413 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -4292,12 +4292,7 @@ void intel_irq_init(struct drm_i915_private *dev_priv)
 	INIT_WORK(&dev_priv->rps.work, gen6_pm_rps_work);
 	INIT_WORK(&dev_priv->l3_parity.error_work, ivybridge_parity_work);
 
-	/* Let's track the enabled rps events */
-	if (IS_VALLEYVIEW(dev_priv) && !IS_CHERRYVIEW(dev_priv))
-		/* WaGsvRC0ResidencyMethod:vlv */
-		dev_priv->pm_rps_events = GEN6_PM_RP_UP_EI_EXPIRED;
-	else
-		dev_priv->pm_rps_events = GEN6_PM_RPS_EVENTS;
+	dev_priv->pm_rps_events = GEN6_PM_RPS_EVENTS;
 
 	INIT_DELAYED_WORK(&dev_priv->gpu_error.hangcheck_work,
 			  i915_hangcheck_elapsed);

Based on the comments, it looks like we are hitting the issue after enabling aggressive downclocking. I will check the patch again to see if we can find a potential fix.

8fb55197e64d5988ec57b54e973daeea72c3f2ff
drm/i915: Agressive downclocking on Baytrail

Thanks
Deepak

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-15: Re: Xubuntu freeze once a day

#17

Hi,

Exactly same problem here!
And the solution is?????

Regards,
Daniel

Revision history for this message

Launchpad Janitor (janitor) wrote on 2015-08-18:

#18

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xserver-xorg-video-intel (Ubuntu):
status:	New → Confirmed

Alberto Salvia Novella (es20490446e) on 2015-08-21

Changed in xserver-xorg-video-intel (Ubuntu):
importance:	Undecided → Critical

Alberto Salvia Novella (es20490446e) on 2015-08-21

Changed in xserver-xorg-video-intel (Ubuntu):
status:	Confirmed → Triaged

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21:

#19

Hi,

Here is a little more information:

~$ uname -a
Linux mini 3.19.0-26-generic #28-Ubuntu SMP Tue Aug 11 14:16:32 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

~$ Xorg -version

X.Org X Server 1.17.1
Release Date: 2015-02-10
X Protocol Version 11, Revision 0
Build Operating System: Linux 3.2.0-61-generic x86_64 Ubuntu
Current Operating System: Linux mini 3.19.0-26-generic #28-Ubuntu SMP Tue Aug 11 14:16:32 UTC 2015 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.19.0-26-generic root=UUID=2ee99c40-1fb2-4acc-a729-01443306486f ro text
Build Date: 19 March 2015 09:26:59AM
xorg-server 2:1.17.1-0ubuntu3 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.32.6
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.

~$ lspci -s 00:02.0 -v
00:02.0 VGA compatible controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display (rev 0e) (prog-if 00 [VGA controller])
        Subsystem: Foxconn International, Inc. Device 0db1
        Flags: bus master, fast devsel, latency 0, IRQ 92
        Memory at d0000000 (32-bit, non-prefetchable) [size=4M]
        Memory at c0000000 (32-bit, prefetchable) [size=256M]
        I/O ports at f080 [size=8]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915

~$ head -n5 /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 55
model name : Intel(R) Celeron(R) CPU J1900 @ 1.99GHz
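
For anyone comparing hardware against this report, here is a minimal Python sketch that parses output like the above and checks for the same CPU family and model. Treating family 6, model 55 as the Bay Trail generation is my understanding of Intel's model numbering rather than something stated in this bug, and the function names are made up for illustration.

```python
def parse_cpuinfo(text):
    """Parse the first processor block of /proc/cpuinfo text into a dict."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def is_bay_trail(info):
    """True if the parsed block matches vendor Intel, family 6, model 55.

    Assumption: model 55 (0x37) identifies the CPU core used in Bay Trail
    parts such as the Celeron J1900 shown above.
    """
    return (
        info.get("vendor_id") == "GenuineIntel"
        and info.get("cpu family") == "6"
        and info.get("model") == "55"
    )
```

On a Linux machine, `is_bay_trail(parse_cpuinfo(open("/proc/cpuinfo").read()))` would apply the check to the running system.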

If I can provide any other useful information, please let me know.

Regards,
Daniel

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: apport information

#20

ApportVersion: 2.17.2-0ubuntu1.3
Architecture: amd64
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
DistUpgraded: Fresh install
DistroCodename: vivid
DistroRelease: Ubuntu 15.04
DistroVariant: ubuntu
DkmsStatus:
 r8168, 8.039.00, 3.19.0-23-generic, x86_64: installed
 r8168, 8.039.00, 3.19.0-26-generic, x86_64: installed
ExtraDebuggingInterest: Yes, if not too technical
GraphicsCard:
 Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display [8086:0f31] (rev 0e) (prog-if 00 [VGA controller])
   Subsystem: Foxconn International, Inc. Device [105b:0db1]
InstallationDate: Installed on 2015-08-19 (2 days ago)
InstallationMedia: Ubuntu-GNOME 15.04 "Vivid Vervet" - Release amd64 (20150422)
MachineType: To be filled by O.E.M. To be filled by O.E.M.
Package: xserver-xorg-video-intel 2:2.99.917-1~exp1ubuntu2.2
PackageArchitecture: amd64
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.19.0-26-generic root=UUID=2ee99c40-1fb2-4acc-a729-01443306486f ro text
ProcVersionSignature: Ubuntu 3.19.0-26.28-generic 3.19.8-ckt4
Tags:  vivid ubuntu
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
Uname: Linux 3.19.0-26-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
 
_MarkForUpload: True
dmi.bios.date: 07/17/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: D72F1P05_x64
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: nT-iBT18/nT-iBT19/nT-iBT29
dmi.board.vendor: Foxconn
dmi.board.version: FAB 1.0
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrD72F1P05_x64:bd07/17/2014:svnTobefilledbyO.E.M.:pnTobefilledbyO.E.M.:pvrTobefilledbyO.E.M.:rvnFoxconn:rnnT-iBT18/nT-iBT19/nT-iBT29:rvrFAB1.0:cvnToBeFilledByO.E.M.:ct3:cvrToBeFilledByO.E.M.:
dmi.product.name: To be filled by O.E.M.
dmi.product.version: To be filled by O.E.M.
dmi.sys.vendor: To be filled by O.E.M.
version.compiz: compiz N/A
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.60-2
version.libgl1-mesa-dri: libgl1-mesa-dri 10.5.2-0ubuntu1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 10.5.2-0ubuntu1
version.xserver-xorg-core: xserver-xorg-core 2:1.17.1-0ubuntu3
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.9.0-1ubuntu2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.5.0-1ubuntu2
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917-1~exp1ubuntu2.2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.11-1ubuntu2build1
xserver.bootTime: Thu Aug 20 17:18:12 2015
xserver.configfile: default
xserver.devices:
 input        Power Button         KEYBOARD, id 6
 input        Video Bus            KEYBOARD, id 7
 input        Sleep Button         KEYBOARD, id 8
 input          Multimedia Air Mouse Keyboard KEYBOARD, id 9
 input          Multimedia Air Mouse Keyboard KEYBOARD, id 10
xserver.errors:
 
xserver.logfile: /var/log/Xorg.0.log
xserver.outputs:
 
xserver.version: 2:1.17.1-0ubuntu3

tags:

added: ubuntu

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: BootLog.txt

#21

BootLog.txt (6.8 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: CurrentDmesg.txt

#22

CurrentDmesg.txt (58.8 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: Dependencies.txt

#23

Dependencies.txt (3.6 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: DpkgLog.txt

#24

DpkgLog.txt (1.5 MiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: JournalErrors.txt

#25

JournalErrors.txt (9.2 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: Lspci.txt

#26

Lspci.txt (25.7 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: Lsusb.txt

#27

Lsusb.txt (450 bytes, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: ProcCpuinfo.txt

#28

ProcCpuinfo.txt (3.6 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: ProcEnviron.txt

#29

ProcEnviron.txt (303 bytes, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: ProcInterrupts.txt

#30

ProcInterrupts.txt (2.0 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: ProcModules.txt

#31

ProcModules.txt (4.8 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: UdevDb.txt

#32

UdevDb.txt (142.4 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: XorgLog.txt

#33

XorgLog.txt (18.3 KiB, text/plain)

apport information

Revision history for this message

dcastro (maildcastro) wrote on 2015-08-21: XorgLogOld.txt

#34

XorgLogOld.txt (18.3 KiB, text/plain)

apport information

Bug Watch Updater (bug-watch-updater) on 2015-08-21

Changed in xserver-xorg-video-intel:
importance:	Unknown → Medium
status:	Unknown → Confirmed

Revision history for this message

In freedesktop.org Bugzilla #88012, J-M Latino (jm-fis) wrote on 2015-09-17:

#107

I have been hitting this segmentation fault since a kernel update last week to 3.13.0-64-generic. The seg fault is particularly prevalent under Kodi 15.x when viewing video material. My dmesg: http://pastebin.com/Gc7R4X5u

Installed 'Fritsch' custom kernel (incorporating his 'legacy turbo fix' 3.19.2-legacy+edid+) from his post at http://forum.kodi.tv/showthread.php?tid=238447, and that fixed the issue for me.

Revision history for this message

In freedesktop.org Bugzilla #88012, Fritsch-b (fritsch-b) wrote on 2015-09-17:

#108

Wait: a segmentation fault is something completely different from what is discussed in this bug report. From your forum post I gathered that you get a full freeze of your system. Does it fully freeze, or do you get a segfault?

If it is a segfault, post that log; your bug is then something else.

Revision history for this message

In freedesktop.org Bugzilla #88012, J-M Latino (jm-fis) wrote on 2015-09-17:

#109

My apologies for my use of incorrect terminology ... yes, 'full system freezes' was the term I should have used!

Revision history for this message

In freedesktop.org Bugzilla #88012, Mazout360 (mazout360) wrote on 2015-09-22:

#110

My comment won't be very helpful... I tried kernel 4.1.6 from kernel.org and it doesn't freeze, but kernel 4.2 does (both self-compiled). I then tried the legacy turbo patch on 4.2, and although it seems to last longer, it still ends in a full system freeze during video playback. Both were used on the same Baytrail i915 system with nightly mesa/drivers.

Revision history for this message

In freedesktop.org Bugzilla #88012, Nanawel (nanawel) wrote on 2015-09-22:

#111

@Maxime Bergeron:
It is helpful. I was recently wondering whether I should switch back to the latest kernel on my Archlinux. Now I know I had better stick with the 3.14 LTS. Thanks.

Revision history for this message

In freedesktop.org Bugzilla #88012, Zhou Yi Chao (broken-zhou) wrote on 2015-09-22:

#112

I can confirm that the "legacy turbo" patch doesn't work for me either. Using that patch with ck-kernel, my system still freezes frequently under high CPU load. The last kernel that works for me is the 3.18.x branch.

Revision history for this message

In freedesktop.org Bugzilla #88012, Bernhard-mlecnik (bernhard-mlecnik) wrote on 2015-10-06:

#113

Hi, I can confirm the freeze on another BYT system (Zotac Pico 320, Intel Atom Z3735F). I first used the patched kernel 3.19.1 (with legacy turbo fix) and got only rare random freezes, about 1 per week, but after some updates I got several freezes a day again. I then used the patched kernel 3.19.2 (with legacy turbo fix + edid) and it freezes less, but still 3-4 times a week. Thanks for your great work on this!

Revision history for this message

In freedesktop.org Bugzilla #88012, Viktor Kojouharov (vkojouharov) wrote on 2015-10-08:

#114

With kernel 4.1, the system was relatively stable. I would probably get 1 freeze in a couple of weeks. Upgrading to 4.2 causes massive freezes, both during playback and if kodi just shows its first screen. Freezes happen within an hour of playback, and maybe within 12 if nothing is being played.

Revision history for this message

In freedesktop.org Bugzilla #88012, Wot (wdriessen) wrote on 2015-10-17:

#115

I'll just chime in, as I also notice a huge difference between kernel 4.1 and 4.2. On my Shuttle XS36V4 (Celeron J1900), running arch completely up-to-date, I can hardly watch 15 minutes of video before it all locks up. After reverting the kernel back to 4.1.6-1, the system is quite stable again.

Revision history for this message

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-10-19:

#116

My Asrock Q1900DC was originally bought to be a headless router/pvr/nas which it now is - so no more testing of this lock from me for a long time (or so I thought).

When putting it to its new duties I put a vanilla 4.1.1 on it (didn't patch as being headless I don't get any i915 interrupts). All was good - uptime 100 days varied CPU loads no issues.

USB has some xhci isoc pstate issues which were worked around by disabling USB3 in bios to force ehci driver. This issue was low level packet loss from dvb tuners not locks.

Recently needed to re-locate and while doing so updated to 4.1.10 = hard lock after 7 days uptime. The kernel was not the only difference as I attached a usb printer and so have usb module and cups running now, though the printer had been off for days when it locked.

Anyway I am back on 4.1.1 now (with printer) and will have to see how long it lasts to be sure whether the kernel or the printer (or the move!) was the cause.

Revision history for this message

In freedesktop.org Bugzilla #88012, Michal-9 (michal-9) wrote on 2015-10-21:

#117

I'm seeing very similar symptoms on my Celeron N2940 system. I'm using the Arch distro with kernel 4.2.3-1. The system freezes from time to time when playing videos, especially when using HW acceleration. It usually happens when playing videos in mpv with VAAPI HW decoding enabled, or flash videos in Firefox with HW decoding enabled in the flash config file. The system usually freezes within 10 to 30 minutes of playback. Playing videos without HW decoding means fewer freezes, but does not avoid them. They just happen less frequently, and from time to time even when no video playback is running in my LXDE.

My best guess is that this behavior started somewhere between kernel version 4.0.7 and 4.1.6. Unfortunately, I can't be more specific, as it took me more than two months of stress-testing CPU and memory before I pinpointed this problem as most probably connected with heavy GFX usage.

I already tried a few options with no luck, like i915.reset=0/1, i915.enable_rc6=0/1 and i915.semaphores=0/1. I couldn't feel any difference, except with enable_rc6=9. The system was even less stable then.

Using drm.debug=1 did not produce any interesting messages before the freeze. The log is filled with "random" I915_GEM_BUSY, I915_GEM_EXECBUFFER2 and I915_GEM_MADVISE messages up to the freeze. Of course, I can post the log if someone feels it's interesting anyway.

I'm willing to offer more help with debugging this issue.

Revision history for this message

In freedesktop.org Bugzilla #88012, ladiko (ladiko) wrote on 2015-10-21:

#118

Unrelated to this bug but people who return to 3.16 or 3.13 on Ubuntu may use 3.13.0-65 or 3.16.0-50 due to this bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1503655

Revision history for this message

In freedesktop.org Bugzilla #88012, ladiko (ladiko) wrote on 2015-10-22:

#119

I just saw that the other issue seems to be fixed recently, sorry for the disturbance. Just interesting how long this bug exists unfixed while the other one has been fixed quite fast.

Revision history for this message

In freedesktop.org Bugzilla #88012, Vladimir-jicha (vladimir-jicha) wrote on 2015-10-22:

#120

I also don't understand why such a serious bug hasn't been fixed yet. Does anybody at Intel even care about it?

Revision history for this message

In freedesktop.org Bugzilla #88012, SM Fahmid (smfahmid2009) wrote on 2015-10-26:

#121

Also affecting Debian Jessie's stock kernel, 3.16.0-4-amd64.

I am running on a Thinkpad T430 with i5-3320M 2.6GHz Ivybridge CPU. Under sustained load, the whole computer will freeze (no ssh, no keyboard inputs, no nothing), within a period of 4 to 4.5 hours.

Using YouTube or running any sort of video media will escalate the problem, however, it can freeze up randomly, between 30 mins to over 6 hours.

After compiling kernel 3.15.10 from kernel.org, the issue is gone. This seems to have started only from version 3.16 and onwards. Version 4.2.3, the latest kernel, STILL has not fixed this issue.

Intel, please do something about this. Some people might have need for the latest kernel (I do not, at the moment, but I'd rather not stick with an outdated kernel).

Revision history for this message

In freedesktop.org Bugzilla #88012, Deepak-s-8 (deepak-s-8) wrote on 2015-10-26:

#122

@Jesse, Shall we enable legacy turbo on BYT until we have rootcause on BYT WA?
Also, Chris has submitted a cleaned up patch for "WA for Turbo and RC6 to work together" for review.

Revision history for this message

In freedesktop.org Bugzilla #88012, Michal-9 (michal-9) wrote on 2015-10-27:

#123

Just a quick confirmation.

I haven't seen any freeze while watching video for more than 4 hours now, since I started using the kernel option intel_pstate=disable.

Revision history for this message

In freedesktop.org Bugzilla #88012, ladiko (ladiko) wrote on 2015-10-28:

#124

And how much warmer does the CPU get?

Revision history for this message

In freedesktop.org Bugzilla #88012, Michal-9 (michal-9) wrote on 2015-10-28:

#125

It is quite hard for me to compare. Before I found this "magic" kernel parameter, my notebook usually froze before the CPU could get any warmer. Since yesterday, I haven't seen more than 45C on any core while working in office apps or watching a movie. I guess this temperature is not an issue on a Celeron N2940.

Anyway - if I have to choose between reliable working notebook with a bit warmer CPU and randomly freezing notebook with calm CPU, I choose the first one for sure. ;-)

Revision history for this message

In freedesktop.org Bugzilla #88012, ladiko (ladiko) wrote on 2015-10-28:

#126

actually you have three options:

* current kernel --> freezes
* current kernel + pstates-parameter -> warmer cpu
* kernel 3.16 --> no issues

Revision history for this message

In freedesktop.org Bugzilla #88012, Chris Rainey (ckrzen) wrote on 2015-10-29:

#127

3.16 working well on my DELL Inspiron 3646:

I've had little to no trouble ... even stressing the system using:

glmark2 --run-forever

I got my 3.16 kernel here:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.16.7-ckt18-utopic/

I'm currently using Ubuntu 15.10 with the 3.16 kernel.

Hope this helps !!

Revision history for this message

In freedesktop.org Bugzilla #88012, Kkrawczyk-it (kkrawczyk-it) wrote on 2015-10-30:

#129

Currently I use kernel 3.16.0-4 (Debian Jessie default) and since that change (before I had kernel 4.2) I do not experience any system freeze. For now my computer (ASROCK SBC-211P, BYT CPU) is working second day without any crash. Before, I had Kernel 4.2 on which I had system freeze after 2min from system boot and playing video in VLC.

Revision history for this message

In freedesktop.org Bugzilla #88012, Timayers99 (timayers99) wrote on 2015-10-30:

#130

Me too... random freezes with Gentoo on a Biostar J1900MH2. Used as a HTPC/mythfrontend, so any kernel too old to provide audio over HDMI is not OK. I have been testing different BIOS settings and kernel configs. Currently running 4.2.4

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-10-30:

#131

Same here with random freezes. Tried intel_pstate=disabled which works. However, cutting the max GPU frequency to about 50% also works for me. Video seems smoother compared to pstate=disabled. YMMV

Running 64 bit Mint 17.2/Cinnamon on ASUS T100-CHI linux-4.2.5 w/Ubuntu base and T100 specific patches. (Intel Atom Z3775, ValleyView Gen7)

Also seeing freezes on Dell Inspiron Laptop (Intel N3540) with various Ubuntu kernels from 4.3-rc7 back to 3.18.21, though usually much less than once a day.

CHI without workarounds usually freezes within minutes to several hours. With GPU capped, it runs as long as I let it - usually a few days.

Revision history for this message

In freedesktop.org Bugzilla #88012, ladiko (ladiko) wrote on 2015-10-30:

#132

How to limit the GPU frequency?

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-10-30:

#133

Same here with random freezes. Tried intel_pstate=disabled which works. However, cutting the max GPU frequency to about 50% also works for me. Video seems smoother compared to pstate=disabled. YMMV

Running 64 bit Mint 17.2/Cinnamon on ASUS T100-CHI linux-4.2.5 w/Ubuntu base and T100 specific patches. (Intel Atom Z3775, ValleyView Gen7)

Also seeing freezes on Dell Inspiron Laptop (Intel N3540) with various Ubuntu kernels from 4.3-rc7 back to 3.18.21, though usually much less than once a day.

CHI without workarounds usually freezes within minutes to several hours. Software rendering improves run time/reduces freeze rate. However, with GPU capped, it runs as long as I let it - usually a few days.

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-10-30:

#134

To cap frequency I read the max (779 for mine) from

cat /sys/class/drm/card0/gt_max_freq_mhz

To set pick a lower value (as root)

echo 423 > /sys/kernel/debug/dri/0/i915_max_freq

I tried lower values in 100 Mhz steps until I found stability (to 423 in my case).

I think you could just put back to the gt_max but this worked for me.

This value resets each boot, and the driver rounds the value to something close.
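The steps above could be wrapped in a small helper. This is only a sketch under stated assumptions: the function name cap_gpu_freq is made up for illustration, the two default paths are the ones quoted in this comment, debugfs must be mounted, and writing the cap needs root. Note also that gt_max_freq_mhz may itself reflect a cap applied earlier in the same boot.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not part of any tool).
# Usage: cap_gpu_freq TARGET_MHZ [MAX_FILE] [CAP_FILE]
# The default paths are the ones quoted in the comment above.
cap_gpu_freq() {
    target=$1
    max_file=${2:-/sys/class/drm/card0/gt_max_freq_mhz}
    cap_file=${3:-/sys/kernel/debug/dri/0/i915_max_freq}

    # Reject anything that is not a plain non-negative integer.
    case $target in
        ''|*[!0-9]*) echo "target must be an integer (MHz)" >&2; return 2 ;;
    esac

    # Read the currently reported maximum; this may already be a capped value.
    max=$(cat "$max_file") || return 1
    if [ "$target" -gt "$max" ]; then
        echo "target $target MHz is above the reported max $max MHz" >&2
        return 2
    fi

    # Needs root on a real system; the driver may round the value.
    echo "$target" > "$cap_file" || return 1
    echo "requested cap: $target MHz (reported max: $max MHz)"
}
```

Run as root, for example: cap_gpu_freq 423. Since the value resets at each boot, the call would have to be repeated after every restart.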

Revision history for this message

In freedesktop.org Bugzilla #88012, Timayers99 (timayers99) wrote on 2015-11-01:

#135

I have found a setting that controls the random freezes, at least on my board. Disabling "IGD Turbo Enable" under NorthBridge options in the BIOS. Otherwise, the BIOS is set to the defaults.

Different kernel .configs had no effect. I have enabled all Baytrail options and boot from an EFI stub.

Revision history for this message

In freedesktop.org Bugzilla #88012, Luka-karinja (luka-karinja) wrote on 2015-11-02:

#136

Lowering i915_max_freq, even setting it to min still freezes my T100TAF (Atom Z3735).
I haven't experienced any freezes with pstate=disabled, but performance is really affected

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-11-02:

#137

Given Luka Karinja's results, I checked my kernel args to see if something else could account for my results. I found - i915.i915_enable_rc6=1 i915.lvds_downclock=1 i915.semaphores=1 i915.i915_enable_fbc=1.

rc6=1 seems to be known to add instability, perhaps the freq cap offset that. I've stripped the args (except boot, splash, quiet) will be running new tests.

Kernel args I've been using the last few weeks on T100CHI.

boot=pci,force acpi=force rcutree.rcu_idle_gp_delay=1 libahci.ignore_sss=1 splash quiet acpi_enforce_resources=lax i915.i915_enable_rc6=1 i915.lvds_downclock=1 i915.semaphores=1 i915.i915_enable_fbc=1 drm.vblankoffdelay=1 pcie_aspm=force acpi=force rcutree.rcu_idle_gp_delay=1 libahci.ignore_sss=1 splash quiet acpi_enforce_resources=lax drm.vblankoffdelay=1 pcie_aspm=force

Revision history for this message

In freedesktop.org Bugzilla #88012, ladiko (ladiko) wrote on 2015-11-02:

#138

I still get rare freezes on Ubuntu with linux-image-generic-lts-utopic (kernel 3.16.0). Does pstates=disabled only affect Intel CPUs, or AMD ones as well? I am searching for a general setup that doesn't affect AMD CPUs but only Intel Baytrail.

Revision history for this message

In freedesktop.org Bugzilla #88012, Jani-nikula (jani-nikula) wrote on 2015-11-03:

#140

(In reply to John from comment #102)
> Given Luka Karinja's results, I checked my kernel args to see if something
> else could account for my results. I found - i915.i915_enable_rc6=1
> i915.lvds_downclock=1 i915.semaphores=1 i915.i915_enable_fbc=1.

i915.i915_enable_rc6 and i915.i915_enable_fbc have been renamed i915.enable_rc6 and i915.enable_fbc, respectively, since v3.15 so those have had no impact.

These days all of those are considered debug options, and we taint the kernel if they've been set.
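To spot the two renamed options on a given command line, something like the following could be used. It is a rough sketch: the function name find_renamed_i915_args is hypothetical, only the two renames mentioned in this comment are covered, and it assumes the arguments contain no quoted spaces.

```shell
#!/bin/sh
# Hypothetical helper: print any of the pre-v3.15 parameter names mentioned
# above that still appear on a kernel command line, next to the new name.
# Usage: find_renamed_i915_args "$(cat /proc/cmdline)"
find_renamed_i915_args() {
    # Rely on word splitting to walk the space-separated arguments.
    for arg in $1; do
        case $arg in
            i915.i915_enable_rc6=*)
                echo "$arg -> i915.enable_rc6=${arg#*=}" ;;
            i915.i915_enable_fbc=*)
                echo "$arg -> i915.enable_fbc=${arg#*=}" ;;
        esac
    done
}
```

An argument that appears twice on the command line is reported twice.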

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-11-03:

#141

(In reply to Jani Nikula from comment #105)
<snip>
> i915.i915_enable_rc6 and i915.i915_enable_fbc have been renamed
> i915.enable_rc6 and i915.enable_fbc, respectively, since v3.15 so those have
> had no impact.
>
> These days all of those are considered debug options, and we taint the
> kernel if they've been set.

Appreciate the info. Retested: no args, no cap -> froze < 2 hours, reboot froze within 2 minutes. Frequency cap only, still running (25+ hrs.)

But it looks like I've been just rehashing comments 33-36, which also didn't work for everyone. Only difference is 50% cap vs. minimum cap. Improvement?

Revision history for this message

In freedesktop.org Bugzilla #88012, Kkrawczyk-it (kkrawczyk-it) wrote on 2015-11-04:

#142

Every kernel above 3.16.x just fails.

3.16.x - no freeze
> 3.16.x - freezes no later than six hours after video launch.

I checked many kernel versions: 3.16.x, 3.17.x, 3.18.x, 3.19.x, 4.0.x, 4.1.x, 4.2.x and latest 4.3. None of described above kernel parameters works.

For tests I used ASROCK SBC-211P (Baytrail-E3800).

Revision history for this message

In freedesktop.org Bugzilla #88012, Laszlo-fiat (laszlo-fiat) wrote on 2015-11-08:

#143

(In reply to John from comment #99)
> To cap frequency I read the max (779 for mine) from
>
> cat /sys/class/drm/card0/gt_max_freq_mhz
>
> To set pick a lower value (as root)
>
> echo 423 > /sys/kernel/debug/dri/0/i915_max_freq

I have a Z3735F baytrail tablet running Debian 8 with a 1 month old linux-next kernel.

I've lowered the i915_max_freq to 345 MHz, and achieved stability that way.
No freezes since then. The Z3735F GPU has a base freq of 311 MHz, so I am pretty close to that.

I have also patched the kernel source with a few baytrail sdhci related patches from: https://github.com/hadess/rtl8723bs/tree/master/patches

Revision history for this message

In freedesktop.org Bugzilla #88012, Adf-lists (adf-lists) wrote on 2015-11-08:

#144

(In reply to Andy Furniss from comment #81)

<snip>

> Recently needed to re-locate and while doing so updated to 4.1.10 = hard
> lock after 7 days uptime. The kernel was not the only difference as I
> attached a usb printer and so have usb module and cups running now, though
> the printer had been off for days when it locked.
>
> Anyway I am back on 4.1.1 now (with printer) and will have to see how long
> it lasts to be sure whether the kernel or the printer (or the move!) was the
> cause.

Still up OK after 20 days back on 4.1.1.

Strange that 4.1.10 seems to be a regression; there don't seem to be any obvious power-related i915 commits between the two. Though as I am headless I am not getting any i915 interrupts anyway, which makes me think that there is some different CPU/IO-related regression. In all the testing I did before when using the GPU, I never locked up by just stressing CPU/IO, until maybe just before I stopped testing, when I could get "make modules_install" to reliably lock (as noted in a previous comment).

Revision history for this message

In freedesktop.org Bugzilla #88012, Michal-9 (michal-9) wrote on 2015-11-08:

#145

(In reply to Andy Furniss from comment #109)
> (In reply to Andy Furniss from comment #81)
>
> Still up OK after 20 days back on 4.1.1.
>
> Strange that 4.1.10 seems to be a regression, there don't seem to be any
> obvious power related i915 commits between the two. Though as I am headless
> I am not getting and i915 interrupts anyway, which makes me thing that there
> is some different CPU/IO related regression. In all the testing I did before
> when using GPU I never locked by just stressing CPU/IO until maybe just
> before I stopped testing when I could get "make modules_install" to reliably
> lock (as noted in a previous comment).

To make it even more strange: as I reported earlier, on kernel 4.2.3 my system was unusable. I've downgraded to LTS kernel 4.1.12 and have not had a single issue since then. I've been running 4.1.12 successfully for more than a week now, without a single freeze. I don't even need any pstate=disable command args any more, which were necessary on 4.2.3 to survive more than a few minutes. I haven't tested 4.1.10 though.

penalvch (penalvch) on 2015-11-09

summary:

- Xubuntu freeze once a day
+ 8086:0f31 Xubuntu freeze once a day

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-11-11:

#146

The notes for 4.2.6 claim to fix one problem that causes GPU locks. When I added the incremental patch set, the longest it ran was about an hour (usually it froze within 5 minutes.) I had just stopped a 6 day run (24/7) on my (ASUS baytrail) T100 specific 4.2.5 kernel (no args, 50% GPU cap) (with sdhci patches) The freezes in 4.2.6 now seem to be independent of GPU frequency for my setup.

Revision history for this message

In freedesktop.org Bugzilla #88012, SweX (swexru) wrote on 2015-11-11:

#148

I've got freezes on a baytrail tablet, the ASUS Vivotab Note 8 (m80ta). But for me it looks unrelated to i915. Even with nomodeset and rmmod i915, the system hangs after some random time, from minutes to several hours.

Revision history for this message

In freedesktop.org Bugzilla #88012, Cffwet (cffwet) wrote on 2015-11-13:

#149

I have system freezes on ASRock Q1900-ITX with a kernel 3.19.31-generic on an Ubuntu distro. I upgraded to kernel 4.2.0-16-generic last month and recently to 4.2.0-18-generic. The system freezes got worse (less than 10 min watching videos).

I disabled hardware acceleration in all software that offers the option, such as my browsers. I also edited the file /etc/default/acpi-support and disabled suspend/hibernate handling in acpi-support by changing the line SUSPEND_METHODS="dbus-pm dbus-hal pm-utils" to SUSPEND_METHODS="none".

I don't get any freezes anymore, for 24h now, on both kernels 3.19.31-generic and 4.2.0-18-generic, with a lot of video playing. I didn't test kernel 4.2.0-16-generic.

I tested disabling hardware acceleration without changing the acpi-support file, and I tested disabling suspend/hibernate handling with hardware acceleration enabled. In both cases I still got freezes, though they seemed less frequent. I needed both changes to get rid of all the freezes.

Revision history for this message

Paco (patrick-kowalzick) wrote on 2015-11-15:

#147

My Debian stretch was freezing several times a day. After disabling acceleration (NoAccel) and DRI I had no more freezes.

$ cat /etc/X11/xorg.conf.d/20-intel.conf
Section "Device"
   Identifier "Intel Graphics"
   Driver "intel"
   Option "NoAccel" "True"
   Option "DRI" "False"
EndSection

$ uname -r
4.2.0-1-amd64

Other resources:
https://wiki.archlinux.org/index.php/Intel_graphics#X_freeze.2Fcrash_with_intel_driver

Revision history for this message

In freedesktop.org Bugzilla #88012, Carl-wolfgang (carl-wolfgang) wrote on 2015-11-17:

#150

On a Zotac CI320 nano with Ubuntu trusty server 14.04.3 LTS and the kernel
from the OpenELEC forum, 3.19.1-legacy-turbo+, with yavdr
unstable installed and va-api-glx in the softhddevice vdr plugin, a kernel oops
left the following trace. Maybe it is useful, because freezes normally don't leave
a trace in the logs.

Nov 17 22:12:55 nano4 kernel: [ 4740.991238] ------------[ cut here ]------------
Nov 17 22:12:55 nano4 kernel: [ 4740.991365] WARNING: CPU: 3 PID: 134 at drivers/gpu/drm/i915/intel_pm.c:4492 valleyview_set_rps+0x167/0x1a0 [i915]()
Nov 17 22:12:55 nano4 kernel: [ 4740.991375] WARN_ON(val > dev_priv->rps.max_freq_softlimit)
Nov 17 22:12:55 nano4 kernel: [ 4740.991383] Modules linked in: msr(E) autofs4(E) rc_tt_1500(OE) ts2020(OE) m88ds3103(OE) i2c_mux(E) arc4(E) intel_rapl(E) intel_powerclamp(E) snd_hda_codec_hdmi(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E) dvb_usb_dw2102(OE) dvb_usb(OE) ghash_clmulni_intel(E) iwlmvm(E) cryptd(E) dvb_core(OE) snd_soc_rt5640(E) mac80211(E) media(OE) snd_hda_intel(E) snd_soc_rl6231(E) snd_hda_controller(E) snd_intel_sst_acpi(E) snd_intel_sst_core(E) snd_soc_sst_mfld_platform(E) snd_hda_codec(E) snd_soc_core(E) serio_raw(E) snd_compress(E) iwlwifi(E) btusb(E) snd_pcm_dmaengine(E) snd_hwdep(E) cfg80211(E) snd_pcm(E) snd_seq_midi(E) snd_seq_midi_event(E) ir_lirc_codec(OE) ir_xmp_decoder(OE) lirc_dev(OE) ir_mce_kbd_decoder(OE) mei_txe(E) iosf_mbi(E) ir_sharp_decoder(OE) mei(E) lpc_ich(E) shpchp(E) ir_sanyo_decoder(OE) snd_rawmidi(E) ir_sony_decoder(OE) ir_jvc_decoder(OE) ir_rc6_decoder(OE) ir_rc5_decoder(OE) snd_seq(E) ir_nec_decoder(OE) snd_seq_device(E) snd_timer(E) rc_rc6_mce(OE) nuvoton_cir(OE) rc_core(OE) 8250_fintek(E) snd(E) rfcomm(E) bnep(E) dw_dmac(E) dw_dmac_core(E) i2c_hid(E) hid(E) rfkill_gpio(E) soundcore(E) bluetooth(E) snd_soc_sst_acpi(E) 8250_dw(E) spi_pxa2xx_platform(E) i2c_designware_platform(E) i2c_designware_core(E) pwm_lpss_platform(E) mac_hid(E) pwm_lpss(E) i915(E) video(E) drm_kms_helper(E) nfsd(E) drm(E) auth_rpcgss(E) nfs_acl(E) i2c_algo_bit(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) nct6775(E) hwmon_vid(E) coretemp(E) lp(E) parport(E) nls_iso8859_1(E) psmouse(E) r8169(E) mii(E) ahci(E) libahci(E) sdhci_acpi(E) sdhci(E)
Nov 17 22:12:55 nano4 kernel: [ 4740.991767] CPU: 3 PID: 134 Comm: kworker/3:2 Tainted: G           OE  3.19.1-legacy-turbo+ #1
Nov 17 22:12:55 nano4 kernel: [ 4740.991778] Hardware name: Motherboard by ZOTAC ZBOX-CI320NANO series/ZBOX-CI320NANO series, BIOS B219P026 05/19/2015
Nov 17 22:12:55 nano4 kernel: [ 4740.991859] Workqueue: events intel_gen6_powersave_work [i915]
Nov 17 22:12:55 nano4 kernel: [ 4740.991871]  ffffffffc06cb3c8 ffff88003655fcc8 ffffffff8179acb0 0000000000000000
Nov 17 22:12:55 nano4 kernel: [ 4740.991890]  ffff88003655fd18 ffff88003655fd08 ffffffff81073a7a ffff88003655fcf8
Nov 17 22:12:55 nano4 kernel: [ 4740.991908]  ffff880078550000 00000000000000d6 00000000000000d6 ffff880077acd000
Nov 17 22:12:55 nano4 kernel: [ 4740.991927] Call Trace:
Nov 17 22:12:55 nano4 kernel: [ 4740.991968]  [<ffffffff8179acb0>] dump_stack+0x45/0x57
Nov 17 22:12:55 nano4 kernel: [ 4740.991993]  [<ffffffff81073a7a>] warn_slowpath_common+0x8a/0xc0
Nov 17 22:12:55 nano4 kernel: [ 4740.992013]  [<ffffffff81073af6>] warn_slowpath_fmt+0x46/0x50
Nov 17 22:12:55 nano4 kernel: [ 4740.992111]  [<ffffffffc0620467>] valleyview_set_rps+0x167/0x1a0 [i915]
Nov 17 22:12:55 nano4 kernel: [ 4740.992202]  [<ffffffffc0621ecf>] intel_gen6_powersave_work+0xb2f/0x11b0 [i915]
Nov 17 22:12:55 nano4 kernel: [ 4740.992223]  [<ffffffff8108c6cf>] process_one_work+0x14f/0x400
Nov 17 22:12:55 nano4 kernel: [ 4740.992241]  [<ffffffff8108ce68>] worker_thread+0x118/0x510
Nov 17 22:12:55 nano4 kernel: [ 4740.992259]  [<ffffffff8108cd50>] ? rescuer_thread+0x3d0/0x3d0
Nov 17 22:12:55 nano4 kernel: [ 4740.992278]  [<ffffffff81092252>] kthread+0xd2/0xf0
Nov 17 22:12:55 nano4 kernel: [ 4740.992298]  [<ffffffff81092180>] ? kthread_create_on_node+0x180/0x180
Nov 17 22:12:55 nano4 kernel: [ 4740.992319]  [<ffffffff817a26fc>] ret_from_fork+0x7c/0xb0
Nov 17 22:12:55 nano4 kernel: [ 4740.992339]  [<ffffffff81092180>] ? kthread_create_on_node+0x180/0x180
Nov 17 22:12:55 nano4 kernel: [ 4740.992352] ---[ end trace 6a13023d6ab83790 ]---

Revision history for this message

caprico (caprico4) wrote on 2015-11-18:

#151

I'm living with the same bug with i915 driver:

- Happens in all Ubuntu versions with Kernel higher than version 3.16.x
- Screen freezes randomly, often when watching videos or scrolling through webpages with a lot of graphics. For testing I ran a Youtube Playlist and the system always froze after around 15-30min
- No input is possible when screen crashes, power-off button is the only solution
- I tried to load the kernel with i915.enable_rc6=0 and disabled intel_pstate, no success
- newer kernels like 4.2.0-16 and 4.3.0.040300 didn't solve the issue (seems they cause the crash to happen even faster)

Temporary solution:
- Install kernel 3.16.7 --> all issues disappeared and so far I couldn't observe any freezes

Revision history for this message

In freedesktop.org Bugzilla #88012, peppedx (peppedx) wrote on 2015-11-19:

#152

It also happens to me (almost once a day) on a fresh Ubuntu 15.10

-> Atom(TM) CPU E3845 @ 1.91GHz

-> Linux rehab-desktop 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

-> Intel® Graphics Stack Release 2015Q3 for Linux*

But it also happens on a Mint 17.2 (14.04 based) with 3.16 and 3.19 kernels, using either the SNA or UXA accel method.

Revision history for this message

In freedesktop.org Bugzilla #88012, Jsievikorte (jsievikorte) wrote on 2015-11-21:

#153

Hi All,

Came across this when hunting random freezes / crashes on Acer B115 laptop. It started with upgrade to ubuntu 15.04 (14.x worked ok, haven't noted the kernel versions).

http://ubuntuforums.org/showthread.php?t=2284615&p=13313066#post13313066 My original post in here.

OpenSuSE with kernel 4.0.5 seemed to run fine, but it might be that I looked it at the wrong end, because 15.04 ubuntu crashed only when going to sleep so that is the thing I tried to track down. 15.10 now crashes randomly during desktop use - and same happens in OpenSuSE Tumbleweed with 4.3 kernel.

Crashes seem intermittent: the system might go days without a freeze, and then a couple of nights back there were two freezes in a row, the second one just a couple of minutes after reboot. The only load was Chromium showing a couple of large web pages when the crashes happened. Symptoms are much the same as described in many posts: no sysrq possible, only power off works.

I did already try intel_pstate=disable and that made the system freeze on screensaver after just few minutes of uptime. After that I've booted with debugging options enabled and fiddled a bit with clock frequency setting, and haven't managed to crash since - but I'm still only three days up. Tried to make it crash by playing couple of games and/or HD videos, no luck so far. But this is to be expected, 15.10 ubuntu could also run couple of weeks - which makes this painful as there seems to be no clear way of reproducing the issue.

Just it makes me think that is there something going on with timings at hardware level? What I did try was to lower the frequency setting just lightly, with quick testing it didn't seem to matter how much I touched it. Also I'm a bit puzzled about the setting, is the /sys/kernel/debug/dri/0/i915_max_freq value in MHz or something else, as in log it says:

[26873.155419] [drm:valleyview_enable_rps] current GPU freq: 312 MHz (198)
[26873.155420] [drm:valleyview_enable_rps] setting GPU freq to 645 MHz (214)

And I think I saw these high values in the log even when I had set the frequency value to less than 400. Anyway, I'll update if I find anything else. This is annoying, as it has been going on for months now without a clear clue as to what is wrong with this laptop :)

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-11-25:

#154

I've been running several days without a freeze on my 4.2.6 kernel. I simply added intel_idle.max_cstate=1 to my kernel arguments, no other power arguments, and no more setting GPU frequency caps.

intel_idle.max_cstate=0 was effective too, but my system ran warm (not hot) when idle. At max_cstate=1 the case temperature seems normal to me.

I suspect that the cost of this work-around would be less battery run time. But until the T100CHI has full hardware support in linux (no sound, no bluetooth...), I'm tethered to a powered hub anyway.

I've also tested versions of 4.1.13, 4.2.6, 4.3, even 4.4-rc1 without obvious side-effects. 4.4rc2 did freeze within minutes of booting, but 4.4-rcx has too many regressions (no wifi even on a dongle) to take that freeze seriously.

I also tried max_cstate=2 on my Dell laptop (baytrail) but that seemed to trigger a "not quite" freeze during a kernel build (fan speed malfunction typical of a freeze, but the build finished successfully.) The subsequent power down crashed and the next boot was extremely difficult to start (press hold repeat). I'm not going to try the remaining max cstates 3-6!

This might suggest the freeze lies in handling cstates 2-6 starting after kernel-3.16.7. But that assumes this bandaid lasts more than another week.
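
For anyone trying the same workaround, it helps to confirm that the argument actually reached the kernel. The following is a rough Python sketch, not an official tool: it parses a kernel command line for intel_idle.max_cstate and, when run directly, also prints the value from /sys/module/intel_idle/parameters/max_cstate if that file is present. The function names are hypothetical, and the simple whitespace split assumes no quoted arguments containing spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Assumes arguments are separated by whitespace and contain no quoted spaces.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def max_cstate_from_cmdline(cmdline):
    """Return intel_idle.max_cstate as an int, or None if absent or malformed."""
    value = parse_cmdline(cmdline).get("intel_idle.max_cstate")
    return int(value) if value and value.isdigit() else None


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        print("boot argument:", max_cstate_from_cmdline(cmdline_path.read_text()))
    # Value the intel_idle driver reports, if the parameter file is exposed.
    param_path = Path("/sys/module/intel_idle/parameters/max_cstate")
    if param_path.exists():
        print("driver value:", param_path.read_text().strip())
```

If the boot argument prints as None, the boot loader configuration was probably not regenerated after editing.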

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-11-30:

#155

(In reply to John from comment #117)
> I've been running several days without a freeze on my 4.2.6...<snip>..
>
Update: I found info suggesting cstate limits of 0,1 & 6(default) are valid, maybe 3, but probably not 2.

I had to boot my CHI into the OEM. When I resumed linux, I omitted the cstate kernel argument, as a sanity check. My 4.2.6 froze within 5 minutes (browsing internet eagle cam). Otherwise, still no freezes when I set intel_idle.max_cstate=1. (~10 days so far)

It looks like I can reproduce one type of freeze readily, so if y'all have [baytrail cstate management] 4.2.x patches to beta test, let me know. I can also test 4.1.1x or 4.3.x patches but those freeze rates are less "dependable." I won't test 4.4-rcx until wifi (or USB wifi dongle) starts working again in the stock kernel.
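
When comparing results between machines, it may also be useful to record which idle states the kernel exposes, since the names behind a given max_cstate limit can differ by CPU. The Python sketch below reads the standard cpuidle sysfs directory for cpu0; the function name list_cpuidle_states is hypothetical, and the path is a parameter so the function can be pointed at a different CPU.

```python
from pathlib import Path


def list_cpuidle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return a sorted list of (index, name) for each stateN directory."""
    root = Path(cpu_dir)
    if not root.is_dir():
        return []
    states = []
    for state in root.glob("state[0-9]*"):
        name_file = state / "name"
        if name_file.is_file():
            # Directory names look like "state0", "state1", ...
            index = int(state.name[len("state"):])
            states.append((index, name_file.read_text().strip()))
    return sorted(states)


if __name__ == "__main__":
    for index, name in list_cpuidle_states():
        print(f"state{index}: {name}")
```

Posting this list together with the max_cstate value in use makes reports from different boards easier to compare.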

Bug Watch Updater (bug-watch-updater) on 2015-12-02

Changed in xserver-xorg-video-intel:
importance:	Medium → High

Revision history for this message

In freedesktop.org Bugzilla #88012, Chris Rainey (ckrzen) wrote on 2015-12-03:

#156

Confirming that "intel_idle.max_cstate=1" has solved my complete freeze issues on Bay Trail running Linux 4.1.13(Slackware64-current(pre-4.2) formerly running Ubuntu 15.04/15.10 with stock kernels).

Thanx for all the hard-work and long-efforts to see this through!

Revision history for this message

In freedesktop.org Bugzilla #88012, Martin Wallin (guzzard) wrote on 2015-12-04:

#157

I can also confirm that "intel_idle.max_cstate=1" has solved my complete freeze issues on Bay Trail (Celeron J1900) running Linux 4.2.5 (Arch Linux).

Before, I got complete freezes when playing video using Kodi or VLC, browsing using Chrome, etc. The freezes happened randomly: sometimes within 5 minutes of boot, other times the computer would be stable for hours.

With "intel_idle.max_cstate=1" the computer has been stable for more than two days straight now playing videos, music, browsing using Chrome, playing some games etc.

Thanks John for the tip!

Revision history for this message

In freedesktop.org Bugzilla #88012, ladiko (ladiko) wrote on 2015-12-05:

#158

I tried all Ubuntu 14.04 LTS kernels from 3.13 through 3.16 and 3.19 to 4.2 and got freezes with all of them except 3.13. Every kernel that produced freezes was also tried with all of the kernel parameters mentioned here, and I verified them with cat /proc/cmdline. Kernel 4.2 + intel_idle.max_cstate=1 froze within 1 day.

We are running almost 200 machines with an identical setup of Ubuntu 14.04 + xfce4 + chromium + an html5-kiosk web application, which includes an ogm video that is played when idle and otherwise some hardware-accelerated html5 animations. 50 of the machines were powered by a Celeron J1900; the rest are equipped with older Core 2 Duo / Pentium Dual-Core or Celeron 847 CPUs, and ~20 with an AMD E1-2100 or A4-5000. The most stable kernel for us is the default Ubuntu 14.04 kernel, 3.13. We're going to buy AMD Kabinis, as we don't have any issues there except the higher TDP and higher temperatures in a completely passively cooled system.

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-12-06:

#159

I was surprised to experience a freeze while running Android_x86 4.4-rc3 on my 2 in 1 laptop. After digging a bit - I found that the android_x86 runs on a custom linux-4.0.8. There wasn't a cstate argument in the command line. Too soon to know if it will help, but I no longer get the "unfortunately," my app "has stopped running" warning when I try to launch an app with wifi off.

As ladiko points out, it is curious that AMD machines seem to be exempt from these freezes. I have a dual boot AMD laptop mainly running Mint (linux 3.16.0-38-generic) for about 6 months. The only problems I've had with it were related to the old hard drive starting to fail. The kernel might be too old to freeze, though.

Revision history for this message

In freedesktop.org Bugzilla #88012, Fritsch-b (fritsch-b) wrote on 2015-12-06:

#160

This bug has nothing to do with AMD machines ... that's just noise. It's still the same for everyone. Forcing the kernel to max cstate 1, or setting that via the BIOS, solves the issue reliably.

We have some good experience with: https://github.com/fritsch/OpenELEC.tv/blob/jarvis-egl/packages/linux/patches/4.3/linux-999-i915-use-legacy-turbo.patch

Besides that - this bug got really, really silent concerning fixes.

Revision history for this message

In freedesktop.org Bugzilla #88012, cedric.fazentieux (cedric-fazentieux) wrote on 2015-12-06:

#161

I've got a Pentium N3540 in my Asus laptop. This afternoon I made a fresh install of the Ubuntu daily build (16.04), which uses kernel 4.3.0-2. No freeze so far after one afternoon of light use: I listened to music with Rhythmbox and browsed the web.

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-12-06:

#162

My apologies, Mr. Frühberger. I see that I've once again rediscovered an already existing workaround. In the first post for this bug, you revealed the cstate workaround almost a year ago.

I've tried your patch on my freeze-prone 4.2.6. It did last longer (25 minutes vs. 5). The patch looks valid all the way back to 3.18, the oldest project directory I have. I suspect that on my 4.2.5 kernel the patch would appear to be freeze-free.

Revision history for this message

In freedesktop.org Bugzilla #88012, Daniel-ffwll (daniel-ffwll) wrote on 2015-12-08:

#163

(In reply to Chris Rainey from comment #119)
> Confirming that "intel_idle.max_cstate=1" has solved my complete freeze
> issues on Bay Trail running Linux 4.1.13(Slackware64-current(pre-4.2)
> formerly running Ubuntu 15.04/15.10 with stock kernels).
>
> Thanx for all the hard-work and long-efforts to see this through!

Hm, sounds like after over a year of random walking multiple people have nailed this to cpu cstates, and the gpu driver changing behaviour slightly was just the canary in the coal mine here.

I tried to read through all comments here (gosh is there a lot of that) and didn't find anything to contradict that.

Given that, I filed a new bug report on bugzilla.kernel.org:

https://bugzilla.kernel.org/show_bug.cgi?id=109051

Everyone please jump over there to that bug and fill in with your details/summary.

Thanks, Daniel

Bug Watch Updater (bug-watch-updater) on 2015-12-09

Changed in xserver-xorg-video-intel:
status:	Confirmed → Unknown

Revision history for this message

In freedesktop.org Bugzilla #88012, Mika-kuoppala (mika-kuoppala) wrote on 2015-12-17:

#164

Created attachment 120563
drm/i915/vlv: Take forcewake on media engine writes

Revision history for this message

In freedesktop.org Bugzilla #88012, Luka-karinja (luka-karinja) wrote on 2015-12-17:

#165

(In reply to Mika Kuoppala from comment #127)
> Created attachment 120563 [details] [review]
> drm/i915/vlv: Take forcewake on media engine writes

What kernel version should be used? I tried applying it to 4.4-rc5 and 4.3.3 and got build errors.

Revision history for this message

In freedesktop.org Bugzilla #88012, Mika-kuoppala (mika-kuoppala) wrote on 2015-12-18:

#166

Created attachment 120584
drm/i915/vlv: [V4.3 backport] Take forcewake on media engine writes

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2015-12-18:

#167

(In reply to Mika Kuoppala from comment #129)
> Created attachment 120584 [details] [review]
> drm/i915/vlv: [V4.3 backport] Take forcewake on media engine writes

Thanks for the backport. Without cstate arg, I had a freeze within a few minutes. With cstate arg and patch no problems. The justification for the patch seems quite reasonable, it just doesn't affect freezing on my setup (ASUS T100-CHI Mint17.2/Cinnamon). I'll try the patch with other kernels for Mint and Manjaro.

Revision history for this message

penalvch (penalvch) wrote on 2016-03-05:

#168

Ubuntu1988, thank you for reporting this and helping make Ubuntu better.

As per https://wiki.ubuntu.com/Releases, Ubuntu 15.04 is EOL as of February 4, 2016.

Is this reproducible in a supported release?

Changed in xserver-xorg-video-intel (Ubuntu):
importance:	Critical → Medium
status:	Triaged → Incomplete

Revision history for this message

In freedesktop.org Bugzilla #88012, Ronnie Burgos (lavero.burgos) wrote on 2016-03-25:

#169

Hello, I've been having this same issue of full system hangs/freezes on my Asus Chromebox (Haswell) since I got it.
I've tried multiple xbuntu distros, Kodibuntu and OpenELEC, and in all of them I always had system freezes, mostly while watching videos in Kodi but also while on the desktop or watching videos in a browser (YouTube, Netflix).
Every time I've had to go back to Windows, where there is no problem. Right now I'm booting Win 10 off an external HDD and GalliumOS (based on Ubuntu 15.04 with the default kernel) from the internal SSD.

I too can't believe this bug hasn't been fixed yet, and honestly I don't understand what the final fix/workaround for this bug is.
Some people claim the cstate arg works, but for others it doesn't.

Can someone please provide me with a link to the latest patched and working kernel version so I can test? I read all the comments but it's very confusing; there is no clear resolution here.

T.I.A

Revision history for this message

In freedesktop.org Bugzilla #88012, Ronnie Burgos (lavero.burgos) wrote on 2016-03-25:

#170

Freeze while watching video in YouTube, video freezes but audio is in a loop. Total system hang, force reboot necessary.

https://youtu.be/uSXXRf9t1E0

Revision history for this message

In freedesktop.org Bugzilla #88012, Jbmacbrodie-m (jbmacbrodie-m) wrote on 2016-03-25:

#171

(In reply to Veronica from comment #131)
> I too can't believe why this bug hasn't been fixed yet and honestly I don't
> understand what is the final fix/workaround for this bug.
> Some people claim the cstate arg work but for others don't work.
>
> Can someone please provide me a link to latest patched and working kernel
> version so I can test. I read all comments but its very confusing, there is
> no clear resolution here.
>
> T.I.A

The bug has been moved (but not fixed) to https://bugzilla.kernel.org/show_bug.cgi?id=109051 (over 200 additional comments; the last 40 have some new ideas).

The cstate argument works for many, but not all.

Revision history for this message

In freedesktop.org Bugzilla #88012, Ronnie Burgos (lavero.burgos) wrote on 2016-03-25:

#172

(In reply to John from comment #133)
> (In reply to Veronica from comment #131)
> > I too can't believe why this bug hasn't been fixed yet and honestly I don't
> > understand what is the final fix/workaround for this bug.
> > Some people claim the cstate arg work but for others don't work.
> >
> > Can someone please provide me a link to latest patched and working kernel
> > version so I can test. I read all comments but its very confusing, there is
> > no clear resolution here.
> >
> > T.I.A
>
> The bug has been moved (but not fixed) to
> https://bugzilla.kernel.org/show_bug.cgi?id=109051 Over 200 additional
> comments, last 40 have some new ideas.
>
> cstate works for a many, but not all.

Thank you for that link. I'm reading it and will report there after testing on my Chromebox.

Revision history for this message

In freedesktop.org Bugzilla #88012, Jani-nikula (jani-nikula) wrote on 2016-03-29:

#173

(In reply to Daniel Vetter from comment #126)
> (In reply to Chris Rainey from comment #119)
> > Confirming that "intel_idle.max_cstate=1" has solved my complete freeze
> > issues on Bay Trail running Linux 4.1.13(Slackware64-current(pre-4.2)
> > formerly running Ubuntu 15.04/15.10 with stock kernels).
> >
> > Thanx for all the hard-work and long-efforts to see this through!
>
> Hm, sounds like after over a year of random walking multiple people have
> nailed this to cpu cstates, and the gpu driver changing behaviour slightly
> was just the canary in the coal mine here.
>
> I tried to read through all comments here (gosh is there a lot of that) and
> didn't find anything to contradict that.
>
> Given that I filed a new bug report on bugzilla.kernel.org:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=109051
>
> Everyone please jump over there to that bug and fill in with your
> details/summary.
>
> Thanks, Daniel

RESOLVED MOVED again.

Revision history for this message

Sergio Fernández (wikier) wrote on 2016-06-02:

#174

Any news on this bug?

I suffered from it on Wily and expected it to be fixed in Xenial LTS, so it's a bit disappointing :-/

Affects                              Status      Importance  Assigned to
xf86-video-intel                     Unknown     High        freedesktop-bugs #88012
xserver-xorg-video-intel (Ubuntu)    Incomplete  Medium      Unassigned

Nominated for Vivid by Alberto Salvia Novella
