[i915] GPU hangs on Haswell (Acer 720p)

Bug #1904293 reported by Ferry Toth
This bug affects 1 person
Affects          Status     Importance  Assigned to  Milestone
linux (Fedora)   Invalid    High
linux (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

Since Linux kernel 5.7, Haswell GPUs hang after logging in with sddm. This affects Groovy (Linux 5.8) on both Ubuntu and Kubuntu.

A workaround is to boot with an earlier kernel (5.6).

The issue is known as:
https://gitlab.freedesktop.org/drm/intel/-/issues/2413

and probable duplicates:
https://gitlab.freedesktop.org/drm/intel/-/issues/2584
https://gitlab.freedesktop.org/drm/intel/-/issues/1805

Apparently the problem is resolved by:
https://patchwork.freedesktop.org/patch/395580/?series=82783&rev=1

Please consider backporting this patch.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu50.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: ferry 1098 F.... pulseaudio
 /dev/snd/controlC0: ferry 1102 F.... pipewire-media-
 /dev/snd/seq: ferry 1097 F.... pipewire
CasperMD5CheckResult: skip
CurrentDesktop: KDE
DistroRelease: Ubuntu 20.10
InstallationDate: Installed on 2017-07-13 (1220 days ago)
InstallationMedia: Kubuntu 17.04 "Zesty Zapus" - Release amd64 (20170412)
Lsusb:
 Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 002 Device 003: ID 0489:e056 Foxconn / Hon Hai
 Bus 002 Device 002: ID 1bcf:2c67 Sunplus Innovation Technology Inc. HD WebCam
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Acer Peppy
Package: linux (not installed)
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/@boot/vmlinuz-5.6.0-1032-oem root=UUID=17d2cd1d-cc37-446d-ac0b-933def63c867 ro rootflags=subvol=@ quiet splash tpm_tis.force=1 tpm_tis.interrupts=0 modprobe.blacklist=ehci_hcd,ehci-pci mitigations=off vt.handoff=7
ProcVersionSignature: Ubuntu 5.6.0-1032.33-oem 5.6.19
RelatedPackageVersions:
 linux-restricted-modules-5.6.0-1032-oem N/A
 linux-backports-modules-5.6.0-1032-oem N/A
 linux-firmware 1.190.1
Tags: groovy
Uname: Linux 5.6.0-1032-oem x86_64
UpgradeStatus: Upgraded to groovy on 2020-10-25 (20 days ago)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 03/02/2017
dmi.bios.vendor: coreboot
dmi.chassis.type: 3
dmi.chassis.vendor: Acer
dmi.modalias: dmi:bvncoreboot:bvr:bd03/02/2017:svnAcer:pnPeppy:pvr1.0:cvnAcer:ct3:cvr:
dmi.product.name: Peppy
dmi.product.version: 1.0
dmi.sys.vendor: Acer

Revision history for this message
In , rxguyrx (rxguyrx-redhat-bugs) wrote :

Created attachment 1694628
Tail of dmesg showing gpu hang

1. Please describe the problem:
Any time I boot into a kernel in the 5.7 series (RCs up to the current stable) I get a GPU hang (with sway and openbox). I can start the WM, and usually start a terminal, but when I start anything else (firefox, caja, a game, etc.) the GUI hangs. I can kill the GUI via a VT and restart it, but the problem continues. To use my computer, I must boot into a kernel in the 5.6 series. I filed a bug upstream: https://gitlab.freedesktop.org/drm/intel/-/issues/1805

2. What is the Version-Release number of the kernel:
5.7.0-1.fc33.x86_64 and all 5.7 release candidates before it

3. Did it work previously in Fedora? If so, what kernel version did the issue
   *first* appear? Old kernels are available for download at
   https://koji.fedoraproject.org/koji/packageinfo?packageID=8 :
The first kernel 5.7 RC I tried

4. Can you reproduce this issue? If so, please provide the steps to reproduce
   the issue below:
Always with a 5.7 kernel, as illustrated in #1 above.

5. Does this problem occur with the latest Rawhide kernel? To install the
   Rawhide kernel, run ``sudo dnf install fedora-repos-rawhide`` followed by
   ``sudo dnf update --enablerepo=rawhide kernel``:
Yes as stated above.

6. Are you running any modules that are not shipped directly with Fedora's kernel?:
No

7. Please attach the kernel logs. You can get the complete kernel log
   for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
   issue occurred on a previous boot, use the journalctl ``-b`` flag.

Revision history for this message
In , rxguyrx (rxguyrx-redhat-bugs) wrote :

Created attachment 1694629
GPU Crash Dump

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Identical issue here, with all kernels ≥ 5.7:
the Fedora 32 kernel update,
and the vanilla kernel.

As a result, I am unable to perform kernel updates on my machine (Acer Aspire C720P with coreboot, running Fedora for 6 years) :-(
(same issue on Xorg and Wayland)

i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc7b
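
For anyone comparing logs across machines, the hang lines quoted in this report can be pulled out mechanically. The following is only a minimal Python sketch and not part of any i915 tooling: the function name `parse_gpu_hangs` and the regular expression are assumptions, inferred solely from the message formats pasted in this thread.

```python
import re

# Pattern inferred from the messages quoted in this report, e.g.
#   i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc7b
#   i915 0000:00:02.0: GPU HANG: ecode 7:1:85ddfffd, in Xorg [9919]
# The "[drm]" tag and the ", in <process> [<pid>]" suffix are optional.
HANG_RE = re.compile(
    r"i915 (?P<pci>[0-9a-f:.]+): (?:\[drm\] )?GPU HANG: "
    r"ecode (?P<ecode>[0-9a-f]+:[0-9a-f]+:[0-9a-f]+)"
    r"(?:, in (?P<process>\S+) \[(?P<pid>\d+)\])?"
)


def parse_gpu_hangs(log_text):
    """Return one dict per 'GPU HANG' line found in the given log text."""
    hangs = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            hangs.append(match.groupdict())
    return hangs


if __name__ == "__main__":
    sample = (
        "i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0\n"
        "i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc7b\n"
    )
    for hang in parse_gpu_hangs(sample):
        print(hang["pci"], hang["ecode"], hang["process"], hang["pid"])
```

Feeding it the output of `journalctl -k` should list the error code and, where the kernel logged it, the process that triggered each hang.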

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1701790
sys class drm card0 error

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1701791
sys class drm card0 error on rc3 vanilla

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Comment on attachment 1701791
sys class drm card0 error on rc3 vanilla

On rc5 vanilla (sorry for the typo: this attachment is the rc5 drm dump, not rc3).
Same issue on all Fedora stable kernel updates, rawhide, and all ≥ 5.7 RCs from kernel.org: all tested (while trying many workarounds).

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

i915.enable_rc6=0: doesn't seem to be honored anymore (systool -m i915 -v reports nothing about it, and /sys/class/drm/card0/power/rc6_enable always returns 1; maybe I am doing something wrong)
i915.enable_guc=0: doesn't seem to have any effect
i915.enable_hangcheck=0: complete freeze
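
One way to check whether an option is recognised at all is to look at what the loaded module exposes under `/sys/module/i915/parameters/`. The sketch below is an illustration under that assumption, not an official diagnostic: the helper name `read_module_parameters` is hypothetical, and only parameters that the driver exports through sysfs will appear.

```python
from pathlib import Path


def read_module_parameters(module="i915", sysfs_root="/sys/module"):
    """Return {parameter name: current value} for a loaded kernel module.

    Reads <sysfs_root>/<module>/parameters/. Only parameters that the
    module exports through sysfs show up there; an empty dict is returned
    if the directory does not exist (e.g. module not loaded).
    """
    param_dir = Path(sysfs_root) / module / "parameters"
    values = {}
    if not param_dir.is_dir():
        return values
    for entry in sorted(param_dir.iterdir()):
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters are not readable by unprivileged users.
            values[entry.name] = None
    return values


if __name__ == "__main__":
    params = read_module_parameters()
    for name in ("enable_rc6", "enable_guc", "enable_hangcheck"):
        if name in params:
            print(f"{name} = {params[name]}")
        else:
            print(f"{name}: not exposed by the loaded driver")
```

A parameter that is missing from that directory may simply not exist in the running driver version, which would be consistent with an option appearing to be ignored.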

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1701804
dmesg with lots of information (+boot context +i915 context) with disable_power_well=0

Hope this helps upstream
(I am at your disposal for any tests you require)

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1701810
with 5.7.8 fedora kernel

dmesg i915 +drm/card0/error +i915 options
on stock fedora 32 kernel

(same issue with or without the boot option intel_iommu=on)

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

An interesting discussion of the same (or a similar?) problem: https://bbs.archlinux.org/viewtopic.php?id=256520&p=2
Unfortunately, despite the work of Dario and Loqs around the Clear kernel options, the issue is the same here (with the stock 5.7 Fedora kernel or vanilla 5.8-rc5) with the intel_iommu=on,igf_off boot option.

I think I have explored all the possibilities that were available to me, from 5.7 Fedora to 5.8 vanilla, with many attempts and workarounds each time. I hope I don't have to throw this (beautiful) computer in the trash, because it works perfectly (with 6 hours of battery life) and it is not obsolete!

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1702001
gpu logs on 5.7.9-200 Fedora kernel with default boot options

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1712424
(for memory : on vanilla kernel) 5.8.3

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1712425
(vanilla kernel) 5.8.3 without initrd

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1712426
(vanilla kernel) 5.8.3 + boot options i915_enable_dc=0

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1712427
(vanilla kernel) 5.8.3 + boot options i915_enable_dc=0 + cstate=1

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1712428
(vanilla) now without any i915 firmwares

Same bug without any i915 firmware
(<5.7 works smoothly without firmware, but ≥5.7 hangs the GPU again and again; here on a vanilla kernel, same on the Fedora stable and Fedora rawhide kernels)

Revision history for this message
In , emrecio (emrecio-redhat-bugs) wrote :

Same here, remaining on 5.6 until this bug is fixed in the kernel. Latest attempt was 5.8.10.

Adding more information from lspci:

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Device d000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 26
        Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at f000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee01004 Data: 4021
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: i915
        Kernel modules: i915

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1718750
5.9.0-rc7 gpu hangs reports

Revision history for this message
In , vitti570 (vitti570-redhat-bugs) wrote :

Emilio, I have the same experience on a desktop with the same
VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

on an ASROCK H81M-DGS motherboard.
And I am stuck with kernel 5.6.

Revision history for this message
In , ondrej.kolin (ondrej.kolin-redhat-bugs) wrote :

I am affected by this bug as well. Had to go back to 5.6.x Fedora 32, both Wayland and Xorg

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Related syslog entry says (https://pastebin.com/zh8x74R6, can provide more information if interested):
Oct 07 18:25:17 localhost.localdomain kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:8edcfc79, in gnome-shell [1672]
Oct 07 18:25:17 localhost.localdomain kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Oct 07 18:25:17 localhost.localdomain kernel: i915 0000:00:02.0: [drm] gnome-shell[1672] context reset due to GPU hang

Revision history for this message
In , bgmeyaemdy (bgmeyaemdy-redhat-bugs) wrote :

Me too. Various 5.6.* all fine. Various 5.7.* & 5.8.* all fail ..

kernel: i915 0000:00:02.0: GPU HANG: ecode 7:1:85ddfffd, in Xorg [9919]
kernel: i915 0000:00:02.0: Resetting chip for stopped heartbeat on rcs0
kernel: i915 0000:00:02.0: Xorg[9919] context reset due to GPU hang

lspci ..

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
        DeviceName: Onboard IGD
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7851
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 30
        Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at f000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee02004 Data: 4026
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: i915
        Kernel modules: i915

Revision history for this message
In , nigel (nigel-redhat-bugs-1) wrote :

Same problem on my iMac.

00:02.0 VGA compatible controller: Intel Corporation Device 0d22 (rev 08) (prog-if 00 [VGA controller])
        Subsystem: Apple Inc. Device 0122
        Flags: bus master, fast devsel, latency 0, IRQ 39
        Memory at 98000000 (64-bit, non-prefetchable) [size=4M]
        Memory at 90000000 (64-bit, prefetchable) [size=128M]
        I/O ports at 2000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [a4] PCI Advanced Features
        Kernel driver in use: i915
        Kernel modules: i915

Revision history for this message
In , nigel (nigel-redhat-bugs-1) wrote :

I don't know if this helps, maybe eliminates some potential causes. It is WORKING on my laptop, which has different Intel graphics. At least some Intel graphics work :-)

Linux localhost.localdomain 5.8.16-300.fc33.x86_64 #1 SMP Mon Oct 19 13:18:33 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07) (prog-if 00 [VGA controller])
        DeviceName: Intel Kabylake UHD Graphics ULT GT2
        Subsystem: Hewlett-Packard Company Device 83fa
        Flags: bus master, fast devsel, latency 0, IRQ 128
        Memory at b0000000 (64-bit, non-prefetchable) [size=16M]
        Memory at a0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 4000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [100] Process Address Space ID (PASID)
        Capabilities: [200] Address Translation Service (ATS)
        Capabilities: [300] Page Request Interface (PRI)
        Kernel driver in use: i915
        Kernel modules: i915

Revision history for this message
In , vitti570 (vitti570-redhat-bugs) wrote :

Driver i915 1.6.0 works fine with build 20200114.
After that it fails, as in builds 20200313 and 20200515.
What changed from 20200114 to 20200313?

Revision history for this message
In , m.schrage (m.schrage-redhat-bugs) wrote :

Hi folks,

I have also been suffering from this issue for a long time. Still on a 5.6 kernel.

Saw this issue upstream: https://gitlab.freedesktop.org/drm/intel/-/issues/2413
which is resolved with this patch: https://patchwork.freedesktop.org/patch/395580/?series=82783&rev=1

However, I am not in a position to test it myself on Fedora right now.

Maybe it is related to your issue and can help you.

Revision history for this message
In , vitti570 (vitti570-redhat-bugs) wrote :

I believe the culprit is in a modification of driver i915 version 1.6.0

Build 20200114 is fine, and I am using it on kernel 5.6

Build 20200313 and later do not work, starting with kernel 5.7

Revision history for this message
In , nigel (nigel-redhat-bugs-1) wrote :

I have built both F32 & F33 versions of the 5.9.8 kernel with the above patch https://patchwork.freedesktop.org/patch/395580/?series=82783&rev=1

Both are running fine (so far). At least they both boot, which the earlier ones did not. It seems that the patch has fixed my problems on my iMac.

Ferry Toth (ftoth)
tags: added: groovy
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1904293

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Fedora):
importance: Unknown → High
status: Unknown → Confirmed
Revision history for this message
Ferry Toth (ftoth) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected
description: updated
Revision history for this message
Ferry Toth (ftoth) wrote : CRDA.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : IwConfig.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : Lspci.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : Lspci-vt.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : Lsusb-t.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : Lsusb-v.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : PaInfo.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : ProcEnviron.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : ProcModules.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : PulseList.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : RfKill.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : UdevDb.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : WifiSyslog.txt

apport information

Revision history for this message
Ferry Toth (ftoth) wrote : acpidump.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Ferry Toth (ftoth) wrote :

Obviously, the apport data above was collected with the 5.6 kernel, as I can't log in with 5.8.

Revision history for this message
In , m.schrage (m.schrage-redhat-bugs) wrote :

Last Sunday I did the same thing (built the F33 version 5.9.8-200 with the patch https://patchwork.freedesktop.org/patch/395580/?series=82783&rev=1) on my hardware:

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
 Subsystem: ASRock Incorporation Device 0402
 Flags: bus master, fast devsel, latency 0, IRQ 29
 Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
 Memory at e0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at f000 [size=64]
 Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [d0] Power Management version 2
 Capabilities: [a4] PCI Advanced Features
 Kernel driver in use: i915
 Kernel modules: i915

Can confirm that this patch fixes the GPU HANG issues on my system.

Revision history for this message
In , nerijus (nerijus-redhat-bugs-1) wrote :

Could anyone please share the builds?

Revision history for this message
In , m.schrage (m.schrage-redhat-bugs) wrote :

(In reply to Nerijus Baliūnas from comment #29)
> Could anyone please share the builds?

My build (without the debug rpms) is here: http://www.mediafire.com/folder/hrtjd0b1logi8/x86_64

Also note that the version numbering of my build is not accurate. It has the same version number as the official F33 kernel.

Revision history for this message
In , nerijus (nerijus-redhat-bugs-1) wrote :

Unfortunately your build did not help here:
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07) (prog-if 00 [VGA controller])
 Subsystem: Lenovo Device 225c
 Flags: bus master, fast devsel, latency 0, IRQ 142
 Memory at 2ffa000000 (64-bit, non-prefetchable) [size=16M]
 Memory at b0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at e000 [size=64]
 Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
 Capabilities: [40] Vendor Specific Information: Len=0c <?>
 Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
 Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [d0] Power Management version 2
 Capabilities: [100] Process Address Space ID (PASID)
 Capabilities: [200] Address Translation Service (ATS)
 Capabilities: [300] Page Request Interface (PRI)
 Kernel driver in use: i915
 Kernel modules: i915

/sys/class/drm/card0/error:
GPU HANG: ecode 9:1:85dffffb, in Xwayland [2587]
Kernel: 5.9.8-200.fc33.x86_64 x86_64
Driver: 20200715
Time: 1605790670 s 675207 us
Boottime: 3541 s 672470 us
Uptime: 3539 s 87386 us
Capture: 4298208768 jiffies; 615242 ms ago
Active process (on ring rcs0): Xwayland [2587]
Reset count: 0
Suspend count: 0
Platform: KABYLAKE
Subplatform: 0x0
PCI ID: 0x5917
PCI Revision: 0x07
PCI Subsystem: 17aa:225c
IOMMU enabled?: 1
DMC loaded: yes
DMC fw version: 1.4
RPM wakelock: yes
PM suspended: no
GT awake: yes
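
The header of an error state like the one above is a list of `Key: value` lines, so the fields that matter when comparing reports (platform, PCI ID, driver date) can be extracted with a few lines of Python. This is a sketch under assumptions: `parse_error_state_header` is a hypothetical helper, it only understands the `Key: value` lines quoted here, and a full `/sys/class/drm/card0/error` file contains much more than this header.

```python
def parse_error_state_header(text):
    """Parse 'Key: value' lines such as those quoted from card0/error.

    Lines without a ': ' separator are ignored. If a key repeats, the
    first occurrence wins.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


if __name__ == "__main__":
    sample = """GPU HANG: ecode 9:1:85dffffb, in Xwayland [2587]
Kernel: 5.9.8-200.fc33.x86_64 x86_64
Driver: 20200715
Platform: KABYLAKE
PCI ID: 0x5917
"""
    header = parse_error_state_header(sample)
    print(header["Platform"], header["PCI ID"], header["Driver"])
```

In this dump the `Platform` field reads KABYLAKE, so this hang is on different hardware from the Haswell machines discussed in the rest of the report.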

Revision history for this message
In , vitti570 (vitti570-redhat-bugs) wrote :

(In reply to Menno from comment #30)

Menno, your build works on my PC, but my VGA is the same as yours:

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1731512
5.9.9 + patch screenshot

5.9.9 + patch screenshot : no more error collected / huge thanks to Chris Wilson (and Menno for pointing here !)

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

(In reply to tankey from comment #33)
> Created attachment 1731512 [details]
>
> 5.9.9 + patch screenshot : no more error collected / huge thanks to Chris
> Wilson (and Menno for pointing here !)

text :
[tankey@localhost ~]$ cat /etc/fedora-release
Fedora release 32 (Thirty Two)
[tankey@localhost ~]$
[tankey@localhost ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1801         371         911          74         518        1152
Swap:          3930           0        3930
[tankey@localhost ~]$
[tankey@localhost ~]$ systemd-analyze
Startup finished in 1.073s (kernel) + 4.198s (userspace) = 5.272s
graphical.target reached after 4.169s in userspace
[tankey@localhost ~]$
[tankey@localhost ~]$ grep "model name" /proc/cpuinfo |uniq
model name : Intel(R) Celeron(R) 2955U @ 1.40GHz
[tankey@localhost ~]$
[tankey@localhost ~]$ su -
Mot de passe :
[root@localhost ~]# cat /sys/class/drm/card0/error
No error state collected
[root@localhost ~]#
[root@localhost ~]# uname -rv
5.9.9-BZ1843274 #1 SMP Fri Nov 20 21:34:42 CET 2020
[root@localhost ~]#
[root@localhost ~]# date
sam. 21 nov. 2020 00:24:37 CET

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

In case someone wants a build for the C720P: https://equilibriste.org/index.php/s/txdaszMDyjnC5Dk
please read the readme.txt

Revision history for this message
In , vitti570 (vitti570-redhat-bugs) wrote :

Comment on attachment 1731512
5.9.9 + patch screenshot

Most important is /sbin/lspci|grep VGA

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

(In reply to Vittorio from comment #36)
> Comment on attachment 1731512 [details]
> 5.9.9 + patch screenshot
>
> Most important is /sbin/lspci|grep VGA

Once again: Haswell-ULT Integrated Graphics Controller

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1732801
[stock stable kernel up to date] sys class drm card0 error

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1732802
[rawhide kernel] drm error

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1732803
[rawhide] dmesg

Revision history for this message
In , tankey (tankey-redhat-bugs) wrote :

Created attachment 1732804
[rawhide] drm state

Revision history for this message
In , nigel (nigel-redhat-bugs-1) wrote :

I just updated to vanilla 5.9.10-200.fc33.x86_64 and the problem appears resolved. Seems the fix has made it into the kernel

Happy Thanksgiving!!

Revision history for this message
In , vitti570 (vitti570-redhat-bugs) wrote :

After installing kernel-5.9.10-200.fc33.x86_64 the issue is still there.

My VGA is

VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Revision history for this message
In , m.schrage (m.schrage-redhat-bugs) wrote :

Same here,

Nov 27 11:30:45 hoppie.home kernel: Linux version 5.9.10-200.fc33.x86_64 (<email address hidden>) (gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), GNU ld version 2.35-14.fc33) #1 SMP Mon Nov 23 18:12:50 UTC 2020
Nov 27 11:31:52 hoppie.home kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:85ddfffd, in gnome-shell [4166]

lspci -v -s 00:02.0

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
 Subsystem: ASRock Incorporation Device 0402
 Flags: bus master, fast devsel, latency 0, IRQ 29
 Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
 Memory at e0000000 (64-bit, prefetchable) [size=256M]
 I/O ports at f000 [size=64]
 Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
 Capabilities: [d0] Power Management version 2
 Capabilities: [a4] PCI Advanced Features
 Kernel driver in use: i915
 Kernel modules: i915

Revision history for this message
In , RMuscaritolo (rmuscaritolo-redhat-bugs) wrote :

Same thing happening to me.

I had to load kernel 5.9.9-200.fc33.x86_64 because 5.9.10-200 was causing my system to hang.

I do not see a journal log for the previous boot.

$ lspci -v -s 0:02
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 515 (rev 07) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device 1cfd
        Flags: bus master, fast devsel, latency 0, IRQ 125
        Memory at de000000 (64-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at f000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915

$ sudo dmidecode -t SYSTEM
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: ASUSTeK COMPUTER INC.
        Product Name: UX305CA
        Version: 1.0
        Serial Number: REDACTED
        UUID: REDACTED
        Wake-up Type: Power Switch
        SKU Number: ASUS-NotebookSKU
        Family: UX

Handle 0x000C, DMI type 32, 20 bytes
System Boot Information
        Status: No errors detected

Revision history for this message
In , RMuscaritolo (rmuscaritolo-redhat-bugs) wrote :

FYI, I just installed 5.9.11-200 and I am no longer affected by this bug.

$ uname -rv
5.9.11-200.fc33.x86_64 #1 SMP Tue Nov 24 18:18:01 UTC 2020

Revision history for this message
In , leon.naumenko (leon.naumenko-redhat-bugs) wrote :

Processor: Intel G3260, integrated video driver only, Fedora 32 (KDE Plasma). Updated to kernel 5.9.11-100.fc32. After I enter my password in the login manager (SDDM), I only see a black screen with a cursor.

Dec 05 15:05:24 localhost.localdomain kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:85ddfffd, in plasmashell [1477]
Dec 05 15:05:24 localhost.localdomain kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 05 15:05:24 localhost.localdomain kernel: i915 0000:00:02.0: [drm] plasmashell[1477] context reset due to GPU hang
Dec 05 15:05:24 localhost.localdomain kernel: [drm:intel_gt_verify_workarounds [i915]] *ERROR* GT workaround lost on init! (e184=0/0, expected 2000200)

Revision history for this message
In , ondrej.kolin (ondrej.kolin-redhat-bugs) wrote :

Unfortunately, this bug still seems to be present in kernel-core-5.9.14-100.fc32.x86_64 (more specs are in my previous comment); I have to freeze kernel updates again.

Revision history for this message
In , emrecio (emrecio-redhat-bugs) wrote :

Thanks for all the support/help in this bug report. I've moved on to AMD Ryzen 5 3400G with Radeon Vega Graphics

Revision history for this message
In , dcesari (dcesari-redhat-bugs) wrote :

This is just to inform that the same bug just appeared on CentOS 8 when upgrading from kernel 4.18.0-193.28.1.el8_2.x86_64 to 4.18.0-240.1.1.el8_3.x86_64, dmesg in the two cases says respectively:

 - Initialized i915 1.6.0 20190619
 - Initialized i915 1.6.0 20200114

Why was this buggy driver backported? Need to stick with the old kernel.

Revision history for this message
Ferry Toth (ftoth) wrote :

Still no improvement with ubuntu ppa kernel 5.10.3

Revision history for this message
In , rxguyrx (rxguyrx-redhat-bugs) wrote :

So, this bug has been getting some press lately. I finally built the rawhide kernel with the proposed patch ( https://lists.freedesktop.org/archives/intel-gfx/2021-January/257559.html ).

I put it in my COPR if anyone is interested. I'm using it as I type this. https://copr.fedorainfracloud.org/coprs/dturner/TOS/build/1874571/

Thanks everyone!

Revision history for this message
In , rxguyrx (rxguyrx-redhat-bugs) wrote :

The fix for this has been included in the kernel-5.11 rc4, which is now available on koji. I can confirm it works for me (Acer C720P - Haswell).

Revision history for this message
Ferry Toth (ftoth) wrote :

Good news: I am running 5.11.0-5c5 from the Ubuntu kernel PPA and the problem seems to be resolved.

Revision history for this message
In , domfe (domfe-redhat-bugs) wrote :

Using kernel 5.11.0-0.rc7.149.fc34.x86_64 on Intel G3420.
So far, so good.

thanks

Revision history for this message
In , hdegoede (hdegoede-redhat-bugs) wrote :

Some issues with the i915 mitigation stuff have also been reported in bug 1925346.

Testing has shown that to fully fix the issues with the new i915 mitigation stuff on Haswell, the following commits are necessary on top of 5.11:

e627d5923cae ("drm/i915/gt: One more flush for Baytrail clear residuals")
d30bbd62b1bf ("drm/i915/gt: Flush before changing register state")
1914911f4aa0 ("drm/i915/gt: Correct surface base address for renderclear")

Revision history for this message
In , steeve.mccauley (steeve.mccauley-redhat-bugs) wrote :

Similar issue here. I've also tried 5.11.2 but was still seeing significant problems. Adding i915.mitigations=off fixed the problem.

00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
 DeviceName: Onboard IGD
 Subsystem: Dell Device 05a5
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 29
 Region 0: Memory at f7800000 (64-bit, non-prefetchable) [size=4M]
 Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
 Region 4: I/O ports at f000 [size=64]
 Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
 Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
  Address: fee00018 Data: 0000
 Capabilities: [d0] Power Management version 2
  Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
  Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
 Capabilities: [a4] PCI Advanced Features
  AFCap: TP+ FLR+
  AFCtrl: FLR-
  AFStatus: TP-
 Kernel driver in use: i915
 Kernel modules: i915

Installing 5.10.19-200 from updates-testing seems to have fixed the problem for me:

sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-79396b21b2
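
For anyone still on an unpatched kernel and relying on the i915.mitigations=off workaround mentioned above, a minimal sketch for checking whether the option is on the kernel command line follows. The function name `has_i915_mitigations_off` is made up for illustration, and the check assumes the option was passed on the kernel command line rather than through a modprobe configuration file.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line contains
# i915.mitigations=off as a whole, space-separated word. The generic
# "mitigations=off" option is a different setting and does not match.
has_i915_mitigations_off() {
    case " $1 " in
        *" i915.mitigations=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: inspect the command line of the running kernel
if has_i915_mitigations_off "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "i915.mitigations=off is active"
else
    echo "i915.mitigations=off is not set"
fi
```

Once a fixed kernel such as the one above is installed, the option should no longer be needed.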

Revision history for this message
In , hdegoede (hdegoede-redhat-bugs) wrote :

(In reply to Steeve McCauley from comment #55)
> Similar issue here. I've also tried 5.11.2 but was still seeing significant problems

Hmm, where did you get your 5.11.2 build from?

The 5.11.2-300 build from Fedora ( https://koji.fedoraproject.org/koji/buildinfo?buildID=1715703 ) has the same fixes that were added to 5.10.19-200 (we added some fixes as downstream patches). So if you are still seeing issues with that specific build, then I need to go over the 5.10.y changelog to see whether there are fixes there which are not in 5.11.2.

Revision history for this message
In , steeve.mccauley (steeve.mccauley-redhat-bugs) wrote :

It was from kernel.org.

No problems with the Fedora 5.10.19-200 so far, so it's looking good.

Sorry for the confusion.

Revision history for this message
In , hdegoede (hdegoede-redhat-bugs) wrote :

(In reply to Steeve McCauley from comment #57)
> It was from kernel.org.
>
> No problems with the Fedora 5.10.19-200 so far, so it's looking good.
>
> Sorry for the confusion.

No problem. It would be good if you could give the Fedora 5.11.2 kernel a try. It should work, but if it does not, now would be a good time to find out (before we start pushing 5.11.y kernels to the updates repo). You can grab it here:

https://koji.fedoraproject.org/koji/buildinfo?buildID=1715703

Generic install instructions for installing a kernel from koji (the Fedora buildsystem) are here:

https://fedorapeople.org/~jwrdegoede/kernel-test-instructions.txt

Note that since this is an official build, there should be no need to disable Secure Boot in this case.

Revision history for this message
In , steeve.mccauley (steeve.mccauley-redhat-bugs) wrote :

No problems with it so far. It seems stable and the GPU isn't hanging (same as with 5.10.19-200).

$ sudo rpm -ivh --oldpackage kernel-core-5.11.2-300.fc34.x86_64.rpm kernel-modules-5.11.2-300.fc34.x86_64.rpm

reboot

$ cat /proc/cmdline
BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.11.2-300.fc34.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet

Revision history for this message
In , steeve.mccauley (steeve.mccauley-redhat-bugs) wrote :

Just as an aside, my kernel build time went from 44+ minutes to 8 minutes with these fixes!

$ grep "build time" *.out
build_20210224.out:00:46:13 linux-5.11.1> INFO 2021-02-24 18:12:51> Kernel build time 46m 13s - sudo wait 0s
build_20210227.out:00:44:40 linux-5.11.2> INFO 2021-02-27 09:24:17> Kernel build time 44m 40s - sudo wait 0s
build_20210228.out:00:08:24 linux-5.11.2> INFO 2021-02-28 13:32:35> Kernel build time 8m 24s - sudo wait 0s

This was after doing "make distclean".

And the CPU fan doesn't go insane during the rebuild.

Revision history for this message
In , redhat (redhat-redhat-bugs) wrote :

I haven't seen this happening in two years. Does this really apply to rawhide?

Revision history for this message
In , rxguyrx (rxguyrx-redhat-bugs) wrote :

(In reply to Christian Kujau from comment #61)
> I haven't seen this happening in two years. Does this really apply to
> rawhide?

Yes. This seems to be fixed, for sure. This bug should probably be closed.

Thanks, everyone!

Changed in linux (Fedora):
status: Confirmed → Invalid