Intel GPU Hangs : random screen freezing w/ Ubuntu 20.04 (Linux 5.4) i915_active_acquire

Bug #1868551 reported by Guy Baconniere on 2020-03-23
This bug affects 9 people
Affects           Importance   Assigned to
linux (Ubuntu)    Undecided    Unassigned
Focal             High         Unassigned

Bug Description

SRU Justification:

[Impact]
Users are experiencing a frequent NULL pointer dereference crash in i915_active_acquire when using KMS (kernel mode setting), which is enabled by default.

[Fix]
The fix is a cherry-pick of an upstream commit that was supposed to be backported to 5.4 upstream but was overlooked. It is followed by a subsequent "Fixes:" patch that resolves the use of some uninitialized pointers.

[Test]
Verified by multiple bug reporters.

[Regression Potential]
Medium. Although the patch adds many lines, they are mostly boilerplate, and multiple users have confirmed that it fixes the crash.
---
uname -a
Linux xps 5.4.0-14-generic #17-Ubuntu SMP Thu Feb 6 22:47:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Focal Fossa (development branch)
Release: 20.04
Codename: focal

[ 2556.956079] BUG: kernel NULL pointer dereference, address: 0000000000000040
[ 2556.956084] #PF: supervisor read access in kernel mode
[ 2556.956084] #PF: error_code(0x0000) - not-present page
[ 2556.956085] PGD 0 P4D 0
[ 2556.956088] Oops: 0000 [#1] SMP NOPTI
[ 2556.956090] CPU: 2 PID: 1685 Comm: xfwm4 Not tainted 5.4.0-14-generic #17-Ubuntu
[ 2556.956092] Hardware name: Dell Inc. XPS 13 7390/0G2D0W, BIOS 1.2.0 10/03/2019
[ 2556.956161] RIP: 0010:i915_active_acquire+0xe/0x80 [i915]
[ 2556.956163] Code: 00 48 c7 c6 11 4d 6b c0 e8 af a1 d6 c7 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 <8b> 47 38 48 89 fb 85 c0 74 17 8d 50 01 f0 0f b1 53 38 75 f2 45 31
[ 2556.956164] RSP: 0018:ffffac17c13279c8 EFLAGS: 00010286
[ 2556.956165] RAX: 0000000000000000 RBX: ffff983831d3e480 RCX: 0000000000000000
[ 2556.956166] RDX: ffff983783475200 RSI: ffff983831d3e480 RDI: 0000000000000008
[ 2556.956167] RBP: ffffac17c13279e0 R08: 0000000000000000 R09: ffff98382d6b6520
[ 2556.956168] R10: 0000000000006cc0 R11: ffff983838b4db00 R12: ffff983783475200
[ 2556.956169] R13: 0000000000000008 R14: ffff983783475200 R15: ffff98382d6b6400
[ 2556.956170] FS: 00007f9031c28f00(0000) GS:ffff98383e500000(0000) knlGS:0000000000000000
[ 2556.956171] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2556.956172] CR2: 0000000000000040 CR3: 000000046eac6001 CR4: 00000000003606e0
[ 2556.956173] Call Trace:
[ 2556.956199] i915_active_ref+0x24/0x200 [i915]
[ 2556.956223] i915_vma_move_to_active+0x74/0xf0 [i915]
[ 2556.956245] eb_submit+0xff/0x440 [i915]
[ 2556.956267] i915_gem_do_execbuffer+0x88e/0xc20 [i915]
[ 2556.956271] ? sock_def_readable+0x40/0x70
[ 2556.956274] ? __kmalloc_node+0x205/0x320
[ 2556.956294] i915_gem_execbuffer2_ioctl+0x2c3/0x3d0 [i915]
[ 2556.956314] ? i915_gem_execbuffer_ioctl+0x2d0/0x2d0 [i915]
[ 2556.956330] drm_ioctl_kernel+0xae/0xf0 [drm]
[ 2556.956338] drm_ioctl+0x234/0x3d0 [drm]
[ 2556.956358] ? i915_gem_execbuffer_ioctl+0x2d0/0x2d0 [i915]
[ 2556.956361] ? vfs_writev+0xc3/0xf0
[ 2556.956363] do_vfs_ioctl+0x407/0x670
[ 2556.956365] ? fput+0x13/0x15
[ 2556.956367] ? __sys_recvmsg+0x88/0xa0
[ 2556.956369] ksys_ioctl+0x67/0x90
[ 2556.956371] __x64_sys_ioctl+0x1a/0x20
[ 2556.956373] do_syscall_64+0x57/0x190
[ 2556.956376] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2556.956377] RIP: 0033:0x7f9032b3f68b
[ 2556.956379] Code: 0f 1e fa 48 8b 05 05 28 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 27 0d 00 f7 d8 64 89 01 48
[ 2556.956380] RSP: 002b:00007ffee39a0078 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 2556.956381] RAX: ffffffffffffffda RBX: 000055a8abeb6e48 RCX: 00007f9032b3f68b
[ 2556.956382] RDX: 00007ffee39a0090 RSI: 0000000040406469 RDI: 000000000000000d
[ 2556.956382] RBP: 00007ffee39a0120 R08: 0000000000000001 R09: 0000000000000000
[ 2556.956383] R10: 00007ffee39a0140 R11: 0000000000000246 R12: 00007f9022a4f460
[ 2556.956384] R13: 0000000000000000 R14: 00007ffee39a0090 R15: 000000000000000d
[ 2556.956385] Modules linked in: ccm rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat typec_displayport iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables cmac nfnetlink algif_hash ip6table_filter ip6_tables iptable_filter algif_skcipher af_alg bpfilter bridge stp llc snd_sof_pci snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda snd_sof_intel_byt snd_sof_intel_ipc snd_sof snd_sof_xtensa_dsp snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_hdmi snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine bnep snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_intel_nhlt snd_hda_codec snd_hda_core snd_hwdep snd_pcm nls_iso8859_1 mei_hdcp intel_rapl_msr snd_seq_midi snd_seq_midi_event dell_laptop ledtrig_audio snd_rawmidi x86_pkg_temp_thermal intel_powerclamp coretemp joydev kvm_intel kvm cdc_ether intel_cstate intel_rapl_perf snd_seq usbnet serio_raw iwlmvm r8152
[ 2556.956409] wmi_bmof mii mac80211 dell_wmi dell_smbios dcdbas snd_seq_device uvcvideo intel_wmi_thunderbolt dell_wmi_descriptor videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 snd_timer libarc4 videobuf2_common snd videodev btusb input_leds hid_multitouch mc iwlwifi btrtl btbcm soundcore btintel bluetooth rtsx_pci_ms cfg80211 memstick ecdh_generic ecc mei_me mei processor_thermal_device intel_rapl_common ucsi_acpi intel_soc_dts_iosf typec_ucsi typec int3403_thermal int340x_thermal_zone mac_hid acpi_pad int3400_thermal acpi_tad acpi_thermal_rel intel_hid sparse_keymap sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 xfs btrfs xor zstd_compress raid6_pq libcrc32c dm_crypt uas usb_storage usbhid hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rtsx_pci_sdmmc i915 psmouse i2c_i801 nvme i2c_algo_bit drm_kms_helper aesni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops crypto_simd intel_lpss_pci nvme_core cryptd thunderbolt intel_lpss glue_helper rtsx_pci
[ 2556.956436] drm idma64 virt_dma wmi i2c_hid hid pinctrl_cannonlake pinctrl_intel video
[ 2556.956441] CR2: 0000000000000040
[ 2556.956443] ---[ end trace de83f1a5004a6b5f ]---
[ 2556.956467] RIP: 0010:i915_active_acquire+0xe/0x80 [i915]
[ 2556.956468] Code: 00 48 c7 c6 11 4d 6b c0 e8 af a1 d6 c7 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 <8b> 47 38 48 89 fb 85 c0 74 17 8d 50 01 f0 0f b1 53 38 75 f2 45 31
[ 2556.956469] RSP: 0018:ffffac17c13279c8 EFLAGS: 00010286
[ 2556.956470] RAX: 0000000000000000 RBX: ffff983831d3e480 RCX: 0000000000000000
[ 2556.956471] RDX: ffff983783475200 RSI: ffff983831d3e480 RDI: 0000000000000008
[ 2556.956472] RBP: ffffac17c13279e0 R08: 0000000000000000 R09: ffff98382d6b6520
[ 2556.956473] R10: 0000000000006cc0 R11: ffff983838b4db00 R12: ffff983783475200
[ 2556.956474] R13: 0000000000000008 R14: ffff983783475200 R15: ffff98382d6b6400
[ 2556.956475] FS: 00007f9031c28f00(0000) GS:ffff98383e500000(0000) knlGS:0000000000000000
[ 2556.956476] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2556.956477] CR2: 0000000000000040 CR3: 000000046eac6001 CR4: 00000000003606e0
[ 2726.251982] r8152 4-1.3.2:1.0 enx4865ee114b7b: carrier off
[ 2729.482616] r8152 4-1.3.2:1.0 enx4865ee114b7b: carrier on
[ 2795.952106] r8152 4-1.3.2:1.0 enx4865ee114b7b: carrier off
[ 2799.211692] r8152 4-1.3.2:1.0 enx4865ee114b7b: carrier on
[ 2801.199389] r8152 4-1.3.2:1.0 enx4865ee114b7b: carrier off
[ 2804.460009] r8152 4-1.3.2:1.0 enx4865ee114b7b: carrier on
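A quick sanity check of the oops above: the bytes at the faulting instruction, <8b> 47 38, decode to mov eax, [rdi+0x38], and in the x86-64 calling convention RDI holds the first argument of i915_active_acquire(). RDI is 0x8, so the CPU tried to read 0x8 + 0x38 = 0x40, which is exactly the value in CR2. That is consistent with the function being handed a pointer computed from NULL plus a small field offset rather than a valid object. The snippet below only reproduces that sum from the logged register values; it is an illustrative check, not a diagnostic tool.

```shell
#!/bin/sh
# Register values copied from the oops above.
rdi=0x8      # first argument passed to i915_active_acquire()
disp=0x38    # 8-bit displacement encoded in the faulting instruction "8b 47 38"
cr2=0x40     # address the CPU reported it could not read

# Effective address of "mov eax, [rdi+0x38]".
fault=$(printf '0x%x' $(( $rdi + $disp )))
echo "effective address: $fault"

if [ $(( $fault )) -eq $(( $cr2 )) ]; then
    echo "matches CR2: the read went through a near-NULL pointer"
fi
```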

ubuntu-bug linux

*** Collecting problem information

The collected information can be sent to the developers to improve the
application. This might take a few minutes.
.....

*** Problem in linux-image-5.4.0-14-generic

The problem cannot be reported:

This is not an official Ubuntu package. Please remove any third party package and try again.

Press any key to continue...

apt show linux-image-5.4.0-14-generic
Package: linux-image-5.4.0-14-generic
Version: 5.4.0-14.17
Built-Using: linux-5.4 (= 5.4.0-14.17)
Status: install ok installed
Priority: optional
Section: kernel
Source: linux-signed-5.4
Maintainer: Canonical Kernel Team <email address hidden>
Installed-Size: 11.6 MB
Provides: aufs-dkms, fuse-module, ivtv-modules, kvm-api-4, linux-image, redhat-cluster-modules, spl-dkms, spl-modules, virtualbox-guest-dkms, virtualbox-guest-modules, zfs-dkms, zfs-modules
Depends: kmod, linux-base (>= 4.5ubuntu1~16.04.1), linux-modules-5.4.0-14-generic
Recommends: grub-pc | grub-efi-amd64 | grub-efi-ia32 | grub | lilo, initramfs-tools | linux-initramfs-tool
Suggests: fdutils, linux-doc | linux-5.4-source-5.4.0, linux-5.4-tools, linux-headers-5.4.0-14-generic
Conflicts: linux-image-unsigned-5.4.0-14-generic
Download-Size: unknown
APT-Manual-Installed: no
APT-Sources: /var/lib/dpkg/status
Description: Signed kernel image generic
 A kernel image for generic. This version of it is signed with
 Canonical's UEFI/Opal signing key.
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu21
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: gbaconniere 1914 F.... pulseaudio
CurrentDesktop: XFCE
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2019-11-19 (124 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Alpha amd64 (20191119)
MachineType: Dell Inc. XPS 13 7390
Package: linux (not installed)
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-18-generic root=UUID=d9356c11-56a5-478c-b853-283a48be11f8 ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 5.4.0-18.22-generic 5.4.24
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-18-generic N/A
 linux-backports-modules-5.4.0-18-generic N/A
 linux-firmware 1.187
Tags: focal
Uname: Linux 5.4.0-18-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip libvirt lpadmin lxd plugdev sambashare sudo
_MarkForUpload: False
dmi.bios.date: 10/03/2019
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.2.0
dmi.board.name: 0G2D0W
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.2.0:bd10/03/2019:svnDellInc.:pnXPS137390:pvr:rvnDellInc.:rn0G2D0W:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: XPS
dmi.product.name: XPS 13 7390
dmi.product.sku: 0962
dmi.sys.vendor: Dell Inc.

Guy Baconniere (lordbaco) wrote :

Similar to this one

NULL pointer dereference in i915_active_acquire since Linux 5.4
https://gitlab.freedesktop.org/drm/intel/issues/827

Guy Baconniere (lordbaco) wrote :
summary: - Dell XPS 13 : Screen freezes and Kernel Oops i915_active_ref
+ Dell XPS 13 : Screen freezes and Kernel Oops i915_active_acquire since
+ Linux 5.4

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1868551

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete

apport information

tags: added: apport-collected
description: updated

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
summary: - Dell XPS 13 : Screen freezes and Kernel Oops i915_active_acquire since
+ Screen freezes : NULL pointer dereference i915_active_acquire since
Linux 5.4
tags: added: patch

I did not manage to compile the kernel the Ubuntu way
https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel
Is this doc still relevant for Linux 5.4 and 20.04 / Focal?

I included the patch, which I ported to the latest version of ubuntu-focal:
git clone git://kernel.ubuntu.com/ubuntu/ubuntu-focal.git

Guy Baconniere (lordbaco) wrote :

I managed to build, install and boot the Linux Kernel 5.4 patched with the above patch.

Kai-Heng Feng (kaihengfeng) wrote :

Is there an upstream commit?

Guy Baconniere (lordbaco) wrote :

Q: Is there an upstream commit?

A: Not that I am aware of

https://gitlab.freedesktop.org/drm/intel/issues/827

A month ago:
Hi, when will this make it's way into Kernel 5.4? -- The Boy from the MAD show ;-)

You can check whether your Ubuntu 20.04 (soon LTS) system had the same type of crash:

zgrep -h i915_active_acquire /var/log/kern.log*

RIP: 0010:i915_active_acquire+0xe/0x80 [i915]
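To see how often this happened, the same zgrep check can be broken down per log file. A minimal sketch; the function name and its optional directory argument are hypothetical, and the directory defaults to /var/log. Note that in the oops earlier in this report the RIP line is printed twice per crash (once in the trace and once after "end trace"), so the count is likely about twice the number of crashes.

```shell
#!/bin/sh
# count_i915_oops_lines [DIR]
# For every kern.log* file in DIR (default /var/log), print how many lines
# contain the i915_active_acquire RIP signature. zgrep reads both plain logs
# and rotated .gz logs.
count_i915_oops_lines() {
    dir=${1:-/var/log}
    for f in "$dir"/kern.log*; do
        [ -e "$f" ] || continue    # no kern.log* files: the glob stayed literal
        n=$(zgrep -c 'RIP: 0010:i915_active_acquire' "$f")
        echo "$f: $n"
    done
}
```

Usage: source the file and run count_i915_oops_lines with no argument to scan /var/log.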

I had one or two crashes/freezes of Ubuntu 20.04 per week on my Dell XPS 13 connected to an external monitor. Initially I thought the hardware itself was unstable, but in the end it appears to be a bug or regression in the Intel i915 graphics kernel driver in Linux 5.4.

Guy Baconniere (lordbaco) wrote :

zgrep -h i915_active_acquire /var/log/kern.log* kern.log* | cut -d ' ' -f1-3,5,9-

Guy Baconniere (lordbaco) wrote :

Seems to be fixed in the latest 5.5
http://kobi.wang/v5.x/ChangeLog-5.5

    drm/i915: Hold reference to intel_frontbuffer as we track activity

    Since obj->frontbuffer is no longer protected by the struct_mutex, as we
    are processing the execbuf, it may be removed. Mark the
    intel_frontbuffer as rcu protected, and so acquire a reference to
    the struct as we track activity upon it.

    Closes: https://gitlab.freedesktop.org/drm/intel/issues/827
    Fixes: 8e7cb1799b4f ("drm/i915: Extract intel_frontbuffer active tracking")
    Signed-off-by: Chris Wilson <email address hidden>
    Link: https://patchwork<email address hidden>
    (cherry picked from commit da42104f589d979bbe402703fd836cec60befae1)

I am not sure it is a good idea to ship Ubuntu 20.04 LTS with the Linux 5.4 kernel branch unless the 5.5 patches are backported, as many people rely on the built-in Intel GPU in their workstations and the Intel graphics kernel driver is not very stable.

https://linuxreviews.org/Linux_Kernel_5.5_Will_Not_Fix_The_Frequent_Intel_GPU_Hangs_In_Recent_Kernels

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=949369

https://bugzilla.redhat.com/show_bug.cgi?id=1805278

https://bugs.archlinux.org/task/65392

Google "i915_active_acquire"+"i915_vma_move_to_active"
Google link:https://gitlab.freedesktop.org/drm/intel/issues/827

commit e85ade1f50aae464ce196672faa7a099fd1721ed
Author: Chris Wilson <email address hidden>
Date: Wed Dec 18 10:40:43 2019 +0000
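Whether a given kernel tree already carries this fix can be checked with git. A minimal sketch, assuming a local clone of the tree in question (for example the ubuntu-focal repository mentioned earlier, or a stable tree). Backports and cherry-picks normally mention the upstream hash in their commit message ("cherry picked from commit ..." or "commit ... upstream."), so git log --grep finds them, and git tag --contains reports the earliest tag that includes each match. The function name is hypothetical; the hash da42104f589d is the upstream commit quoted in the commit message above.

```shell
#!/bin/sh
# find_backports REPO UPSTREAM_HASH
# Lists commits in REPO whose message mentions UPSTREAM_HASH, together with
# the earliest tag (in version order) that contains each of them.
find_backports() {
    repo=$1
    upstream=$2
    git -C "$repo" log --all --format=%H --grep="$upstream" |
    while read -r c; do
        first=$(git -C "$repo" tag --sort=v:refname --contains "$c" | head -n 1)
        echo "$(git -C "$repo" rev-parse --short "$c") ${first:-<no tag yet>}"
    done
}
```

Usage: find_backports ~/ubuntu-focal da42104f589d. Matching on the message rather than on the hash itself is deliberate, because a backported commit gets a new hash of its own.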

Guy Baconniere (lordbaco) wrote :

With the above patch I still get the screen freezes, but without the kernel oops.

I will compile my own Linux 4.15 kernel (as shipped in 18.04 LTS) for Ubuntu 20.04 (soon LTS), as
Intel i915 graphics is unusable on my Dell XPS 13 connected to an external
screen with XFCE4:

 i915 0000:00:02.0: GPU HANG: ecode 9:1:0x00000000, hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
 i915 0000:00:02.0: Resetting chip for hang on rcs0
 [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
 [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
 Asynchronous wait on fence i915:xfwm4[1901]:57922 timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
 Asynchronous wait on fence i915:xfwm4[1901]:57922 timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 ...
 i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
 i915 0000:00:02.0: Resetting chip for hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 Asynchronous wait on fence i915:xfwm4[1901]:57926 timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
 Asynchronous wait on fence i915:xfwm4[1901]:57926 timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 ...
 i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
 i915 0000:00:02.0: Resetting chip for hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
 i915 0000:00:02.0: Resetting chip for hang on rcs0
 [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
 [drm:gen8_reset_engines [i915]] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 Asynchronous wait on fence i915:xfwm4[1901]:5792a timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
 Asynchronous wait on fence i915:xfwm4[1901]:5792a timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 ...
 i915 0000:00:02.0: GPU recovery timed out, cancelling all in-flight rendering.
 i915 0000:00:02.0: Resetting chip for hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 ...
 Asynchronous wait on fence i915:xfwm4[1901]:5792e timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
 Asynchronous wait on fence i915:xfwm4[1901]:5792e timed out (hint:intel_atomic_commit_ready+0x0/0x54 [i915])
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 i915 0000:00:02.0: Resetting rcs0 for hang on rcs0
 ...
 i915 0000:00:02.0: G...

Guy Baconniere (lordbaco) wrote :

I am testing with Linux Kernel 5.5.13-050513-generic from
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.5.13/
as older kernels (<v5.3) do not have all drivers I need
on 20.04 LTS and v5.6 is not working properly with WiFi.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954817#17

summary: - Screen freezes : NULL pointer dereference i915_active_acquire since
- Linux 5.4
+ Intel GPU Hangs : random screen freezing w/ Ubuntu 20.04 (Linux 5.4)
+ i915_active_acquire
Guy Baconniere (lordbaco) wrote :

So far, no crashes or freezes of any kind with Linux 5.5.13, compared to the instability of Ubuntu 20.04 with Linux 5.4 on a Dell XPS 13 with a 10th Generation Intel Core i7-10510U CPU (Comet Lake).

I guess the patches are part of Linux v5.5.12 and later
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.5.12/CHANGES

 Chris Wilson (1):
      drm/i915/execlists: Track active elements during dequeue

 Matt Roper (1):
      drm/i915: Handle all MCR ranges

 Caz Yokoyama (1):
      Revert "drm/i915/tgl: Add extra hdc flush workaround"

Maybe Canonical should backport those patches to 5.4, or switch to the 5.5 branch before Ubuntu 20.04 General Availability?

https://lkml.org/lkml/2020/3/22/419
https://lkml.org/lkml/2020/3/19/1779

Guy Baconniere (lordbaco) wrote :

Linux Kernel 5.5.x (>5.5.12) is working fine

Changed in linux (Ubuntu):
status: Confirmed → New
information type: Public → Private
information type: Private → Public
Guy Baconniere (lordbaco) wrote :

https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.4.32/CHANGES

https://patchwork.ozlabs.org/<email address hidden>/

Note that this bug only affects 5.4 and has since been fixed in 5.5.
Normally, a backport of the fix from 5.5 would be in order, but the
patch set that fixes this deadlock involves massive changes that are
neither feasible nor desirable for backporting [1][2][3]. Therefore,
this small patch was made to address the deadlock specifically for 5.4.

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Guy Baconniere (lordbaco) wrote :

Since 5.5 is more stable than 5.4 with the Intel iGPU (i915) on my Dell XPS 13,
I wrote the following one-liner to install the latest Linux 5.5 kernel on Ubuntu 20.04:

which curl >/dev/null 2>&1 || sudo apt-get install -qq -y curl; for version in $(curl -sL https://kernel.ubuntu.com/~kernel-ppa/mainline/ | tac | grep -Pom1 '(?<=")v5\.5\.[0-9]+/'); do for deb in $(curl -sL https://kernel.ubuntu.com/~kernel-ppa/mainline/${version} | grep -Pom4 '(?<=")linux-(headers|image-unsigned|modules)-5\.5\.[^"]+[0-9]+(-generic_5\.5\.|_5\.5\.)[^"]+.(amd64|all)\.deb'); do curl -sLo ${deb} https://kernel.ubuntu.com/~kernel-ppa/mainline/${version}/${deb}; debs="${debs} ${deb}"; done; sudo dpkg -i ${debs}; done
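The first step of the one-liner picks the newest v5.5.x directory from the mainline index with tac | grep -m1, which relies on the page listing versions in ascending version order. If the index were sorted alphabetically instead, v5.5.9/ would come after v5.5.19/ and be picked by mistake. A minimal sketch that selects the highest version with sort -V regardless of listing order, assuming the index links look like href="v5.5.19/" as the one-liner's pattern expects; the function name is hypothetical.

```shell
#!/bin/sh
# latest_mainline_dir SERIES
# Reads an HTML directory listing on stdin and prints the highest
# "vSERIES.N/" entry, e.g. SERIES=5.5 prints v5.5.19/ if that is the newest.
latest_mainline_dir() {
    # Escape the dots so "5.5" only matches a literal "5.5" in the regex.
    series=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -o "\"v${series}\.[0-9][0-9]*/\"" |
        tr -d '"' |
        sort -V |
        tail -n 1
}
```

Usage: curl -sL https://kernel.ubuntu.com/~kernel-ppa/mainline/ | latest_mainline_dir 5.5 (the same call with 5.6 covers the 5.6 branch).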

Sultan Alsawaf (kerneltoast) wrote :

@lordbaco Please install this kernel and see if it still crashes: https://kernel.ubuntu.com/~sultan/i915-lp1870265/

FurretUber (furretuber) wrote :

I tested the kernel from comment 34 and my computer is no longer crashing. With this kernel, it is possible to use the default Xubuntu setup without changing the Xorg session or xfwm4's vblank_mode.

What I noticed is that there are 1330 dmesg errors like:

[52999.965468] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=3180783 end=3180784) time 440 us, min 763, max 767, scanline start 748, end 769

Sultan Alsawaf (kerneltoast) wrote :

@leozinho29-eu Can you add i915.enable_fbc=0 to the kernel command line and see if that fixes the "Atomic update failure" messages? If it does, then we'll know the problem is related to framebuffer compression.
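On a stock Ubuntu install, adding a parameter such as i915.enable_fbc=0 usually means appending it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then running sudo update-grub. A minimal sketch of the edit, assuming the variable is set on a single line with double quotes (the default layout); the helper name is hypothetical, and it takes the file path as an argument so it can be tried on a copy first.

```shell
#!/bin/sh
# add_kernel_param FILE PARAM
# Appends PARAM to GRUB_CMDLINE_LINUX_DEFAULT="..." in FILE (normally
# /etc/default/grub) unless it is already present.
add_kernel_param() {
    file=$1
    param=$2
    # Already present as a whole word inside the quoted value: nothing to do.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]${param}[\" ]" "$file"; then
        return 0
    fi
    # Insert PARAM just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
}
```

After applying it to /etc/default/grub as root, run sudo update-grub, reboot, and confirm the parameter appears in cat /proc/cmdline.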

FurretUber (furretuber) wrote :

Setting that option made my computer crash again. It crashed after 8 minutes of usage.

Sultan Alsawaf (kerneltoast) wrote :

@leozinho29-eu Wow, that's really unexpected! I'll take a look. Thanks for the log and quick reply.

Sultan Alsawaf (kerneltoast) wrote :

@leozinho29-eu For some reason, the builder I used to make the kernel in comment 34 didn't pick up the change that was supposed to fix this bug's crash... I'll rebuild it on my local machine for you.

Sultan Alsawaf (kerneltoast) wrote :

@leozinho29-eu Please try this kernel: https://kernel.ubuntu.com/~sultan/i915-lp1868551/

Note that there really is only one package to install. That single package contains everything.

FurretUber (furretuber) wrote :

So far (1 hour and 40 minutes running glxgears on one screen), the system hasn't crashed, with the following kernel command line:

BOOT_IMAGE=/boot/vmlinuz-5.4.30+ root=UUID=6b4ae5c0-c78c-49a6-a1ba-029192618a7a ro quiet ro kvm.ignore_msrs=1 kvm.report_ignored_msrs=0 kvm.halt_poll_ns=0 kvm.halt_poll_ns_grow=0 i915.enable_fbc=0 i915.enable_gvt=1 resume=UUID=a82e38a0-8d20-49dd-9cbd-de7216b589fc log_buf_len=16M usbhid.quirks=0x0079:0x0006:0x100000 mtrr_gran_size=64M mtrr_chunk_size=64M nbd.nbds_max=2 nbd.max_part=63 cgroup_enable=memory swapaccount=1

The crash hasn't recurred, but the "Atomic update failure" messages are still present and fairly frequent. It's strange that the kernel from comment 34 was fine until i915.enable_fbc=0 was set.

By the way, does the bisect from the duplicate make sense? I found the result strange, but it seemed to fix the issue for me.

Sultan Alsawaf (kerneltoast) wrote :

@leozinho29-eu The kernel from comment 34 was just a plain 5.4.0-26 kernel, with nothing added to it (which wasn't intentional). Could you reproduce the atomic update failure messages with an official Ubuntu kernel installed, and then create a new launchpad bug for it?

Given that you encountered the atomic update failures with comment 34's kernel, which was just an unmodified Ubuntu kernel, I don't think the problem is related to the fix for this bug.

The bisect from your duplicate doesn't make sense, but I think there's a reasonable explanation for that. This kernel panic is caused by a race, so any changes to code that could affect the timing of the GPU driver's code execution, in some indirect way, could either exacerbate or alleviate the crash.

That said, there is a clear diagnosis for this bug, and it is definitely fixed by the upstream commit da42104f589d ("drm/i915: Hold reference to intel_frontbuffer as we track activity").

Antony Jones (wrh) wrote :

I tried the kernel by @lordbaco in #33 and it doesn't improve anything, in case that's useful.

My machine freezes up after a few hours, usually at the point where the fans start to spin up under load. This happens 2-3 times a day.

The symptoms are that the cursor starts to lag, massively, and the whole machine gets less and less responsive until it freezes.

I didn't have any luck with the kernel in #40 as it wouldn't boot for me. If you have any pointers to booting with it then I'll happily try.

I'm not seeing the error messages mentioned by anybody else in this thread.

Guy Baconniere (lordbaco) wrote :

I tested the 5.4 test kernel for about a day without issues,
but all my computers with Intel GPUs run kernel 5.5
by default, and I don't want to change that.

If you want to follow other users who have the same kind of
issues, caused by various i915 bugs on different Linux distros,
see the comments on
https://linuxreviews.org/Linux_Kernel_5.5_Will_Not_Fix_The_Frequent_Intel_GPU_Hangs_In_Recent_Kernels

FurretUber (furretuber) wrote :

On my computer I work around this by switching the Xorg session from the modesetting driver to intel. To do this, I first check that xserver-xorg-video-intel is installed, then create the file /etc/X11/xorg.conf.d/99-intel.conf with the following content:

Section "Device"
   Identifier "Intel Graphics"
   Driver "intel"
   Option "DRI" "3"
EndSection

Then I restart the display manager to apply the change. Not only does this crash stop happening for me, but it also solves another bug I hit with the default Xubuntu 20.04 settings:

https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/1870250
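The workaround steps above (check the package, write the file, restart the display manager) can be sketched as two small shell functions. This is a minimal sketch; the function names are hypothetical, and the target directory is an argument so the file can be written to a scratch location first.

```shell
#!/bin/sh
# intel_ddx_installed: succeed if the xserver-xorg-video-intel package is installed.
intel_ddx_installed() {
    dpkg-query -W -f='${Status}' xserver-xorg-video-intel 2>/dev/null |
        grep -q 'install ok installed'
}

# write_intel_xorg_conf DIR: write DIR/99-intel.conf with the Device section
# from the comment above. DIR is normally /etc/X11/xorg.conf.d (needs root).
write_intel_xorg_conf() {
    dir=$1
    mkdir -p "$dir"
    cat > "$dir/99-intel.conf" <<'EOF'
Section "Device"
   Identifier "Intel Graphics"
   Driver "intel"
   Option "DRI" "3"
EndSection
EOF
}
```

Usage: run intel_ddx_installed || sudo apt-get install xserver-xorg-video-intel, write the file as root into /etc/X11/xorg.conf.d, then restart the display manager (on Xubuntu typically sudo systemctl restart lightdm, which ends the current session). Deleting 99-intel.conf and restarting again reverts to modesetting.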

Changed in linux (Ubuntu Focal):
status: New → Confirmed
Stefan Bader (smb) on 2020-05-14
Changed in linux (Ubuntu Focal):
importance: Undecided → High
description: updated
description: updated
Changed in linux (Ubuntu Focal):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu):
status: Fix Committed → Invalid
Changed in linux (Ubuntu Focal):
status: Fix Committed → In Progress
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Guy Baconniere (lordbaco) wrote :

Regarding #28

"on 20.04 LTS and v5.6 is not working properly with WiFi."

When I first tested the 5.6 kernel branch, WiFi was broken because of this:
https://linuxreviews.org/Linux_5.6.2_Is_Released_With_Intel_Wifi_Fix

Now it is working again with the current 5.6 branch.

Kernel 5.5 is now EOL (as of 5.5.19 end of April)
https://en.wikipedia.org/wiki/Linux_kernel_version_history#Releases_5.x.y

Intel should do better QA for their iGPU and WiFi drivers
and backport the fixes to all maintained stable branches, i.e. 5.4.x (LTS) and 5.6.x.

If you still have issues with Intel i915 on 5.4.x, you can use
the following one-liner to install 5.6.x

which curl >/dev/null 2>&1 || sudo apt-get install -qq -y curl; for version in $(curl -sL https://kernel.ubuntu.com/~kernel-ppa/mainline/ | tac | grep -Pom1 '(?<=")v5\.6\.[0-9]+/'); do for deb in $(curl -sL https://kernel.ubuntu.com/~kernel-ppa/mainline/${version} | grep -Pom4 '(?<=")linux-(headers|image-unsigned|modules)-5\.6\.[^"]+[0-9]+(-generic_5\.6\.|_5\.6\.)[^"]+.(amd64|all)\.deb'); do curl -sLo ${deb} https://kernel.ubuntu.com/~kernel-ppa/mainline/${version}/${deb}; debs="${debs} ${deb}"; done; sudo dpkg -i ${debs}; done

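If you need to go back to the stock Ubuntu 20.04 LTS kernel and remove Linux 5.5 and 5.6, the following prints the matching apt-get purge command; because of the echo it does not run it, so remove echo once the package list looks right: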

dpkg --get-selections | awk '/linux-.*-5\.[5-6]/ { print $1 }' | xargs echo sudo apt-get purge -y

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!
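The -proposed pocket can also be enabled by hand. A minimal sketch along the lines of the EnableProposed page, assuming an amd64 machine using archive.ubuntu.com (other architectures use ports.ubuntu.com) and a low pin priority so that nothing is upgraded from -proposed unless explicitly requested. The function name and its ROOT argument (so it can be tried in a scratch directory) are hypothetical.

```shell
#!/bin/sh
# enable_proposed ROOT SUITE
# Writes an APT source for SUITE-proposed and a pin that keeps it below the
# default priority of 500, under ROOT (use / on a real system, as root).
enable_proposed() {
    root=$1
    suite=$2
    mkdir -p "$root/etc/apt/sources.list.d" "$root/etc/apt/preferences.d"
    echo "deb http://archive.ubuntu.com/ubuntu/ ${suite}-proposed main restricted universe multiverse" \
        > "$root/etc/apt/sources.list.d/${suite}-proposed.list"
    cat > "$root/etc/apt/preferences.d/${suite}-proposed" <<EOF
Package: *
Pin: release a=${suite}-proposed
Pin-Priority: 400
EOF
}
```

After enable_proposed / focal (as root) and sudo apt-get update, one way to pull only the kernel from -proposed is sudo apt-get install -t focal-proposed linux-generic; remove both files to disable the pocket again.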

tags: added: verification-needed-focal
Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Focal):
status: Fix Released → Fix Committed
Ankush singh (ankush947) on 2020-05-29
Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released

All autopkgtests for the newly accepted linux-oracle-5.4 (5.4.0-1019.19~18.04.1) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

zfs-linux/unknown (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#linux-oracle-5.4

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!
