Screen freeze when performing memory stress in Wayland mode

Bug #1990089 reported by jeremyszu
This bug affects 1 person
Affects               Status        Importance  Assigned to  Milestone
Linux                 New           Unknown
Mutter                Fix Released  Unknown
OEM Priority Project  Fix Released  Critical    jeremyszu
mesa (Ubuntu)         Fix Released  Medium      Unassigned
  Jammy               Fix Released  Medium      Unassigned
  Kinetic             Won't Fix     Medium      Unassigned
  Lunar               Fix Released  Medium      Unassigned

Bug Description

[Impact]

Running stress-ng freezes the screen under Wayland on Intel GPUs using the iris driver. Upstream has fixed it in these commits:

5aae8a05264c354aa93017d323ce238858f68227 iris: Retry DRM_IOCTL_I915_GEM_EXECBUFFER2 on ENOMEM
646cff13bca8a92b846984d782ef00e57d34d7a1 Revert "iris: Avoid abort() if kernel can't allocate memory"

Both need to be backported to Mesa 22.2.5.

[Test case]

Install the update, then

1. stress-ng --stack 0 --timeout 300
2. check the screen

and note that it shouldn't freeze anymore.

[Where things could go wrong]

This moves checking ENOMEM to the right place, so it's hard to see how it might cause issues.
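
For readers unfamiliar with the pattern, the first commit's title describes retrying an ioctl when the kernel reports ENOMEM. The following is only a minimal Python sketch of that retry pattern, not the Mesa patch itself. `retry_on_enomem`, `max_attempts`, and the fake `submit` callable are hypothetical names made up for illustration.

```python
import errno


def retry_on_enomem(call, max_attempts=None):
    """Invoke call() again for as long as it fails with ENOMEM.

    Hypothetical helper for illustration only. Any other OSError is
    re-raised immediately; max_attempts=None means retry without limit.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return call()
        except OSError as exc:
            if exc.errno != errno.ENOMEM:
                raise  # unrelated failure: do not mask it
            if max_attempts is not None and attempts >= max_attempts:
                raise  # give up after the configured number of tries


# Stand-in for a submission call that fails twice under memory pressure.
failures_left = [2]


def submit():
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise OSError(errno.ENOMEM, "Cannot allocate memory")
    return 0


result = retry_on_enomem(submit)  # succeeds on the third attempt
```

The point of the sketch is that a transient ENOMEM under memory pressure is treated as "try again" rather than as a fatal error, while every other error still propagates to the caller.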

--

[Steps to reproduce]
(disable systemd-oomd, or run the command over SSH)
(the command below allocates a lot of memory to stress kernel page-fault handling)

1. stress-ng --stack 0 --timeout 300
2. check the screen
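
The reproduction step can also be driven from a small script, for example when running it over SSH. This is a sketch only: it assumes `stress-ng` is installed and on PATH, and `build_stress_command` and `run_stress` are hypothetical helper names. The flags are the ones given in the steps above.

```python
import subprocess


def build_stress_command(timeout_s=300, extra_args=()):
    """Return the stress-ng command line from the steps above as a list."""
    return ["stress-ng", "--stack", "0", "--timeout", str(timeout_s), *extra_args]


def run_stress(timeout_s=300, extra_args=()):
    """Run stress-ng and return its exit status (assumes it is installed)."""
    completed = subprocess.run(
        build_stress_command(timeout_s, extra_args), check=False
    )
    return completed.returncode
```

Calling `run_stress()` runs the 300-second test. Additional stressor flags can be passed through `extra_args`, for example `run_stress(extra_args=("--mmap", "0", "--aggressive"))`.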

[Expected result]
The screen may update slowly while the stress test is running,
but it should return to normal once the test finishes.

[Actual result]
The screen stays frozen after the stress test.

[Additional information]
kernel version: vmlinuz-5.17.0-1017-oem
kernel version: vmlinuz-6.0.0-060000rc4drmtip20220910-generic
Mesa version: 22.0.5-0ubuntu0.1
Mutter version: 42.2-0ubuntu1
Gnome-shell version: 42.2-0ubuntu0.2

* Issue happens in Wayland only

* gnome-shell keeps issuing ioctl()
```
(gdb) bt
#0 __GI___ioctl (fd=fd@entry=14, request=request@entry=3223348419) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1 0x00007fcacab0eb4f in intel_ioctl (arg=0x7fffc5404d80, request=3223348419, fd=14) at ../src/intel/common/intel_gem.h:75
#2 iris_wait_syncobj (timeout_nsec=140736502713808, syncobj=<optimized out>, bufmgr=<optimized out>) at ../src/gallium/drivers/iris/iris_fence.c:229
#3 iris_wait_syncobj (bufmgr=<optimized out>, syncobj=<optimized out>, timeout_nsec=timeout_nsec@entry=9223372036854775807) at ../src/gallium/drivers/iris/iris_fence.c:215
#4 0x00007fcacab8beab in iris_get_query_result (result=0x7fffc5404e50, wait=<optimized out>, query=0x55774159b800, ctx=<optimized out>) at ../src/gallium/drivers/iris/iris_query.c:635
#5 iris_get_query_result (ctx=<optimized out>, query=0x55774159b800, wait=<optimized out>, result=0x7fffc5404e50) at ../src/gallium/drivers/iris/iris_query.c:601
#6 0x00007fcaca405e89 in tc_get_query_result (_pipe=<optimized out>, query=0x55774159b800, wait=<optimized out>, result=0x7fffc5404e50) at ../src/gallium/auxiliary/util/u_threaded_context.c:881
#7 0x00007fcaca0f0e64 in get_query_result (pipe=pipe@entry=0x55773f87bad0, q=q@entry=0x557741a0c790, wait=wait@entry=1 '\001') at ../src/mesa/main/queryobj.c:266
#8 0x00007fcaca0f1b12 in _mesa_wait_query (q=0x557741a0c790, ctx=0x55773f8af980) at ../src/mesa/main/queryobj.c:344
#9 get_query_object (ctx=0x55773f8af980, func=func@entry=0x7fcacaf28283 "glGetQueryObjecti64v", id=<optimized out>, pname=34918, ptype=ptype@entry=5134, buf=0x0, offset=140736502714192)
    at ../src/mesa/main/queryobj.c:1174
#10 0x00007fcaca0f2d65 in _mesa_GetQueryObjecti64v (id=<optimized out>, pname=<optimized out>, params=<optimized out>) at ../src/mesa/main/queryobj.c:1257
#11 0x00007fcadfa39b90 in ?? () from /usr/lib/x86_64-linux-gnu/mutter-10/libmutter-cogl-10.so.0
#12 0x00007fcadfa751b9 in cogl_frame_info_get_rendering_duration_ns () from /usr/lib/x86_64-linux-gnu/mutter-10/libmutter-cogl-10.so.0
#13 0x00007fcadf8819c4 in ?? () from /lib/x86_64-linux-gnu/libmutter-10.so.0
#14 0x00007fcadfa6e7a2 in _cogl_onscreen_notify_complete () from /usr/lib/x86_64-linux-gnu/mutter-10/libmutter-cogl-10.so.0
#15 0x00007fcadf969cdd in ?? () from /lib/x86_64-linux-gnu/libmutter-10.so.0
#16 0x00007fcadf96ec7b in ?? () from /lib/x86_64-linux-gnu/libmutter-10.so.0
#17 0x00007fcadf969601 in ?? () from /lib/x86_64-linux-gnu/libmutter-10.so.0
#18 0x00007fcadf98395e in ?? () from /lib/x86_64-linux-gnu/libmutter-10.so.0
#19 0x00007fcadf96962d in ?? () from /lib/x86_64-linux-gnu/libmutter-10.so.0
#20 0x00007fcae0743c24 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#21 0x00007fcae07986f8 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#22 0x00007fcae0743293 in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#23 0x00007fcadf8d0849 in meta_context_run_main_loop () from /lib/x86_64-linux-gnu/libmutter-10.so.0
#24 0x000055773f1a4f12 in ?? ()
#25 0x00007fcadf429d90 in __libc_start_call_main (main=main@entry=0x55773f1a4a70, argc=argc@entry=1, argv=argv@entry=0x7fffc54053e8) at ../sysdeps/nptl/libc_start_call_main.h:58
#26 0x00007fcadf429e40 in __libc_start_main_impl (main=0x55773f1a4a70, argc=1, argv=0x7fffc54053e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffc54053d8)
    at ../csu/libc-start.c:392
#27 0x000055773f1a51b5 in ?? ()
```
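
The raw numbers in frames #0 and #3 can be decoded by hand. The sketch below assumes the generic Linux ioctl number layout used on x86-64 (8-bit number, 8-bit type, 14-bit size, 2-bit direction). `decode_ioctl_request` is a hypothetical helper name, not part of any library.

```python
def decode_ioctl_request(request):
    """Split a Linux ioctl request number into its _IOC fields.

    Assumes the asm-generic layout: nr in bits 0-7, type in bits 8-15,
    size in bits 16-29, direction in bits 30-31.
    """
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "dir": (request >> 30) & 0x3,  # 1 = write, 2 = read, 3 = both
    }


fields = decode_ioctl_request(3223348419)  # request value from frame #0
print(hex(3223348419), fields)

# The timeout passed in frame #3 is the largest signed 64-bit value.
is_int64_max = 9223372036854775807 == 2**63 - 1
print(is_int64_max)
```

This gives type 'd' (the DRM ioctl family), number 0xC3, and a 32-byte read/write argument. As far as I can tell from the kernel's drm.h, that matches DRM_IOCTL_SYNCOBJ_WAIT, which is consistent with frame #2 being iris_wait_syncobj. The timeout equals INT64_MAX, so the wait is effectively unbounded.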

Other discussion threads:
https://gitlab.freedesktop.org/drm/intel/-/issues/6851
https://gitlab.gnome.org/GNOME/mutter/-/issues/2431

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: mutter (not installed)
Uname: Linux 5.18.0-rc2+ x86_64
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
CasperMD5json:
 {
   "result": "skip"
 }
Date: Mon Sep 19 10:39:59 2022
DistributionChannelDescriptor:
 # This is the distribution channel descriptor for the OEM CDs
 # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
 canonical-oem-somerville-jammy-amd64-20220504-33+jellyfish-chansey+X25
InstallationDate: Installed on 2022-09-06 (12 days ago)
InstallationMedia: Ubuntu 22.04 LTS "Jammy Jellyfish" - somerville-jammy-amd64-20220504-33
ProcEnviron:
 TERM=rxvt-unicode-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: mutter
UpgradeStatus: No upgrade log present (probably fresh install)

jeremyszu (os369510) wrote :
description: updated
Changed in oem-priority:
assignee: nobody → jeremyszu (os369510)
importance: Undecided → Critical
status: New → In Progress
tags: added: oem-priority
jeremyszu (os369510)
tags: added: originate-from-1982914 stella
jeremyszu (os369510) wrote :
jeremyszu (os369510) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1990089

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Daniel van Vugt (vanvugt) wrote :

While this is a kernel bug, we will probably do a workaround in mutter before the kernel is ever fixed.

Changed in linux (Ubuntu):
importance: Undecided → High
Changed in mutter (Ubuntu):
importance: Undecided → Wishlist
status: New → Opinion
assignee: nobody → Daniel van Vugt (vanvugt)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Daniel van Vugt (vanvugt) wrote :
tags: added: dt-651
Changed in linux:
status: Unknown → New
Changed in mutter:
status: Unknown → Fix Released
jeremyszu (os369510) wrote :

The correct fix should be in the kernel driver or GPU firmware.
Please track it at drm-intel#6851.

Changed in oem-priority:
status: In Progress → Incomplete
tags: added: originate-from-1983068
jeremyszu (os369510)
tags: added: originate-from-1992116
jeremyszu (os369510)
tags: added: originate-from-1992736
Andy Chi (andch)
tags: added: originate-from-1981168 somerville
Andy Chi (andch)
tags: added: originate-from-1992399
Kai-Heng Feng (kaihengfeng) wrote :
Kai-Heng Feng (kaihengfeng) wrote :

FWIW the mutter workaround doesn't work for this issue.

Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
status: Confirmed → In Progress
Changed in linux (Ubuntu):
status: In Progress → Invalid
no longer affects: linux (Ubuntu)
Changed in mesa (Ubuntu Jammy):
importance: Undecided → Medium
status: New → Confirmed
Changed in mesa (Ubuntu Kinetic):
importance: Undecided → Medium
status: New → Confirmed
Changed in mesa (Ubuntu Lunar):
importance: Undecided → Medium
status: New → Confirmed
Changed in mesa (Ubuntu Lunar):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
tags: added: fixed-in-mesa-24 fixed-upstream
no longer affects: linux (Ubuntu Jammy)
no longer affects: linux (Ubuntu Kinetic)
no longer affects: linux (Ubuntu Lunar)
no longer affects: mutter (Ubuntu)
no longer affects: mutter (Ubuntu Jammy)
no longer affects: mutter (Ubuntu Kinetic)
no longer affects: mutter (Ubuntu Lunar)
Changed in mesa (Ubuntu Lunar):
status: Confirmed → In Progress
Changed in mesa (Ubuntu Lunar):
status: In Progress → Fix Released
assignee: Kai-Heng Feng (kaihengfeng) → nobody
description: updated
Timo Aaltonen (tjaalton) wrote :

lunar is released, skipping kinetic

Changed in mesa (Ubuntu Kinetic):
status: Confirmed → Won't Fix
Timo Aaltonen (tjaalton)
description: updated
Timo Aaltonen (tjaalton)
description: updated
Timo Aaltonen (tjaalton)
Changed in mesa (Ubuntu Jammy):
status: Confirmed → In Progress
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello jeremyszu, or anyone else affected,

Accepted mesa into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mesa/22.2.5-0ubuntu0.1~22.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in mesa (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Kai-Heng Feng (kaihengfeng) wrote :

I can verify the issue is fixed following the steps:
1) Enable -proposed pocket, and upgrade mesa
2) Run `stress-ng --stack 0 --mmap 0 --aggressive --timeout 300`
3) See if GNOME is frozen
4) Use top to observe CPU usage

With the new mesa, GNOME is no longer hung and there's no high CPU usage anymore.
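
Step 4 can be approximated without top by sampling /proc. The sketch below assumes the Linux `/proc/<pid>/stat` layout documented in proc(5), where utime and stime are fields 14 and 15, measured in clock ticks. `cpu_ticks` and `cpu_percent` are hypothetical helper names, and the sample strings are made up.

```python
import os


def cpu_ticks(stat_line):
    """Return utime + stime (clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split after the last ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_percent(before, after, interval_s, ticks_per_s=None):
    """CPU usage between two stat samples, as a percentage of one core."""
    if ticks_per_s is None:
        ticks_per_s = os.sysconf("SC_CLK_TCK")
    delta = cpu_ticks(after) - cpu_ticks(before)
    return 100.0 * delta / (ticks_per_s * interval_s)


# Made-up samples taken one second apart for a process named "gnome-shell".
sample_a = "1234 (gnome-shell) S 1 1234 1234 0 -1 4194304 10 0 0 0 500 100 0 0 20 0 8 0 100 0 0"
sample_b = "1234 (gnome-shell) S 1 1234 1234 0 -1 4194304 10 0 0 0 560 140 0 0 20 0 8 0 100 0 0"
usage = cpu_percent(sample_a, sample_b, interval_s=1.0, ticks_per_s=100)
```

On a real system the two samples would be read from `/proc/<pid>/stat` of the gnome-shell process before and after a short sleep. A value that stays near 100 after the stress test has ended would match the busy loop seen in the backtrace.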

tags: added: done-needed
removed: verification-needed
tags: added: verification-done-jammy
removed: verification-needed-jammy
Andy Chi (andch)
Changed in oem-priority:
status: Incomplete → Fix Released
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (mesa/22.2.5-0ubuntu0.1~22.04.2)

All autopkgtests for the newly accepted mesa (22.2.5-0ubuntu0.1~22.04.2) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

gtk+3.0/3.24.33-1ubuntu2 (i386)
mir/2.7.0-0ubuntu3 (armhf)
mutter/42.5-0ubuntu1 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#mesa

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mesa - 22.2.5-0ubuntu0.1~22.04.2

---------------
mesa (22.2.5-0ubuntu0.1~22.04.2) jammy; urgency=medium

  * patches: Fix VA-API on DCN 3.1.4. (LP: #2017142)
  * patches: Fix a freeze with iris under stress test. (LP: #1990089)
  * patches: Revert two commits causing kwin to eventually crash. (LP:
    #2003339)

 -- Timo Aaltonen <email address hidden> Fri, 21 Apr 2023 14:51:10 +0300

Changed in mesa (Ubuntu Jammy):
status: Fix Committed → Fix Released
Timo Aaltonen (tjaalton) wrote : Update Released

The verification of the Stable Release Update for mesa has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (mesa/22.2.5-0ubuntu0.1~22.04.2)

All autopkgtests for the newly accepted mesa (22.2.5-0ubuntu0.1~22.04.2) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

gtk+3.0/3.24.33-1ubuntu2 (i386)
mutter/42.5-0ubuntu1 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#mesa

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!
