amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:6 pasid:32781) / Fence fallback timer expired on ring sdma0

Bug #2083538 reported by Julian Andres Klode
This bug affects 3 people
Affects            | Status       | Importance | Assigned to       | Milestone
Linux              | Fix Released | Unknown    |                   |
Mesa               | Fix Released | Unknown    |                   |
firefox (Ubuntu)   | Invalid      | Undecided  | Alessandro Astone |
linux (Ubuntu)     | Invalid      | Undecided  | Unassigned        |
mesa (Ubuntu)      | Fix Released | Undecided  | Alessandro Astone |
  Jammy            | Fix Released | Undecided  | Alessandro Astone |
  Noble            | Fix Released | Undecided  | Alessandro Astone |

Bug Description

[ Impact ]

 * amdgpu changes in kernel 6.11 make VAAPI video playback crash the GPU and bring down the whole system.

 * A fix shipped in mesa 24.2, which is already in Oracular (currently the only release with kernel 6.11). However, snaps bundle their own mesa, so core22 snaps such as Firefox are still affected. Firefox disables VAAPI by default, but users can enable it.

[ Test Plan ]

 * Set up a system with an AMD GPU running Ubuntu 24.10 with kernel 6.11

 * Open Firefox

 * Navigate to `about:config`

 * Search `media.ffmpeg.vaapi.enabled`

 * Enable the setting by clicking the toggle button

 * Restart Firefox

 * Play different videos on x.com or youtube.com
   (unfortunately, I don't have a reliable reproducer)

 * Ensure that there are no green artifacts on the video playback

 * Ensure that the system didn't lock up

 * In my testing, 10 minutes of scrolling through videos was generally
   enough to trigger the bug.
   Sometimes I only got the green artifacts; other times the system
   locked up.
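The `about:config` toggle in the test plan can also be scripted. Below is a minimal Python sketch, assuming you already know the Firefox profile directory (for the Firefox snap it is typically under `~/snap/firefox/common/.mozilla/firefox/`, but verify this on your system); the helper name `enable_vaapi` is hypothetical. It appends the pref to the profile's `user.js`, which Firefox applies at startup, so the restart step still applies.

```python
from pathlib import Path

# The pref toggled by hand in the test plan.
VAAPI_LINE = 'user_pref("media.ffmpeg.vaapi.enabled", true);\n'


def enable_vaapi(profile_dir: Path) -> Path:
    """Make sure <profile_dir>/user.js enables VAAPI; return the file's path.

    profile_dir is supplied by the caller (an assumption of this sketch).
    Firefox reads user.js only at startup, so restart it afterwards.
    """
    user_js = profile_dir / "user.js"
    existing = user_js.read_text() if user_js.exists() else ""
    if VAAPI_LINE.strip() not in existing:  # idempotent: don't add it twice
        with user_js.open("a") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")  # keep the new pref on its own line
            f.write(VAAPI_LINE)
    return user_js
```

Firefox also stores the resulting value in `prefs.js`, so to return to the default afterwards you would remove the line from `user.js` and reset the pref in `about:config`.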

[ Where problems could occur ]

 * The scope of the change is limited to the VAAPI driver for AMD GPUs
   (/usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so).
   A problem with the change may break video playback.
   However, as this bug demonstrates, a bug in the video acceleration driver
   can also bring down the whole system.

[ Original Description ]

It turns out that amdgpu in kernel 6.11 on the Ryzen 6850U is quite crashy and laggy. I have attached the log from the previous boot, which shows a lot of errors.

It does not seem to like Firefox.

The visible behavior is that it hangs, tries to reset, fails to reset, and then the screen is unusable, so I reboot it with SysRq.

ProblemType: Bug
DistroRelease: Ubuntu 24.10
Package: linux-image-6.11.0-8-generic 6.11.0-8.8
ProcVersionSignature: Ubuntu 6.11.0-8.8-generic 6.11.0
Uname: Linux 6.11.0-8-generic x86_64
ApportVersion: 2.30.0-0ubuntu3
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: GNOME
Date: Wed Oct 2 19:06:47 2024
InstallationDate: Installed on 2022-11-26 (676 days ago)
InstallationMedia: Ubuntu 23.04 "Lunar Lobster" - Alpha amd64 (20221126)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: LENOVO 21CF004PGE
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.11.0-8-generic root=/dev/mapper/ubuntu-root ro rootflags=subvol=@next quiet splash crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M zswap.enabled=1 zswap.compressor=zstd zswap.max_pool_percent=20 zswap.zpool=zsmalloc vt.handoff=7
RebootRequiredPkgs: Error: path contained symlinks.
RelatedPackageVersions:
 linux-restricted-modules-6.11.0-8-generic N/A
 linux-backports-modules-6.11.0-8-generic N/A
 linux-firmware 20240913.gita34e7a5f-0ubuntu2
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/29/2024
dmi.bios.release: 1.53
dmi.bios.vendor: LENOVO
dmi.bios.version: R23ET77W (1.53 )
dmi.board.asset.tag: Not Available
dmi.board.name: 21CF004PGE
dmi.board.vendor: LENOVO
dmi.board.version: SDK0T76538 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.32
dmi.modalias: dmi:bvnLENOVO:bvrR23ET77W(1.53):bd05/29/2024:br1.53:efr1.32:svnLENOVO:pn21CF004PGE:pvrThinkPadT14Gen3:rvnLENOVO:rn21CF004PGE:rvrSDK0T76538WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_21CF_BU_Think_FM_ThinkPadT14Gen3:
dmi.product.family: ThinkPad T14 Gen 3
dmi.product.name: 21CF004PGE
dmi.product.sku: LENOVO_MT_21CF_BU_Think_FM_ThinkPad T14 Gen 3
dmi.product.version: ThinkPad T14 Gen 3
dmi.sys.vendor: LENOVO

Julian Andres Klode (juliank) wrote :
Mario Limonciello (superm1) wrote :

This is most likely a mesa issue. Did you upgrade mesa recently, around the time it started happening?

Julian Andres Klode (juliank) wrote :

To the best of my knowledge the issue started happening when the kernel was upgraded. That being said, the Firefox process that seemingly causes stalls is a snap so it is using its own mesa and not the host one, and I can't speak to the snap's mesa version.

Mario Limonciello (superm1) wrote :

Can you check that? Could it have coincided with the snap silently refreshing in the background? Or could you try a non-snap version so that the host mesa is used?

Alessandro Astone (aleasto) wrote :

I'm hitting the same issue `amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32785)` when playing Twitter videos in the Firefox snap through VAAPI (media.ffmpeg.vaapi.enabled=true in about:config).

The firefox snap uses mesa 23.2.1 from jammy and that hasn't changed in a while.

I get no more crashes if I disable media.ffmpeg.vaapi.enabled in firefox (it's disabled by default).
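To confirm that a hang is this bug rather than something unrelated, the kernel log can be searched for the two messages quoted in the bug title and in the comment above, for example after saving the previous boot's kernel messages with `journalctl -k -b -1 > prev-boot.log`. The following Python sketch assumes the message shapes seen in this report (real lines also carry timestamps and a device prefix, and may continue after the `pasid` field); `scan` is a hypothetical helper name.

```python
import re
from collections.abc import Iterable

# Patterns based on the messages quoted in this report.
PAGE_FAULT = re.compile(
    r"\[(?P<hub>\w+)\] page fault "
    r"\(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+)"
)
FENCE_TIMEOUT = re.compile(r"Fence fallback timer expired on ring (?P<ring>\S+)")


def scan(lines: Iterable[str]) -> tuple[list[dict[str, str]], list[str]]:
    """Return (page faults as field dicts, ring names with fence timeouts)."""
    faults: list[dict[str, str]] = []
    fence_rings: list[str] = []
    for line in lines:
        if m := PAGE_FAULT.search(line):
            faults.append(m.groupdict())
        elif m := FENCE_TIMEOUT.search(line):
            fence_rings.append(m.group("ring"))
    return faults, fence_rings
```

Applied to the two fault messages in this report, it yields the same `src_id` and `ring` but different `vmid` and `pasid` values, which is presumably expected since the latter identify the faulting process context rather than the engine.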

Alessandro Astone (aleasto) wrote :

I cannot reproduce it with the prebuilt Firefox tarball from upstream, running on Oracular with mesa 24.2.3 (and LLVM 19).

Mario Limonciello (superm1) wrote :

Possibly the same issue as the one fixed in mesa 24.1:

https://gitlab.freedesktop.org/mesa/mesa/-/issues/11138

Please uprev mesa in the snap and see if it helps.

Julian Andres Klode (juliank) wrote (last edit ):

It does seem vaapi related for me too.

One thing I wonder about: the crash is one thing, but the inability of the kernel/firmware to recover from it is another. New mesa fixing that particular crash is fine; ideally you'd also be able to recover from the crash with the old mesa, but the GPU restarts are failing and the driver apparently deadlocks or something (quite a bunch of mutex backtraces).

Changed in mesa (Ubuntu):
assignee: nobody → Alessandro Astone (aleasto)
Changed in firefox (Ubuntu):
assignee: nobody → Alessandro Astone (aleasto)
Mario Limonciello (superm1) wrote :

> but the GPU restarts are failing and the driver apparently deadlocks or something (quite a bunch of mutex backtraces).

Yes; poor cleanup/recovery should be tracked as a separate bug report at https://gitlab.freedesktop.org/drm/amd/-/issues

Alessandro Astone (aleasto) wrote :

Indeed installing mesa 24.2 in the snap resolves the issue.

However the patch that was linked in the mesa bug alone does not: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29124

Alessandro Astone (aleasto) wrote :

I don't think there's interest in upgrading mesa in jammy at this point, so here comes a 4-versions-across bisect 😰

Alessandro Astone (aleasto) wrote :
Changed in mesa (Ubuntu):
status: New → In Progress
Changed in firefox (Ubuntu):
status: New → Triaged
Changed in mesa (Ubuntu Jammy):
assignee: nobody → Alessandro Astone (aleasto)
status: New → In Progress
Changed in mesa (Ubuntu Noble):
assignee: nobody → Alessandro Astone (aleasto)
status: New → In Progress
Changed in mesa (Ubuntu):
status: In Progress → Fix Released
no longer affects: firefox (Ubuntu Jammy)
no longer affects: firefox (Ubuntu Noble)
no longer affects: linux (Ubuntu Jammy)
no longer affects: linux (Ubuntu Noble)
Changed in linux:
status: Unknown → Fix Released
Changed in mesa:
status: Unknown → Fix Released
Alessandro Astone (aleasto) wrote :

Proposing a backport to jammy

Julian Andres Klode (juliank) wrote :
Alessandro Astone (aleasto) wrote :

Yeah I'll handle the SRU template.

Please coordinate the upload with tjaalton, who's looking into the raspi enablement patches.

description: updated
description: updated
Changed in linux (Ubuntu):
status: New → Invalid
Timo Aaltonen (tjaalton) wrote :

I've got this

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "mesa_23.2.1-1ubuntu3.1~22.04.3.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Timo Aaltonen (tjaalton) wrote :

uploaded to the queue

Timo Aaltonen (tjaalton)
description: updated
Timo Jyrinki (timo-jyrinki) wrote :

Since the fix is not even in -proposed yet, and I have a Lenovo Z13 Gen 2 with Ubuntu 22.04 LTS preinstalled that may be showing this problem (despite switching to the generic 6.8 kernel), I have set up a single-purpose PPA at https://launchpad.net/~timo-jyrinki/+archive/ubuntu/mesaprerelease (the final release will overwrite the version in the PPA).

Julian Andres Klode (juliank) wrote :

I think ~ubuntu-sponsors was subscribed by accident and am unsubscribing it.

Chris Halse Rogers (raof) wrote : Please test proposed package

Hello Julian, or anyone else affected,

Accepted mesa into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mesa/23.2.1-1ubuntu3.1~22.04.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in mesa (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Timo Aaltonen (tjaalton) wrote :

Okay, getting the oracular backport into noble has taken a bit longer than expected, so I'll prepare a new mesa upload for noble with just this fix, to unblock jammy.

Alessandro Astone (aleasto) wrote :

Any idea why the `mesa-va-drivers` binary package does not have a `23.2.1-1ubuntu3.1~22.04.3` build in proposed, while all the other binary packages that mesa produces do?

Alessandro Astone (aleasto) wrote :

Ahh, pardon, I didn't realize that's a universe package :o

Alessandro Astone (aleasto) wrote (last edit ):

* Added jammy-proposed to the gnome-42-2204 snapcraft recipe
package-repositories:
  - type: apt
    url: http://archive.ubuntu.com/ubuntu
    suites: [jammy-proposed]
    components: [main, universe]
    key-id: F6ECB3762474EDA9D21B7022871920D1991BC93C
    key-server: keyserver.ubuntu.com
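The `package-repositories` entry above makes snapcraft pull stage packages from `jammy-proposed` as well; `universe` is needed because `mesa-va-drivers` is a universe package. As a rough mental model only (an assumption: snapcraft generates its own apt configuration, including key handling, so this is not literally what it writes), each suite in such an entry corresponds to a classic one-line `deb` source. A small Python sketch of that mapping, using the hypothetical helper `deb_lines`:

```python
def deb_lines(repo: dict) -> list[str]:
    """Render a snapcraft-style `type: apt` repository mapping as one-line
    sources.list entries, one per suite. key-id/key-server are not
    represented in the output of this sketch."""
    # An optional architecture restriction becomes the [arch=...] option.
    archs = repo.get("architectures")
    opts = f"[arch={','.join(archs)}] " if archs else ""
    components = " ".join(repo["components"])
    return [f"deb {opts}{repo['url']} {suite} {components}" for suite in repo["suites"]]
```

For the entry above this gives `deb http://archive.ubuntu.com/ubuntu jammy-proposed main universe`.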

$ snapcraft --verbosity debug
[...]
2024-11-20 14:37:59.284 Downloading package: mesa-va-drivers
2024-11-20 14:37:59.585 Get: 124 mesa-va-drivers_23.2.1-1ubuntu3.1~22.04.3_amd64.deb [4100 kB]
[...]
2024-11-20 14:39:22.398 Extracting stage package: mesa-va-drivers
[...]
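When verifying with a locally built snap, it is worth confirming that the build actually staged the -proposed package. A minimal Python sketch, assuming the `Get:` line format shown in the snapcraft log above (`<name>_<version>_<arch>.deb`); `staged_versions` is a hypothetical helper, and its input would be whatever file you redirected snapcraft's output to.

```python
import re
from collections.abc import Iterable

# Matches "Get:" lines like the ones in the snapcraft log above, e.g.
#   Get: 124 mesa-va-drivers_23.2.1-1ubuntu3.1~22.04.3_amd64.deb [4100 kB]
GET_LINE = re.compile(
    r"Get: \d+ (?P<name>[^_\s]+)_(?P<version>[^_\s]+)_(?P<arch>[^_\s.]+)\.deb"
)


def staged_versions(log: Iterable[str]) -> dict[str, str]:
    """Map package name -> version for every .deb fetched in the log."""
    found: dict[str, str] = {}
    for line in log:
        if m := GET_LINE.search(line):
            found[m.group("name")] = m.group("version")
    return found
```

For the jammy verification, the expected value for `mesa-va-drivers` is `23.2.1-1ubuntu3.1~22.04.3`.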

$ sudo snap install --dangerous ./gnome-42-2204_0+git.682b718-dirty_amd64.snap

Then performed the test plan, ensuring that VAAPI is actually used.
Can confirm no more green artifacts or crashes.

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (mesa/23.2.1-1ubuntu3.1~22.04.3)

All autopkgtests for the newly accepted mesa (23.2.1-1ubuntu3.1~22.04.3) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

asymptote/unknown (ppc64el)
freeglut/unknown (ppc64el)
glfw3/unknown (ppc64el)
gtk4/4.6.9+ds-0ubuntu0.22.04.2 (ppc64el)
libalien-sdl-perl/1.446-3.1 (ppc64el)
libsdl2/2.0.20+dfsg-2ubuntu1.22.04.1 (i386)
mutter/42.9-0ubuntu9 (ppc64el)
pyopencl/unknown (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#mesa

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Julian, or anyone else affected,

Accepted mesa into noble-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mesa/24.0.9-0ubuntu0.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-noble to verification-done-noble. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-noble. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

description: updated
Changed in mesa (Ubuntu Noble):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-noble
removed: verification-done
Łukasz Zemczak (sil2100) wrote :

Corrected the bug description to not mention noble being a full backport, as this is not accurate anymore.

Alessandro Astone (aleasto) wrote :

* Installed the Firefox snap from --channel=edge, which is core24

* Following the test plan, verified that it reproduced the crash

* Added noble-proposed to the mesa-2404 recipe

package-repositories:
  - type: apt
    url: http://archive.ubuntu.com/ubuntu
    suites: [noble-proposed]
    components: [main, universe]
    architectures: [amd64, i386]
    key-id: F6ECB3762474EDA9D21B7022871920D1991BC93C
    key-server: keyserver.ubuntu.com
    priority: always

$ snapcraft --verbosity debug
[...]
2024-12-03 10:39:04.936 Downloading package: mesa-va-drivers
2024-12-03 10:39:05.069 Get: 20 mesa-va-drivers_24.0.9-0ubuntu0.3_amd64.deb [4246 kB]
[...]
2024-12-03 10:39:14.796 Extracting stage package: mesa-va-drivers

$ sudo snap install --dangerous ./mesa-2404_24.0.9_amd64.snap

* Following the test plan, verified that neither the crash nor the green artifacts occurred.

tags: added: verification-done verification-done-noble
removed: verification-needed verification-needed-noble
Timo Aaltonen (tjaalton) wrote : Update Released

The verification of the Stable Release Update for mesa has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mesa - 23.2.1-1ubuntu3.1~22.04.3

---------------
mesa (23.2.1-1ubuntu3.1~22.04.3) jammy; urgency=medium

  [ Timo Aaltonen ]
  * Add support for Pi 2712D0 stepping (LP: #2082072)

  [ Alessandro Astone ]
  * patches: Backport patch for green artifacting and GPU crash on
    radeonsi with kernel >= 6.10 (LP: #2083538)

 -- Timo Aaltonen <email address hidden> Wed, 09 Oct 2024 17:47:27 +0300

Changed in mesa (Ubuntu Jammy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mesa - 24.0.9-0ubuntu0.3

---------------
mesa (24.0.9-0ubuntu0.3) noble; urgency=medium

  [ Alessandro Astone ]
  * patches: Backport patch for green artifacting and GPU crash on
    radeonsi with kernel >= 6.10 (LP: #2083538)

 -- Timo Aaltonen <email address hidden> Wed, 20 Nov 2024 12:05:41 +0200
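To check whether a given mesa build already carries the fix, its version can be compared against the fixed versions listed above (23.2.1-1ubuntu3.1~22.04.3 for jammy, 24.0.9-0ubuntu0.3 for noble). On an Ubuntu system, `dpkg --compare-versions A ge B` does this directly; the Python sketch below re-implements dpkg's ordering rules for illustration, including the rule that `~` sorts before everything (which is why a `~22.04.3` backport sorts below plain `23.2.1-1ubuntu3.1`). Keep in mind that for snaps the relevant version is the mesa bundled in the snap, not the host's.

```python
def _is_digit(c: str) -> bool:
    return c.isascii() and c.isdigit()


def _char_at(s: str, k: int) -> str:
    return s[k] if k < len(s) else ""  # "" stands for end of string


def _order(c: str) -> int:
    """Sort weight of a character in a non-digit run, per dpkg's rules."""
    if c == "" or _is_digit(c):
        return 0
    if c.isascii() and c.isalpha():
        return ord(c)
    if c == "~":
        return -1  # '~' sorts before everything, even the end of the string
    return ord(c) + 256  # other symbols sort after letters


def _verrevcmp(a: str, b: str) -> int:
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the non-digit runs character by character.
        while (_char_at(a, i) and not _is_digit(_char_at(a, i))) or (
            _char_at(b, j) and not _is_digit(_char_at(b, j))
        ):
            ac, bc = _order(_char_at(a, i)), _order(_char_at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Compare the following digit runs numerically (leading zeros ignored).
        while _char_at(a, i) == "0":
            i += 1
        while _char_at(b, j) == "0":
            j += 1
        first_diff = 0
        while _is_digit(_char_at(a, i)) and _is_digit(_char_at(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _is_digit(_char_at(a, i)):
            return 1  # a's number has more digits
        if _is_digit(_char_at(b, j)):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> tuple[int, str, str]:
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch = 0
    if ":" in version:
        head, version = version.split(":", 1)
        epoch = int(head)
    upstream, sep, revision = version.rpartition("-")
    if not sep:  # no '-' means there is no Debian revision
        upstream, revision = version, ""
    return epoch, upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1 as Debian version a sorts before, equal to, or after b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    result = _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
    return (result > 0) - (result < 0)
```

For example, `compare_versions(installed, "24.0.9-0ubuntu0.3") >= 0` would indicate that a noble mesa build is at least the fixed version.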

Changed in mesa (Ubuntu Noble):
status: Fix Committed → Fix Released
Changed in firefox (Ubuntu):
status: Triaged → Invalid
Juerg Haefliger (juergh)
tags: added: kernel-daily-bug