i915: Fixup regressions introduced with enabling single CCS engine

Bug #2072755 reported by nyanmisaka
30
This bug affects 4 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Undecided
Unassigned
Noble
Fix Committed
Medium
Matthew Ruffell

Bug Description

BugLink: https://bugs.launchpad.net/bugs/2072755

[Impact]

Recently, the Intel i915 susbsystem underwent a change that limited the number of CCS engines that were initialised by default, and exposed to the user. Different chipsets have differing amounts of CCS engines, but most available in the market have 4 CCS engines. The new change just starts a single engine only, and allocates all CCS slices to this single engine. This single engine is then exposed to userspace. This effort is to workaround a hardware bug.

This all happened in:

commit 6db31251bb265813994bfb104eb4b4d0f44d64fb
Author: Andi Shyti <email address hidden>
Date: Thu Mar 28 08:34:05 2024 +0100
Subject: drm/i915/gt: Enable only one CCS for compute workload
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6db31251bb265813994bfb104eb4b4d0f44d64fb

which landed in:

$ git describe --contains 67f164e8510b16bda18642464863dba87a33d8cb
Ubuntu-6.8.0-38.38~525

There have been some side effects as a result of these changes, leading to failure of userspace applications, namely in video transcoding with ffmepg, resulting in fence expiration errors in dmesg like:

[ 81.026591] Fence expiration time out i915-0000:01:00.0:ffmpeg[521]:2!

There has also been a performance impact introduced by this change, which dropped performance of the GPU to 1/4 of what it was previously. This is likely due to most ARC GPUs usually having 4 CCS engines, and going down to 1 only without actually allocating the other three.

There are no workarounds. Users are suggested to downgrade to 6.8.0-36-generic while the fix is coming.

[Fix]

The regression was fixed by these two commits:

commit aee54e282002a127612b71255bbe879ec0103afd
Author: Andi Shyti <email address hidden>
Date: Fri Apr 26 02:07:23 2024 +0200
Subject: drm/i915/gt: Automate CCS Mode setting during engine resets
Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/commit/?id=aee54e282002a127612b71255bbe879ec0103afd

commit ee01b6a386eaf9984b58a2476e8f531149679da9
Author: Andi Shyti <email address hidden>
Date: Fri May 17 11:06:16 2024 +0200
Subject: drm/i915/gt: Fix CCS id's calculation for CCS mode setting
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee01b6a386eaf9984b58a2476e8f531149679da9

"drm/i915/gt: Automate CCS Mode setting during engine resets" is already applied to noble/master-next through upstream stable v6.8.10.

We just need "drm/i915/gt: Fix CCS id's calculation for CCS mode setting". It is queued up for v6.9.4, but that could still be another SRU cycle or two away. So send it now.

"drm/i915/gt: Fix CCS id's calculation for CCS mode setting" restores another 1/4 performance, but some performance issues still remain, and will hopefully be addressed in a future patch.

[Testcase]

This affects video transcoding with ffmpeg, on machines equipped with Intel ARC GPUs.

An example ffmpeg command might be:

/usr/lib/jellyfin-ffmpeg/ffmpeg -analyzeduration 200M -probesize 1G -ss 00:00:03.000 -noaccurate_seek -init_hw_device vaapi=va:,kernel_driver=i915,driver=iHD -init_hw_device qsv=qs@va -filter_hw_device qs -hwaccel vaapi -hwaccel_output_format vaapi -noautorotate -i file:"/path/to/1080_video.mkv" -noautoscale -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 av1_qsv -preset veryfast -b:v 3616000 -maxrate 3616000 -bufsize 7232000 -g:v:0 72 -keyint_min:v:0 72 -vf "setparams=color_primaries=bt709:color_trc=bt709:colorspace=bt709,scale_vaapi=w=1280:h=720:format=nv12:extra_hw_frames=24,hwmap=derive_device=qsv,format=qsv" -codec:a:0 libfdk_aac -ac 2 -vbr:a 5 -copyts -avoid_negative_ts disabled -max_muxing_queue_size 2048 -f hls -max_delay 5000000 -hls_time 3 -hls_segment_type fmp4 -hls_fmp4_init_filename "c30716eb121448346fcc00a2440071a3-1.mp4" -start_number 1 -hls_segment_filename "/var/lib/jellyfin/transcodes/c30716eb121448346fcc00a2440071a3%d.mp4" -hls_playlist_type vod -hls_list_size 0 -y "/var/lib/jellyfin/transcodes/c30716eb121448346fcc00a2440071a3.m3u8

Another user on bug 2072933 came up with this minimalist reproducer:

#include <cstdio>
#include <sycl/sycl.hpp>

int main() {
  // auto selector = sycl::cpu_selector_v; // Works fine
  auto selector = sycl::gpu_selector_v;

  auto queue = sycl::queue(selector);

  printf("Hello\n");
  queue.submit([&](sycl::handler &cgh) {
    cgh.parallel_for(sycl::range(1), [=](sycl::item<1> item) {});
  });
  queue.wait();
  printf("Bye\n");

  return 0;
}

$ icpx -fsycl sycltest.cpp -o sycltest
$ ./sycltest

These commands should run successfully to completion. On failure, they will
emit in dmesg:

[ 81.026591] Fence expiration time out i915-0000:01:00.0:ffmpeg[521]:2!

A test kernel is available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp2072755-test

If you install the test kernel, things should work correctly.

[Where problems could occur]

This issue affects users of i915, which is a pretty universal integrated GPU present on Intel processors. While these patches are unlikely to cause outages that stop the primary display from functioning, any further regressions may add additional performance impact or prevent workloads from executing correctly.

These patches are all accepted into upstream -stable, and we would consume them in due course anyway.

If a regression were to occur, there are no workarounds, and users would need to select an older kernel until a fix is available.

[Other info]

Upstream Bug:
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10895

nyanmisaka (nyanmisaka)
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
TheDreadPirate (solidsnake1298) wrote :

I was able to reproduce. Reverting to package 6.8.0-36 restores all Intel Quick Sync functionality on my Arc GPU.

But the issues with Intel Arc GPUs and Quick Sync on 6.8.0-38 appears to be limited to ffmpeg commands that scale the output resolution.

An example ffmpeg command that does NOT work. The original video is 1080P and the output is 720P.

/usr/lib/jellyfin-ffmpeg/ffmpeg -analyzeduration 200M -probesize 1G -ss 00:00:03.000 -noaccurate_seek -init_hw_device vaapi=va:,kernel_driver=i915,driver=iHD -init_hw_device qsv=qs@va -filter_hw_device qs -hwaccel vaapi -hwaccel_output_format vaapi -noautorotate -i file:"/path/to/1080_video.mkv" -noautoscale -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 av1_qsv -preset veryfast -b:v 3616000 -maxrate 3616000 -bufsize 7232000 -g:v:0 72 -keyint_min:v:0 72 -vf "setparams=color_primaries=bt709:color_trc=bt709:colorspace=bt709,scale_vaapi=w=1280:h=720:format=nv12:extra_hw_frames=24,hwmap=derive_device=qsv,format=qsv" -codec:a:0 libfdk_aac -ac 2 -vbr:a 5 -copyts -avoid_negative_ts disabled -max_muxing_queue_size 2048 -f hls -max_delay 5000000 -hls_time 3 -hls_segment_type fmp4 -hls_fmp4_init_filename "c30716eb121448346fcc00a2440071a3-1.mp4" -start_number 1 -hls_segment_filename "/var/lib/jellyfin/transcodes/c30716eb121448346fcc00a2440071a3%d.mp4" -hls_playlist_type vod -hls_list_size 0 -y "/var/lib/jellyfin/transcodes/c30716eb121448346fcc00a2440071a3.m3u8

This results in this ffmpeg output, in addition to the "Fence expiration" messages in dmesg that nyanmisaka provided.

[AVHWDeviceContext @ 0x599b524f7c40] No VA display found for any default device.
Device creation failed: -22.
Failed to set value 'vaapi=va:,kernel_driver=i915,driver=iHD' for option 'init_hw_device': Invalid argument
Error parsing global options: Invalid argument

Here is an example ffmpeg command on 6.8.0-38 that still works. Same source video, but the output resolution of the video is NOT changed.

/usr/lib/jellyfin-ffmpeg/ffmpeg -analyzeduration 200M -probesize 1G -init_hw_device vaapi=va:,kernel_driver=i915,driver=iHD -init_hw_device qsv=qs@va -filter_hw_device qs -hwaccel vaapi -hwaccel_output_format vaapi -noautorotate -i file:"/path/to/1080_video.mkv" -noautoscale -map_metadata -1 -map_chapters -1 -threads 0 -map 0:0 -map 0:1 -map -0:s -codec:v:0 av1_qsv -preset veryfast -b:v 9616000 -maxrate 9616000 -bufsize 19232000 -g:v:0 72 -keyint_min:v:0 72 -vf "setparams=color_primaries=bt709:color_trc=bt709:colorspace=bt709,scale_vaapi=format=nv12:extra_hw_frames=24,hwmap=derive_device=qsv,format=qsv" -codec:a:0 libfdk_aac -ac 2 -vbr:a 5 -copyts -avoid_negative_ts disabled -max_muxing_queue_size 2048 -f hls -max_delay 5000000 -hls_time 3 -hls_segment_type fmp4 -hls_fmp4_init_filename "30a76f05628fdcae4769435bd663c36e-1.mp4" -start_number 0 -hls_segment_filename "/var/lib/jellyfin/transcodes/30a76f05628fdcae4769435bd663c36e%d.mp4" -hls_playlist_type vod -hls_list_size 0 -y "/var/lib/jellyfin/transcodes/30a76f05628fdcae4769435bd663c36e.m3u8"

Note the only differences are the bit rate, which I have run tests to factor that out, and the presence of "w=1280:h=720" to scale down the video resolution.

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi nyanmisaka,

"drm/i915/gt: Automate CCS Mode setting during engine resets" is already queued up in master-next, so it should be available in the next SRU cycle or so:

commit aee54e282002a127612b71255bbe879ec0103afd
Author: Andi Shyti <email address hidden>
Date: Fri Apr 26 02:07:23 2024 +0200
Subject: drm/i915/gt: Automate CCS Mode setting during engine resets
Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/commit/?id=aee54e282002a127612b71255bbe879ec0103afd

$ git describe --contains aee54e282002a127612b71255bbe879ec0103afd
fatal: cannot describe 'aee54e282002a127612b71255bbe879ec0103afd'

Its not tagged to any release yet, but its in the pipeline.

I think the Kernel Team are currently at upstream stable 6.8.12, and havne't begun 6.9.x patches, so at the moment, "drm/i915/gt: Fix CCS id's calculation for CCS mode setting" hasn't been applied yet.

I can make you a test kernel with the fix, and we can do a manual SRU if you like?

Thanks,
Matthew

Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi nyanmisaka,

I have built you a test kernel based on 6.8.0-38-generic with the following two commits added ontop:

commit aee54e282002a127612b71255bbe879ec0103afd
Author: Andi Shyti <email address hidden>
Date: Fri Apr 26 02:07:23 2024 +0200
Subject: drm/i915/gt: Automate CCS Mode setting during engine resets
Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/noble/commit/?id=aee54e282002a127612b71255bbe879ec0103afd

commit ee01b6a386eaf9984b58a2476e8f531149679da9
Author: Andi Shyti <email address hidden>
Date: Fri May 17 11:06:16 2024 +0200
Subject: drm/i915/gt: Fix CCS id's calculation for CCS mode setting
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee01b6a386eaf9984b58a2476e8f531149679da9

Can you try it out and let me know if it fixes your issue:

Please note this package is NOT SUPPORTED by Canonical, and is for TESTING
PURPOSES ONLY. ONLY Install in a dedicated test environment.

Instructions to Install (On a noble system):
1) sudo add-apt-repository ppa:mruffell/lp2072755-test
2) sudo apt update
3) sudo apt install linux-image-unsigned-6.8.0-38-generic linux-modules-6.8.0-38-generic linux-modules-extra-6.8.0-38-generic linux-headers-6.8.0-38-generic
4) sudo reboot
5) uname -rv
6.8.0-38-generic #38+TEST2072755v20240712b1-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul 12

If you are asked to abort removal of the running kernel, say no.

Let me know if it works, and if it does, I will send "drm/i915/gt: Fix CCS id's calculation for CCS mode setting" for SRU.

Thanks,
Matthew

Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
TheDreadPirate (solidsnake1298) wrote :

Matthew,

Your patched 6.8.0-38 kernel appears to have resolved the issue. I will be performing additional tests over the weekend, but the ffmpeg transcodes that had failed with the production 6.8.0-38 are now working.

Revision history for this message
nyanmisaka (nyanmisaka) wrote :

Hi Matthew,

I can also confirm that it fixes the i915 regression on my Arc A380 graphics. So these two patches should be safe for SRU. Thanks for your help.

summary: - Request backport of two i915/Intel Arc GPU patches
+ i915: Fixup regressions introduced with enabling single CCS engine
Changed in linux (Ubuntu Noble):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
Revision history for this message
Matthew Ruffell (mruffell) wrote :

Hi nyanmisaka, TheDreadPirate,

Thanks for trying the test kernel, and great to hear that it works.

I wrote up a SRU template, as you can see, in the description of the bug.

I also submitted the patch to the Ubuntu kernel mailing list:

Cover Letter:
https://lists.ubuntu.com/archives/kernel-team/2024-July/152131.html
Patch:
https://lists.ubuntu.com/archives/kernel-team/2024-July/152132.html

I will let you know once the Kernel team has reviewed and acked the patch,
and when they get built into a kernel in -proposed for verification.

Thanks,
Matthew

Stefan Bader (smb)
Changed in linux (Ubuntu Noble):
status: In Progress → Fix Committed
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.