i915: Fixup regressions introduced with enabling single CCS engine
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Undecided | Unassigned |
Noble | Fix Committed | Medium | Matthew Ruffell |
Bug Description
BugLink: https:/
[Impact]
Recently, the Intel i915 subsystem underwent a change that limited the number of CCS engines that are initialised by default and exposed to the user. Different chipsets have different numbers of CCS engines, but most available on the market have 4. The new change starts only a single engine and allocates all CCS slices to it. This single engine is then exposed to userspace. The change is intended to work around a hardware bug.
This all happened in:
commit 6db31251bb26581
Author: Andi Shyti <email address hidden>
Date: Thu Mar 28 08:34:05 2024 +0100
Subject: drm/i915/gt: Enable only one CCS for compute workload
Link: https:/
which landed in:
$ git describe --contains 67f164e8510b16b
Ubuntu-
These changes have had some side effects, leading to failures of userspace applications, notably video transcoding with ffmpeg, which results in fence expiration errors in dmesg like:
[ 81.026591] Fence expiration time out i915-0000:
This change also introduced a performance regression, dropping GPU performance to 1/4 of what it was previously. This is likely because most ARC GPUs have 4 CCS engines, and the change went down to a single engine without actually allocating the other three.
There are no workarounds. Users are advised to downgrade to 6.8.0-36-generic until the fix is available.
[Fix]
The regression was fixed by these two commits:
commit aee54e282002a12
Author: Andi Shyti <email address hidden>
Date: Fri Apr 26 02:07:23 2024 +0200
Subject: drm/i915/gt: Automate CCS Mode setting during engine resets
Link: https:/
commit ee01b6a386eaf99
Author: Andi Shyti <email address hidden>
Date: Fri May 17 11:06:16 2024 +0200
Subject: drm/i915/gt: Fix CCS id's calculation for CCS mode setting
Link: https:/
"drm/i915/gt: Automate CCS Mode setting during engine resets" is already applied to noble/master-next through upstream stable v6.8.10.
We just need "drm/i915/gt: Fix CCS id's calculation for CCS mode setting". It is queued up for v6.9.4, but that could still be another SRU cycle or two away, so it is being submitted now.
"drm/i915/gt: Fix CCS id's calculation for CCS mode setting" restores another 1/4 performance, but some performance issues still remain, and will hopefully be addressed in a future patch.
[Testcase]
This affects video transcoding with ffmpeg, on machines equipped with Intel ARC GPUs.
An example ffmpeg command might be:
/usr/lib/
Another user on bug 2072933 came up with this minimalist reproducer:
#include <cstdio>
#include <sycl/sycl.hpp>
int main() {
// auto selector = sycl::cpu_
auto selector = sycl::gpu_
auto queue = sycl::queue(
printf(
queue.
cgh.
});
queue.wait();
printf("Bye\n");
return 0;
}
$ icpx -fsycl sycltest.cpp -o sycltest
$ ./sycltest
These commands should run successfully to completion. On failure, the following
appears in dmesg:
[ 81.026591] Fence expiration time out i915-0000:
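To make the pass/fail check above repeatable, the kernel log can be searched for the failure message after running the reproducer. The following is only a minimal sketch: `has_fence_timeout` is a hypothetical helper name, not part of any tool, and it matches only the "Fence expiration time out" prefix shown above, since the rest of the message is truncated in this report.

```shell
# Hypothetical helper (name is an assumption, not an existing tool).
# Reads kernel log text on stdin and returns 0 if the fence expiration
# message from this bug is present, non-zero otherwise.
has_fence_timeout() {
    grep -q "Fence expiration time out"
}
```

It could be used after running the ffmpeg command or ./sycltest, for example `sudo dmesg | has_fence_timeout && echo "regression reproduced"`. No output from that command suggests the message was not logged.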
A test kernel is available in the following ppa:
https:/
If you install the test kernel, things should work correctly.
[Where problems could occur]
This issue affects users of i915, the driver for the integrated GPUs present on most Intel processors. While these patches are unlikely to cause outages that stop the primary display from functioning, any further regressions may add additional performance impact or prevent workloads from executing correctly.
These patches are all accepted into upstream -stable, and we would consume them in due course anyway.
If a regression were to occur, there are no workarounds, and users would need to select an older kernel until a fix is available.
[Other info]
Upstream Bug:
https:/
description: updated
summary: Request backport of two i915/Intel Arc GPU patches → i915: Fixup regressions introduced with enabling single CCS engine
Changed in linux (Ubuntu Noble):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Matthew Ruffell (mruffell)
description: updated
Changed in linux (Ubuntu Noble):
status: In Progress → Fix Committed
Status changed to 'Confirmed' because the bug affects multiple users.