20230323.gitbcdcfbcf-0ubuntu1.4 borks amdgpu

Bug #2029396 reported by Henrik Holst
This bug affects 7 people
Affects                  Status        Importance  Assigned to  Milestone
Linux Firmware           Fix Released  Unknown
linux-firmware (Ubuntu)  Invalid       Undecided   Unassigned
  Jammy                  Fix Released  Undecided   Unassigned
  Lunar                  Fix Released  Undecided   Unassigned

Bug Description

[Impact]

20230323.gitbcdcfbcf-0ubuntu1.4 borks amdgpu during boot: it completely hangs the entire kernel on an rx7900xtx. Reverting to 20230323.gitbcdcfbcf-0ubuntu1.2 made the kernel boot properly again.

Regression! Apparently the AMD Navi31 firmware updates that just landed in Jammy and Lunar need associated kernel changes.

[Fix]

Revert AMD firmware updates.

[Test Case]

Boot a machine with an affected AMD GPU (see comments below).

[Where Problems Could Occur]

Machines with AMD GPUs might fail to boot or might crash.

Revision history for this message
Henrik Holst (henrik-holst2) wrote :

This is on Ubuntu 23.04 with 6.2.0-26-generic, but since the previous firmware worked I assume the kernel version is not important here; it is more likely that the 1.4 update contains firmware for the rx7900xtx that hangs it. Booting to recovery and trying to enter graphical mode (with the borked firmware installed) also borked the system, while the non-graphical recovery mode worked just fine.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-firmware (Ubuntu):
status: New → Confirmed
Revision history for this message
M. Miller (mmill3r) wrote :

I was affected by this one, too.

In case someone faces this one and doesn't know what to do, here are the command lines that should get you started again:

wget http://archive.ubuntu.com/ubuntu/pool/main/l/linux-firmware/linux-firmware_20230323.gitbcdcfbcf-0ubuntu1.2_all.deb
sudo dpkg -i linux-firmware_20230323.gitbcdcfbcf-0ubuntu1.2_all.deb

HTH!
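
A minimal sketch of the same workaround with one extra step: holding the package so that a later `apt upgrade` does not pull the broken build back in. The function names `firmware_deb_url` and `downgrade_and_hold` are hypothetical helpers introduced for illustration, and the sketch assumes the older .deb is still present in the archive pool.

```shell
#!/usr/bin/env bash
# Sketch only: roll back to a known-good linux-firmware build and hold it.

# Build the archive pool URL for a given linux-firmware version
# (same URL layout as the wget command above).
firmware_deb_url() {
  printf 'http://archive.ubuntu.com/ubuntu/pool/main/l/linux-firmware/linux-firmware_%s_all.deb\n' "$1"
}

# Download, install and hold the given version.
# Needs root and network access, so it is only defined here, not called.
downgrade_and_hold() {
  local version=$1 url deb
  url=$(firmware_deb_url "$version")
  deb=${url##*/}                      # file name is the last URL component
  wget "$url" &&
    sudo dpkg -i "$deb" &&
    sudo apt-mark hold linux-firmware # keep apt from upgrading it again
}

# Example: downgrade_and_hold 20230323.gitbcdcfbcf-0ubuntu1.2
# Once a fixed package is published: sudo apt-mark unhold linux-firmware
```

The hold is a deliberate trade-off: it also blocks the fixed package, so it should be released with `apt-mark unhold` once a good version reaches -updates.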

Revision history for this message
Juerg Haefliger (juergh) wrote :

Can you attach some logs so that we can at least see what the driver tries to load?

tags: added: kern-7610
Changed in linux-firmware (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
M. Miller (mmill3r) wrote :

This is the last syslog line I get before the boot process finally is stuck (prior to even being able to show the password dialog for the LUKS encryption): amdgpu 0000:0d:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode

the Xorg log file says "(EE) AMDGPU(0): [drm] Failed to open DRM device for pci:0000:0d:00.0: No such file or directory" before telling me that it unloaded the amdgpu module.

That's basically all I can see. If you have an idea for creating more meaningful logs, or where I could find some, please let me know any time!

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

attach 'lspci -vnn' please, so we see the gpu model
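
For anyone who only wants the GPU entry rather than the full listing, a small sketch follows. `gpu_devices` is a hypothetical helper name; it simply filters `lspci` output on stdin for the display-class device descriptions that `lspci` prints.

```shell
#!/usr/bin/env bash
# Keep only display-class devices from `lspci -nn` output read on stdin.
gpu_devices() {
  grep -E 'VGA compatible controller|3D controller|Display controller'
}

# Usage on a real machine: lspci -nn | gpu_devices
```

The vendor:device pair in square brackets at the end of each matching line (for example `[1002:744c]`) is what identifies the exact GPU model.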

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

also, please test if 1.3 was broken (1.4 replaced it before it got released)
https://launchpad.net/ubuntu/+source/linux-firmware/20230323.gitbcdcfbcf-0ubuntu1.3/+build/26397687

Revision history for this message
M. Miller (mmill3r) wrote :

I guess this should be the relevant part:

0b:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 10) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 43, IOMMU group 27
        Memory at fcb00000 (32-bit, non-prefetchable) [size=16K]
        Bus: primary=0b, secondary=0c, subordinate=0d, sec-latency=0
        I/O behind bridge: e000-efff [size=4K] [16-bit]
        Memory behind bridge: fc900000-fcafffff [size=2M] [32-bit]
        Prefetchable memory behind bridge: f000000000-f80fffffff [size=33024M] [32-bit]
        Capabilities: <access denied>
        Kernel driver in use: pcieport

0c:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (rev 10) (prog-if 00 [Normal decode])
        Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
        Flags: bus master, fast devsel, latency 0, IRQ 44, IOMMU group 28
        Bus: primary=0c, secondary=0d, subordinate=0d, sec-latency=0
        I/O behind bridge: e000-efff [size=4K] [16-bit]
        Memory behind bridge: fc900000-fcafffff [size=2M] [32-bit]
        Prefetchable memory behind bridge: f000000000-f80fffffff [size=33024M] [32-bit]
        Capabilities: <access denied>
        Kernel driver in use: pcieport

0d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] [1002:744c] (rev c8) (prog-if 00 [VGA controller])
        Subsystem: ASRock Incorporation Device [1849:5304]
        Flags: bus master, fast devsel, latency 0, IRQ 154, IOMMU group 29
        Memory at f000000000 (64-bit, prefetchable) [size=32G]
        Memory at f800000000 (64-bit, prefetchable) [size=256M]
        I/O ports at e000 [size=256]
        Memory at fc900000 (32-bit, non-prefetchable) [size=1M]
        Expansion ROM at fca00000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

I will try to test 1.3 some time tonight!

Revision history for this message
Mario Limonciello (superm1) wrote :

Are the matching sru kernel patches missing possibly?

ac2f5739fdca drm/amdgpu/mes11: enable reg active poll
a2fe4534bb38 drm/amd/amdgpu: update mes11 api def
da9a8dc33da2 drm/amdgpu: reserve the old gc_11_0_*_mes.bin
616843d5a11b drm/amd/amdgpu: introduce gc_*_mes_2.bin v2
09bf14907d86 drm/amdgpu: declare firmware for new MES 11.0.4
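
One way to check a kernel tree for these patches is sketched below. It matches on commit subjects rather than hashes, on the assumption that backports keep the upstream subject line while the hash changes. `missing_subjects` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Subjects of the five MES commits listed above.
required_subjects=(
  "drm/amdgpu/mes11: enable reg active poll"
  "drm/amd/amdgpu: update mes11 api def"
  "drm/amdgpu: reserve the old gc_11_0_*_mes.bin"
  "drm/amd/amdgpu: introduce gc_*_mes_2.bin v2"
  "drm/amdgpu: declare firmware for new MES 11.0.4"
)

# Reads `git log --format=%s` output on stdin and prints every
# required subject that does not appear as a whole line.
missing_subjects() {
  local log subject
  log=$(cat)
  for subject in "${required_subjects[@]}"; do
    grep -qxF -- "$subject" <<<"$log" || printf '%s\n' "$subject"
  done
}

# Usage inside a kernel git tree:
#   git log --format=%s -- drivers/gpu/drm/amd | missing_subjects
```

Empty output means all five subjects were found. A backport that rewords the subject would be reported as missing, so a hit here is a prompt to look closer, not proof.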

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

right, so please test the latest l-f together with 6.2 from lunar-proposed..

Revision history for this message
Mark (selective-panic) wrote :

problem still exists with:
https://launchpad.net/ubuntu/+source/linux-firmware/20230323.gitbcdcfbcf-0ubuntu1.3/+build/26397687

reverted by:
$ sudo apt update && sudo apt upgrade -y

to
$ apt show linux-firmware
Package: linux-firmware
Version: 20230323.gitbcdcfbcf-0ubuntu1.4

$ lspci -v | grep "VGA"
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] (rev cc) (prog-if 00 [VGA controller])

Linux 6.2.0-26-generic

with
$ grep /etc/default/grub -e GRUB_CMDLINE_LINUX_DEFAULT=
GRUB_CMDLINE_LINUX_DEFAULT=""

the last thing I see on screen when booting with '20230323.gitbcdcfbcf-0ubuntu1.4' and display port plugged into 7900xt is:
kernel: [ 2.358328] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode

and then /var/log/syslog

kernel: [ 2.358328] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
kernel: [ 2.358794] Console: switching to colour dummy device 80x25
kernel: [ 2.358819] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
kernel: [ 2.358851] amdgpu 0000:03:00.0: amdgpu: MEM ECC is not presented.
kernel: [ 2.358852] amdgpu 0000:03:00.0: amdgpu: SRAM ECC is not presented.
kernel: [ 2.358886] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
kernel: [ 2.358889] amdgpu 0000:03:00.0: amdgpu: VRAM: 20464M 0x0000008000000000 - 0x00000084FEFFFFFF (20464M used)
kernel: [ 2.358891] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
kernel: [ 2.358899] [drm] Detected VRAM RAM=20464M, BAR=32768M
kernel: [ 2.358900] [drm] RAM width 320bits GDDR6
kernel: [ 2.358927] [drm] amdgpu: 20464M of VRAM memory ready
kernel: [ 2.358928] [drm] amdgpu: 15610M of GTT memory ready.
kernel: [ 2.358936] [drm] GART: num cpu pages 131072, num gpu pages 131072
kernel: [ 2.358989] [drm] PCIE GART of 512M enabled (table at 0x00000084FEB00000).
kernel: [ 2.359387] [drm] Loading DMUB firmware via PSP: version=0x07000A01
kernel: [ 2.359444] amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
kernel: [ 2.359735] [drm] Found VCN firmware Version ENC: 1.9 DEC: 5 VEP: 0 Revision: 1
kernel: [ 2.359739] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
kernel: [ 2.359828] [drm] max_doorbell_slices=32767
kernel: [ 2.481183] [drm] reserve 0x1300000 from 0x84fc000000 for PSP TMR
kernel: [ 2.610923] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
kernel: [ 2.610924] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
kernel: [ 2.610950] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000037, smu fw if version = 0x00000034, smu fw program = 0, smu fw version = 0x004e4b00 (78.75.0)
kernel: [ 2.610952] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
kernel: [ 6.053214] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
kernel: [ 6.053216] amdgpu 0000:03:00.0: amdgpu: Failed to enable requested dpm features!
kernel: [ 6.05321...


Revision history for this message
Henrik Holst (henrik-holst2) wrote :

Tried with 6.2.0-27-generic from lunar-proposed and 1.4 still borked at the exact same place as with 6.2.0-26; the only thing that fixed it was going back to 1.2.

Revision history for this message
Henrik Holst (henrik-holst2) wrote :

And my rx7900xtx fails in the exact same way as the two others above (this is on 6.2.0-27):

aug 04 00:14:39 Sineya kernel: amdgpu 0000:0c:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
aug 04 00:14:39 Sineya kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to enable requested dpm features!
aug 04 00:14:39 Sineya kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to setup smc hw!
aug 04 00:14:39 Sineya kernel: [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <smu> failed -62
aug 04 00:14:39 Sineya kernel: amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
aug 04 00:14:39 Sineya kernel: amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
aug 04 00:14:39 Sineya kernel: amdgpu 0000:0c:00.0: amdgpu: amdgpu: finishing device.
aug 04 00:14:39 Sineya kernel: ------------[ cut here ]------------
aug 04 00:14:39 Sineya kernel: WARNING: CPU: 3 PID: 214 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:600 amdgpu_irq_put+0x9f/0xb0 [amdgpu]
aug 04 00:14:39 Sineya kernel: Modules linked in: hid_logitech_hidpp hid_logitech_dj hid_steam hid_generic amdgpu(+) iommu_v2 drm_buddy gpu_sched i2c_algo_bit drm_ttm_helper usbhid ttm hid drm_display_helper cec rc_core mfd_aaeon crct10dif_pclmul drm_kms_helper crc32_pclmul asus_wmi polyval_clmulni polyval_generic syscopyarea ghash_clmulni>
aug 04 00:14:39 Sineya kernel: CPU: 3 PID: 214 Comm: systemd-udevd Not tainted 6.2.0-27-generic #28-Ubuntu
aug 04 00:14:39 Sineya kernel: Hardware name: System manufacturer System Product Name/PRIME X370-PRO, BIOS 6063 03/13/2023
aug 04 00:14:39 Sineya kernel: RIP: 0010:amdgpu_irq_put+0x9f/0xb0 [amdgpu]
aug 04 00:14:39 Sineya kernel: Code: 31 f6 31 ff c3 cc cc cc cc 44 89 e2 48 89 de 4c 89 f7 e8 94 fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff c3 cc cc cc cc <0f> 0b b8 ea ff ff ff eb c3 b8 fe ff ff ff eb bc 90 90 90 90 90 90
aug 04 00:14:39 Sineya kernel: RSP: 0018:ffffa7f0c152f8e0 EFLAGS: 00010246
aug 04 00:14:39 Sineya kernel: RAX: 0000000000000000 RBX: ffff9aab2788bea8 RCX: 0000000000000000
aug 04 00:14:39 Sineya kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
aug 04 00:14:39 Sineya kernel: RBP: ffffa7f0c152f900 R08: 0000000000000000 R09: 0000000000000000
aug 04 00:14:39 Sineya kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
aug 04 00:14:39 Sineya kernel: R13: 0000000000000001 R14: ffff9aab27880000 R15: 0000000000000001
aug 04 00:14:39 Sineya kernel: FS: 00007f7988bb88c0(0000) GS:ffff9ab20eac0000(0000) knlGS:0000000000000000
aug 04 00:14:39 Sineya kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
aug 04 00:14:39 Sineya kernel: CR2: 000055a136471288 CR3: 000000010683c000 CR4: 0000000000750ee0
aug 04 00:14:39 Sineya kernel: PKRU: 55555554
aug 04 00:14:39 Sineya kernel: Call Trace:
aug 04 00:14:39 Sineya kernel: <TASK>
aug 04 00:14:39 Sineya kernel: amdgpu_fence_driver_hw_fini+0x55/0x110 [amdgpu]
aug 04 00:14:39 Sineya kernel: amdgpu_device_fini_hw+0xb3/0x240 [amdgpu]
aug 04 00:14:39 Sineya kernel: amdgpu_driver_unload_kms+0x4b/0x70 [amdgpu]
aug 04 00:14:39 Sineya kernel: amdgpu_driver_load_kms+0xf9/0x1c0 [amdgpu]
aug 04 00:14:39 Sineya ...
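
The failure signature shared by the logs in this report can be checked for mechanically. The sketch below is only a rough filter: `smu_init_failed` is a hypothetical helper name, and the two patterns are taken verbatim from the kernel messages quoted above.

```shell
#!/usr/bin/env bash
# Succeeds if the kernel log on stdin contains the SMU init failure
# reported in this bug (messages copied from the logs above).
smu_init_failed() {
  grep -qE 'Failed to enable requested dpm features!|hw_init of IP block <smu> failed'
}

# Usage after a failed boot, assuming a persistent journal:
#   journalctl -k -b -1 | smu_init_failed && echo "SMU init failure seen"
```

`journalctl -k -b -1` reads kernel messages from the previous boot, which is usually the one that hung; grepping /var/log/syslog works as well.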


Revision history for this message
Timo Aaltonen (tjaalton) wrote :

So you tested with 1.3 too? Anyway, thanks for testing the proposed kernel.

Juerg Haefliger (juergh)
Changed in linux-firmware (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

turns out the required kernel commits are not in -27 but will be in the upcoming cycle

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-firmware (Ubuntu Jammy):
status: New → Confirmed
Changed in linux-firmware (Ubuntu Lunar):
status: New → Confirmed
Juerg Haefliger (juergh)
description: updated
description: updated
Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Henrik, or anyone else affected,

Accepted linux-firmware into lunar-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/20230323.gitbcdcfbcf-0ubuntu1.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-lunar to verification-done-lunar. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-lunar. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in linux-firmware (Ubuntu Lunar):
status: Confirmed → Fix Committed
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

mantic shouldn't be affected as it has 6.3

description: updated
Changed in linux-firmware (Ubuntu Jammy):
status: Confirmed → Fix Committed
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

Hello Henrik, or anyone else affected,

Accepted linux-firmware into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/linux-firmware/20220329.git681281e4-0ubuntu3.17 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Mark (selective-panic) wrote :

booted Kubuntu 23.04 (6.2.0-26-generic) with a monitor connected via DisplayPort to the 7900XT

installed with:
$ sudo dpkg -i linux-firmware_20230323.gitbcdcfbcf-0ubuntu1.5_all.deb

$ apt policy linux-firmware
linux-firmware:
  Installed: 20230323.gitbcdcfbcf-0ubuntu1.5
  Candidate: 20230323.gitbcdcfbcf-0ubuntu1.5

$ apt show linux-firmware
Package: linux-firmware
Version: 20230323.gitbcdcfbcf-0ubuntu1.5
Status: install ok installed
Priority: optional
Section: misc

performance looks ok
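
When reporting a verification result, the installed version can be pulled out of the `apt policy` output shown above with a one-line filter. `installed_version` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the value of the "Installed:" field from `apt policy` output on stdin.
installed_version() {
  awk '$1 == "Installed:" { print $2 }'
}

# Usage: apt policy linux-firmware 2>/dev/null | installed_version
```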

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

thanks for testing!

tags: added: verification-done-lunar
Revision history for this message
Mario Limonciello (superm1) wrote :

Ooph this is complicated; the revert is reverting not just navi31/navi33 but also Phoenix updates which can cause other problems for those systems.

These are PHX:
gc_11_0_1
gc_11_0_4

These are navi31:
gc_11_0_0

These are navi33:
gc_11_0_2
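
The mapping above, written as a lookup so that a firmware file name can be traced back to the hardware it belongs to. This is a sketch: `gc_family` is a hypothetical helper name, and it covers only the four GC versions named in this comment.

```shell
#!/usr/bin/env bash
# Map a GC 11 version prefix (or a firmware file name that starts
# with it) to the ASIC family given in the comment above.
gc_family() {
  case $1 in
    gc_11_0_1*|gc_11_0_4*) echo PHX ;;
    gc_11_0_0*)            echo navi31 ;;
    gc_11_0_2*)            echo navi33 ;;
    *)                     echo unknown; return 1 ;;
  esac
}

# Usage: gc_family gc_11_0_4
```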

Revision history for this message
Mario Limonciello (superm1) wrote :

Can the kernel commits not be expedited instead to solve this?

Revision history for this message
Henrik Holst (henrik-holst2) wrote :

Tested linux-firmware_20230323.gitbcdcfbcf-0ubuntu1.5 on 6.2.0-27-generic with my rx7900xtx and it works just as well as the "original" linux-firmware_20230323.gitbcdcfbcf-0ubuntu1.2. I tried a few games and get the same stable FPS, so everything is as stable and performant as before the 1.3 release.

Revision history for this message
Steve Langasek (vorlon) wrote :

Since this is a straight partial revert of the previous SRU and is critical path for the upcoming point release (which is why we cannot wait for kernel-side fixes), I've done a manual three-way diff of the contents to confirm that the revert has been done correctly:

$ dpkg-deb -R linux-firmware_20220329.git681281e4-0ubuntu3.16_all.deb 16
$ dpkg-deb -R linux-firmware_20220329.git681281e4-0ubuntu3.17_all.deb 17
$ dpkg-deb -R linux-firmware_20220329.git681281e4-0ubuntu3.14_all.deb 14

(14 is the last version that was published to jammy-updates)

$ diff -ur 16/lib 17/lib | grep -v amdgpu
$

No changes in this upload outside of the amdgpu directory.

$ diff -ur 17/lib 14/lib | grep amdgpu
Binary files 17/lib/firmware/amdgpu/dcn_3_1_4_dmcub.bin and 14/lib/firmware/amdgpu/dcn_3_1_4_dmcub.bin differ
Binary files 17/lib/firmware/amdgpu/yellow_carp_dmcub.bin and 14/lib/firmware/amdgpu/yellow_carp_dmcub.bin differ
$

Two changes to the amdgpu firmware files vs the 3.14 build; these are also present in 3.15 (so no change - not reverted), and not the target of the revert (the gc_11_* files).

I have also verified that these files are identical to those in the 1.5 build from lunar which passed verification.

$ dpkg-deb -R linux-firmware_20230323.gitbcdcfbcf-0ubuntu1.5_all.deb 1.5
$ diff -ur 17/lib 1.5/lib/ | grep -E 'dcn_3_1_4|yellow_carp_dmcub'
$

Under the circumstances I am going to treat this as sufficient verification of a fix for a regression-update bug and release the SRU.
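
The key property checked by hand above is that no gc_11_* blob differs from the last published build. A sketch of that check as a reusable filter follows; `gc11_untouched` is a hypothetical helper name, and it assumes the unpacked directories are named as in the commands above.

```shell
#!/usr/bin/env bash
# Reads `diff -ur NEW/lib OLD/lib` output on stdin. Succeeds only if no
# line mentions a gc_11_* file under an amdgpu directory, i.e. the GC 11
# blobs match the old build (no "differ" and no "Only in" lines for them).
gc11_untouched() {
  ! grep -qE 'amdgpu.*gc_11_'
}

# Usage with the directories unpacked above:
#   diff -ur 17/lib 14/lib | gc11_untouched && echo "gc_11_* fully reverted"
```

This automates only one of the checks in the comment; the comparison against the lunar 1.5 build is a separate step.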

tags: added: regression-update
Revision history for this message
Steve Langasek (vorlon) wrote : Update Released

The verification of the Stable Release Update for linux-firmware has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-firmware - 20230323.gitbcdcfbcf-0ubuntu1.5

---------------
linux-firmware (20230323.gitbcdcfbcf-0ubuntu1.5) lunar; urgency=medium

  * 20230323.gitbcdcfbcf-0ubuntu1.4 borks amdgpu (LP: #2029396)
    - Revert "amdgpu: Update GC 11.0.1 and 11.0.4"
    - Revert "amdgpu: update GC 11.0.4 firmware for amd.5.5 release"
    - Revert "amdgpu: update GC 11.0.1 firmware for amd.5.5 release"
    - Revert "amdgpu: update GC 11.0.2 firmware for amd.5.5 release"
    - Revert "amdgpu: update GC 11.0.0 firmware for amd.5.5 release"

 -- Juerg Haefliger <email address hidden> Fri, 04 Aug 2023 14:52:56 +0200

Changed in linux-firmware (Ubuntu Lunar):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-firmware - 20220329.git681281e4-0ubuntu3.17

---------------
linux-firmware (20220329.git681281e4-0ubuntu3.17) jammy; urgency=medium

  * 20230323.gitbcdcfbcf-0ubuntu1.4 borks amdgpu (LP: #2029396)
    - Revert "amdgpu: Update GC 11.0.1 and 11.0.4"
    - Revert "amdgpu: update GC 11.0.4 firmware for amd.5.5 release"
    - Revert "amdgpu: update GC 11.0.1 firmware for amd.5.5 release"
    - Revert "amdgpu: update GC 11.0.2 firmware for amd.5.5 release"
    - Revert "amdgpu: update GC 11.0.0 firmware for amd.5.5 release"

 -- Juerg Haefliger <email address hidden> Fri, 04 Aug 2023 15:01:18 +0200

Changed in linux-firmware (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Mario Limonciello (superm1) wrote :

> Under the circumstances I am going to treat this as sufficient verification of a fix for a regression-update bug and release the SRU.
> which is why we cannot wait for kernel-side fixes

I understand the urgency but I need to express that there is something wrong process-wise here. This rush has caused firmware introduced by bug 2027959 for Phoenix to be reverted as well, which HAS been tested against Lunar and OEM-6.1. I don't know what this means for the stability of Phoenix.

The original SRU bug #2024427 called out the kernel commits, there was a task there and they were marked fix committed. The bug description explicitly stated that the firmware and commits need to go together.

The commits are upstream and have been in -stable since the end of May; they're not risky. I don't understand why they weren't prioritized.

This bug even has a comment "verified linux-firmware/lunar version 20230323.gitbcdcfbcf-0ubuntu1.4" which I strongly suspect wasn't validated against the right Lunar kernel (otherwise this bug would have occurred).

I hope this process can be improved in the future.

Revision history for this message
Steve Langasek (vorlon) wrote :

I agree that we should look to improve the process in the future. In the immediate term, I am not in a position to verify any of the possible combinations of firmware on hardware, so the "safe" options are either this partial revert of these amdgpu firmware blobs, or a full revert to the version of linux-firmware previously published in the -updates pocket. This firmware update was not published in lockstep with a kernel, so while it may regress support for some devices that were newly enabled with this firmware, it should not be possible for it to regress support for any devices that were already supported by linux-firmware 3.14 (jammy) or 1.2 (lunar).

Changed in linux-firmware (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

The kernel commits were sent too late for the 2023.07.10 kernel cycle; they were committed to the tree earlier this week for the 2023.08.07 cycle. And the description of bug 2027959 only mentions Navi 31/33.

Revision history for this message
Renjith Pananchikkal (renjith-pananchikkal) wrote :

The latest linux-firmware package has messed up AMD PHX based OEM laptops, while the previous version works fine.

Failing version: 20220329.git681281e4-0ubuntu3.17
Working version: 20220329.git681281e4-0ubuntu3.16

For more info, please refer to comments 17 & 18 of https://bugs.launchpad.net/ubuntu/+source/linux-firmware/+bug/2027959 .

Changed in linux-firmware:
status: Unknown → Fix Released