external monitors flash white with every interrupt

Bug #2056445 reported by Brett Holman
This bug affects 2 people
Affects          Status         Importance   Assigned to   Milestone
Linux            Fix Released   Unknown
linux (Ubuntu)   Invalid        Undecided    Unassigned

Bug Description

an interesting amdgpu issue:

I currently have two external monitors, and only one will display the desktop at a time - the other is completely white. The interesting thing is that with each interrupt that the kernel receives from keyboard / mouse, the screens will flip which one is visible. With my monitors positioned above and to the left of the laptop's built-in display, the flashing monitors feel akin to attending an electronic light show, or perhaps a display of fireworks - the monitors flip visibility with each keypress.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-11-generic 6.8.0-11.11
ProcVersionSignature: Ubuntu 6.8.0-11.11-generic 6.8.0-rc4
Uname: Linux 6.8.0-11-generic x86_64
NonfreeKernelModules: zfs
ApportVersion: 2.28.0-0ubuntu1
Architecture: amd64
CRDA: N/A
CasperMD5CheckResult: unknown
CurrentDesktop: ubuntu:GNOME
Date: Thu Mar 7 05:23:23 2024
InstallationDate: Installed on 2024-02-02 (34 days ago)
InstallationMedia: Ubuntu 24.04 LTS "Noble Numbat" - Daily amd64 (20240123)
MachineType: Framework Laptop 13 (AMD Ryzen 7040Series)
ProcEnviron:
 LANG=en_US.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/bash
 TERM=xterm-256color
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.8.0-11-generic root=UUID=8de457ce-abc8-47cf-a43f-c13183977aa5 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-6.8.0-11-generic N/A
 linux-backports-modules-6.8.0-11-generic N/A
 linux-firmware 20240202.git36777504-0ubuntu1
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 10/17/2023
dmi.bios.release: 3.3
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: 03.03
dmi.board.asset.tag: *
dmi.board.name: FRANMDCP07
dmi.board.vendor: Framework
dmi.board.version: A7
dmi.chassis.asset.tag: FRANDGCPA735010072
dmi.chassis.type: 10
dmi.chassis.vendor: Framework
dmi.chassis.version: A7
dmi.modalias: dmi:bvnINSYDECorp.:bvr03.03:bd10/17/2023:br3.3:svnFramework:pnLaptop13(AMDRyzen7040Series):pvrA7:rvnFramework:rnFRANMDCP07:rvrA7:cvnFramework:ct10:cvrA7:skuFRANDGCP07:
dmi.product.family: Laptop
dmi.product.name: Laptop 13 (AMD Ryzen 7040Series)
dmi.product.sku: FRANDGCP07
dmi.product.version: A7
dmi.sys.vendor: Framework

Brett Holman (holmanb) wrote :
Mario Limonciello (superm1) wrote :

For some unknown reason this only seems to be reported on some people's Framework 13 AMD laptops.

FYI - there are two workarounds for those currently encountering this issue.

* On the kernel command line: amdgpu.sg_display=0
* Change the BIOS settings from Auto to UMA_Game_Optimized.
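
To check whether the first workaround is actually active on a running system, a minimal Python sketch might look like the following. It is not from this thread: the helper names are made up for illustration, and it assumes that the last occurrence of a parameter on the command line is the effective one.

```python
from pathlib import Path


def module_param_from_cmdline(cmdline, name):
    """Return the value of a `name=value` token on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # assumption: a later occurrence overrides an earlier one
    return value


def sg_display_disabled(cmdline):
    """True if the command line carries amdgpu.sg_display=0."""
    return module_param_from_cmdline(cmdline, "amdgpu.sg_display") == "0"


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        print("amdgpu.sg_display=0 active:",
              sg_display_disabled(proc_cmdline.read_text()))
```

On Ubuntu the parameter is usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot.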

Changed in linux (Ubuntu):
status: New → Triaged
Mario Limonciello (superm1) wrote :

I suggest updating to 6.8.0-20 though, this 6.8.0-11 has an old 6.8-RC snapshot and there are other bugs that got fixed later on in the 6.8-RC's.

Changed in linux:
status: Unknown → New
Brett Holman (holmanb) wrote :

Thanks for the update @superm1, I appreciate the help.

> I suggest updating to 6.8.0-20 though, this 6.8.0-11 has an old 6.8-RC snapshot and there are other bugs that got fixed later on in the 6.8-RC's.

Should this fix the flickering issue? I'm happy to test it out.

> FYI - there are two workarounds for those currently encountering this issue.
>
> * On the kernel command line: amdgpu.sg_display=0
> * Change the BIOS settings from Auto to UMA_Game_Optimized.

I'm curious about the implications of the two workarounds presented. I see that UMA_Game_Optimized will reserve some system ram for the GPU, which I don't have any issues with if there is some benefit. Will this use scatter/gather and therefore perform better as a result?

Mario Limonciello (superm1) wrote :

> Should this fix the flickering issue? I'm happy to test it out.

It will fix a colored flickering, but not the white screen (if that is what you're observing). It also fixes issues with suspend/resume that were present in earlier 6.8-rc kernels but fixed by the final version.

> I'm curious about the implications of the two workarounds presented. I see that UMA_Game_Optimized will reserve some system ram for the GPU, which I don't have any issues with if there is some benefit. Will this use scatter/gather and therefore perform better as a result?

If you use the BIOS option yes it will carve out more memory for VRAM use. S/G is still used, but GTT memory is less likely to be accessed when you have more VRAM. If you turn off S/G then GTT will NEVER be used.

The potential negative implication for the BIOS option is more RAM being used for GPU that can't be used for anything else.
The potential negative implications of turning off S/G in the driver are that the driver may run out of memory in some situations that require a lot of VRAM such as multiple 4k monitor docking.
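
To see how much dedicated VRAM and GTT the driver currently exposes (for example before and after changing the BIOS option), the amdgpu sysfs counters can be read. The sketch below is illustrative only: the function names are invented here, and it assumes the GPU is `card0`, which may not hold on every system.

```python
from pathlib import Path


def read_mem_info(device_dir):
    """Read amdgpu memory counters (in bytes) from a DRM device directory."""
    info = {}
    for name in ("vram_total", "vram_used", "gtt_total", "gtt_used"):
        path = Path(device_dir) / f"mem_info_{name}"
        if path.exists():  # skip counters this driver or kernel does not expose
            info[name] = int(path.read_text().strip())
    return info


def to_mib(num_bytes):
    """Convert bytes to whole mebibytes."""
    return num_bytes // (1024 * 1024)


if __name__ == "__main__":
    # Assumption: the amdgpu device is card0; adjust if it is card1, etc.
    for name, value in read_mem_info("/sys/class/drm/card0/device").items():
        print(f"{name}: {to_mib(value)} MiB")
```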

FWIW - I have a FW13 AMD and I've never seen this white screen problem myself. I don't know if it is tied to RAM vendor, RAM size, or a BIOS issue. My educated guess is it's actually a BIOS issue and it might be properly fixed when FW upgrades to the next BIOS.

Brett Holman (holmanb) wrote : Re: [Bug 2056445] Re: external monitors flash white with every interrupt

>> Should this fix the flickering issue? I'm happy to test it out.
>
> It will fix a colored flickering;

My flickering has mostly been white, not colored (though
I was seeing the colored flickering a lot before I switched off of Wayland,
which crashed too often with multiple monitors to be usable).

I see that the kernel is available in -proposed via rmadison, but I'm
struggling to install it. I added the -proposed pocket for main to my
/etc/apt/sources.list and updated my cache, but an `apt upgrade` didn't try
to upgrade to the new kernel. Am I missing something to get the proposed
kernel?

> but not the white screen (if that is
> what you're observing). It also fixes issues with suspend/resume that
> were present in earlier 6.8-rc kernels but fixed by the final version.

I've seen the white screen and suspend/resume issues as well.

>> I'm curious about the implications of the two workarounds presented. I
>> see that UMA_Game_Optimized will reserve some system ram for the GPU,
>> which I don't have any issues with if there is some benefit. Will this
>> use scatter/gather and therefore perform better as a result?
>
> If you use the BIOS option yes it will carve out more memory for VRAM
> use. S/G is still used, but GTT memory is less likely to be accessed
> when you have more VRAM. If you turn off S/G then GTT will NEVER be
> used.

It sounds like GTT access is what causes this issue, and that using the
bios option won't guarantee that it will never happen. Is that correct?

> The potential negative implication for the BIOS option is more RAM being
> used for GPU that can't be for anything else.
> The potential negative implications of turning off S/G in the driver are
> that the driver may run out of memory in some situations that require a lot
> of VRAM such as multiple 4k monitor docking.

Good to know, thanks for the details!

> FWIW - I have a FW13 AMD and I've never seen this white screen problem
> myself. I don't know if it is tied to RAM vendor, RAM size, or a BIOS
> issue. My educated guess is it's actually a BIOS issue and it might be
> properly fixed when FW upgrades to the next BIOS.

Interesting. If there's any extra info (or testing) I can provide to help
with debugging/fixing this issue, please let me know.

Mario Limonciello (superm1) wrote :

> Am I missing something to get the proposed
> kernel?

I guess the same reason it's not migrating is the reason you can't install it? Maybe some kernel team members can comment.

> It sounds like GTT access is what causes this issue, and that using the
> bios option won't guarantee that it will never happen. Is that correct?

If you look at your journal from when this issue occurs you'll see an IOMMU page fault. This is the IOMMU reporting an attempted access to memory outside the region that the GPU is allowed to access. This can happen when there is a userspace driver bug (i.e. mesa), a misconfigured driver (i.e. amdgpu), or faulty firmware (i.e. the BIOS).

There have been cases of a white screen caused by an incorrect memory addressing mask on systems with 64GB, but that was fixed a long time ago.

It is definitely an attempt at accessing outside of VRAM but I have a suspicion the address being accessed is ALSO outside of GTT.
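
As a rough aid for spotting these faults, the kernel log can be filtered for IOMMU page fault events. The sketch below is not taken from the reporter's logs. It assumes AMD-Vi prints faults roughly as `AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x... address=0x... flags=0x...]`, and the function name is hypothetical.

```python
import re

# Assumption: fault lines contain "IO_PAGE_FAULT" followed later by "address=0x...".
FAULT_RE = re.compile(r"IO_PAGE_FAULT\b.*?\baddress=(0x[0-9a-fA-F]+)")


def find_page_faults(log_text):
    """Return (line, address) pairs for each IOMMU page fault line in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            hits.append((line.strip(), int(match.group(1), 16)))
    return hits
```

The text passed in would typically be the output of `journalctl -k -b` captured while the white screen is visible. Comparing the reported addresses against the VRAM and GTT ranges is left to the reader, since those ranges are system specific.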

> Interesting. If there's any extra info (or testing) I can provide to help
> with debugging/fixing this issue, please let me know.

If you can add your details for your "specific" FW13 AMD to the upstream bug maybe others who are affected can help with building the pattern.
I mean the memory vendor, speed of memory, amount, channel configuration.

What you do to trigger it, etc.

Brett Holman (holmanb) wrote :

> If you can add your details for your "specific" FW13 AMD to the upstream bug maybe others who are affected can help with building the pattern.
> I mean the memory vendor, speed of memory, amount, channel configuration.

this system has a single DIMM which was provided by framework: DDR5-5600 - 32GB (1 x 32GB)

memory vendor: A-DATA Technology
speed of memory: 5600 MT/s
amount: 32 GB
channel configuration: single
uefi version: 03.03

see below[2] for more details

> What you do to trigger it, etc.

Various symptoms appear triggered by different things. The suspend/resume issue (which sounds like it is fixed) causes either the completely white screen or the white screen that flickers. Most graphics-related issues that I see on this machine either happen immediately on first boot, or after a suspend/resume cycle, or don't appear to have a trigger.

On my most recent boot, my screen flickers in the upper left and lower right corners as a wide rectangular bar; this started immediately after passing the GDM login screen.

Another symptom[1] that I experience on this machine (that is probably unrelated) is that heavy firefox usage causes firefox latency to become unusably high - and CPU usage remains low. Restarting firefox temporarily resolves the issue. This behavior started to happen after I switched to X11.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1883077
[2] memory details

# dmidecode 3.5
Getting SMBIOS data from sysfs.
SMBIOS 3.5.0 present.

Handle 0x0012, DMI type 17, 92 bytes
Memory Device
        Array Handle: 0x0011
        Error Information Handle: 0x0015
        Total Width: Unknown
        Data Width: Unknown
        Size: No Module Installed
        Form Factor: Unknown
        Set: None
        Locator: DIMM 0
        Bank Locator: P0 CHANNEL A
        Type: Unknown
        Type Detail: Unknown

Handle 0x0013, DMI type 17, 92 bytes
Memory Device
        Array Handle: 0x0011
        Error Information Handle: 0x0016
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 32 GB
        Form Factor: SODIMM
        Set: None
        Locator: DIMM 0
        Bank Locator: P0 CHANNEL B
        Type: DDR5
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 5600 MT/s
        Manufacturer: A-DATA Technology
        Serial Number: 00301896
        Asset Tag: Not Specified
        Part Number: AD5S560032G-SFW
        Rank: 2
        Configured Memory Speed: 5600 MT/s
        Minimum Voltage: 1.1 V
        Maximum Voltage: 1.1 V
        Configured Voltage: 1.1 V
        Memory Technology: DRAM
        Memory Operating Mode Capability: Volatile memory
        Firmware Version: Unknown
        Module Manufacturer ID: Bank 5, Hex 0xCB
        Module Product ID: Unknown
        Memory Subsystem Controller Manufacturer ID: Unknown
        Memory Subsystem Controller Product ID: Unknown
        Non-Volatile Size: None
        Volatile Size: 32 GB
        Cache Size: None
        Logical Size: None
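
For others adding their configuration to the upstream bug, the same fields can be pulled out of `dmidecode -t 17` output programmatically. This is only a sketch with made-up function names, and it assumes the blank-line-separated "Memory Device" layout shown above.

```python
def parse_memory_devices(text):
    """Split dmidecode type 17 output into one dict per Memory Device record."""
    devices, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Memory Device":
            current = {}
            devices.append(current)
        elif not line:
            current = None  # a blank line ends the current record
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


def populated(devices):
    """Keep only slots that actually hold a module."""
    return [d for d in devices if d.get("Size") != "No Module Installed"]
```

Calling `populated(parse_memory_devices(output))` and printing the "Manufacturer", "Size", "Speed" and "Bank Locator" keys gives the vendor, amount, speed, and channel details asked for earlier in the thread.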

Brett Holman (holmanb) wrote :

After setting UMA_Game_Optimized, I can confirm that (after 30 minutes of usage), the system appears far more stable. Since X11 appeared stable I also switched back to Wayland which was previously unusable and this also appears to be working.

I'll also report back once I am able to test the new kernel.

Thanks again for the help Mario.

Mario Limonciello (superm1) wrote :

> this system has a single DIMM which was provided by framework: DDR5-5600 - 32GB (1 x 32GB)

My system is 8GB, single stick, same speed 5600 MT/s, same company (A-DATA).

> Restarting firefox temporarily resolves the issue. This behavior started to happen after I switched to X11.

You're the first person I've heard report this. Hopefully it's not present in Wayland. There is very little testing or development that happens in X11.

> After setting UMA_Game_Optimized, I can confirm that (after 30 minutes of usage), the system appears far more stable.

It's good to hear that the workaround improves things for you too.

Mario Limonciello (superm1) wrote :

> I'll also report back once I am able to test the new kernel.

I had a try and I was able to install it today on Noble by pulling the deb packages for linux-image-unsigned-6.8.0-20-generic, linux-modules-6.8.0-20-generic, linux-modules-extra-6.8.0-20-generic from Launchpad manually.

Mario Limonciello (superm1) wrote :

I have a suspicion the root cause of the white screen could be fixed in BIOS 3.05.

https://community.frame.work/t/framework-laptop-13-ryzen-7040-bios-3-05-release-and-driver-bundle-beta/48276

Can you still reproduce it with no workarounds, 6.8.0-20 and the BIOS upgrade?

Mario Limonciello (superm1) wrote :

Many people have confirmed that this is fixed in the upgraded Framework BIOS 3.05.

Changed in linux (Ubuntu):
status: Triaged → Invalid
Changed in linux:
status: New → Fix Released