Chromium leaks tens of gigabytes of pixmaps (in the Xorg process) after using hardware accelerated video

Bug #2033433 reported by Andreas Hasenack
This bug affects 3 people
Affects                    Status        Importance  Assigned to  Milestone
chromium-browser (Ubuntu)  Fix Released  Critical    Unassigned
xorg-server (Ubuntu)       Opinion       Undecided   Unassigned

Bug Description

https://crbug.com/1467689

---

chromium 118.0.5966.0 2604 latest/edge

Whenever I use chromium for a long time (like 20min or more) in a video session (meet, youtube) that uses hardware acceleration, and then close it, the whole desktop freezes and the laptop fan turns on. Logging in remotely, I see that Xorg is at 100% CPU.

The rest of the system is operational, but nothing graphical is. Just the mouse cursor moves around.

SOMETIMES, if I wait a few minutes, Xorg resumes behaving normally, and I can use the desktop again, but most of the time I have to log in remotely and kill -9 xorg (plain kill won't do it).

With wayland, this does not happen. This could very well be an Xorg bug, but filing here first for visibility.

$ lsgpu
card0 Intel Alderlake_p (Gen12) drm:/dev/dri/card0
└─renderD128

$ lscpu
Architecture: x86_64
  CPU op-mode(s): 32-bit, 64-bit
  Address sizes: 46 bits physical, 48 bits virtual
  Byte Order: Little Endian
CPU(s): 16
  On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
  Model name: 12th Gen Intel(R) Core(TM) i7-1270P
    CPU family: 6
    Model: 154
    Thread(s) per core: 2
    Core(s) per socket: 12
    Socket(s): 1
    Stepping: 3
    CPU(s) scaling MHz: 41%
    CPU max MHz: 4800,0000
    CPU min MHz: 400,0000
    BogoMIPS: 4992,00
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
Virtualization features:
  Virtualization: VT-x
Caches (sum of all):
  L1d: 448 KiB (12 instances)
  L1i: 640 KiB (12 instances)
  L2: 9 MiB (6 instances)
  L3: 18 MiB (1 instance)
NUMA:
  NUMA node(s): 1
  NUMA node0 CPU(s): 0-15
Vulnerabilities:
  Itlb multihit: Not affected
  L1tf: Not affected
  Mds: Not affected
  Meltdown: Not affected
  Mmio stale data: Not affected
  Retbleed: Not affected
  Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds: Not affected
  Tsx async abort: Not affected

ProblemType: Bug
DistroRelease: Ubuntu 23.04
Package: chromium-browser (not installed)
ProcVersionSignature: Ubuntu 6.2.0-27.28-generic 6.2.15
Uname: Linux 6.2.0-27-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.26.1-0ubuntu2
Architecture: amd64
CasperMD5CheckResult: unknown
CurrentDesktop: ubuntu:GNOME
Date: Tue Aug 29 16:47:30 2023
RebootRequiredPkgs: Error: path contained symlinks.
SourcePackage: chromium-browser
UpgradeStatus: No upgrade log present (probably fresh install)

Andreas Hasenack (ahasenack) wrote :
tags: added: snap
description: updated
description: updated
Nathan Teodosio (nteodosio) wrote :

Duly reproduced. I don't have SSH set up here and couldn't get out with Ctrl+Alt+Backspace either, so I had to resort to SysRq.

The journal shows a nouveau fault at the time of the crash.

Changed in chromium-browser (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Nathan Teodosio (nteodosio) wrote :

If you hit that again can you please also retrieve the log?

tags: added: xorg
tags: added: nouveau
Andreas Hasenack (ahasenack) wrote :

I have some straces of just before, and during, this "crash". I didn't see anything super obvious. The logs also didn't have anything out of the ordinary, but I'll capture it again probably today, after my 30min meet.

That meet is in 1h15min, if you want to let me know beforehand what you would like me to grab.

Nathan Teodosio (nteodosio) wrote : Re: [Bug 2033433] Re: Xorg 100% CPU and frozen desktop after closing chromium

The journal and

   >log 2>&1 snap run --strace='-o strace' chromium --enable-logging=stderr

would be very thorough if that runs fine.

Thanks for testing and have a good crash!

Andreas Hasenack (ahasenack) wrote : Re: Xorg 100% CPU and frozen desktop after closing chromium

> Thanks for testing and have a good crash!

:D :D

Andreas Hasenack (ahasenack) wrote :

strace I started manually on the chromium pid just before closing the meet session and the app. The strace coupled with the snap run command would have been too large, gigabytes.

Andreas Hasenack (ahasenack) wrote :

log produced by the snap run command.

At some point during the call, around 12:16:55, I noticed the video became a bit sluggish and xorg was already using a lot of cpu, but I can't tell if it was like this since the start (the xorg cpu usage, I mean). So take this info with caution, as it might be misleading.

Andreas Hasenack (ahasenack) wrote :

And yes, I had to kill -9 xorg after closing chromium, because it was spinning at 100% cpu again, and the desktop was frozen except for the mouse cursor.

Andreas Hasenack (ahasenack) wrote :

And here is the journal log corresponding to the ~30min long meet session, culminating in the kill -9 Xorg

Andreas Hasenack (ahasenack) wrote :

And this I'm 90% sure is the xorg log file corresponding to that session.

The pid at the top of the file, where it says the file was renamed (Xorg.pid-958859.log), matches what I kill-9'ed.

Daniel van Vugt (vanvugt) wrote :

In comment #10 there are some instances of "Atomic update failure" that go for a few minutes each. I don't know how to disable atomic KMS in Xorg. It's easy to toggle in Wayland but apparently this bug doesn't exist in Wayland.

And I don't think mentions of "nouveau" are relevant to Andreas.

affects: chromium-browser (Ubuntu) → xorg-server (Ubuntu)
tags: removed: nouveau
Daniel van Vugt (vanvugt) wrote :

"Atomic update failure" may also be specific to the OLED panel that it looks like Andreas is using. I have one of those but have never run Xorg on it.

Daniel van Vugt (vanvugt) wrote :

I suggest installing the Xorg debug symbols and then attaching gdb to it while it's using 100% CPU. Do that a few times and you should get a good idea of the stack trace.

A lazier way to do it would be to just kill Xorg with a fatal signal, upload the resulting crash file (ubuntu-bug /var/crash/...) and see if the robots have any luck providing a stack trace.
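
For the gdb route, the repeated attach can be scripted. A minimal sketch, assuming gdb and the Xorg debug symbols are installed and that the script runs with enough privileges to ptrace Xorg (typically root); the helper names `gdb_backtrace_cmd` and `sample_backtraces` and the default count and interval are illustrative, not from this report:

```python
import subprocess
import sys
import time


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches to pid, dumps every thread's backtrace, and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def sample_backtraces(pid, count=5, interval=1.0):
    """Attach to the process `count` times and collect one backtrace dump per attach."""
    traces = []
    for i in range(count):
        result = subprocess.run(gdb_backtrace_cmd(pid), capture_output=True, text=True)
        traces.append(result.stdout)
        if i < count - 1:
            time.sleep(interval)  # let the process run a little between samples
    return traces


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: sudo python3 sample_xorg.py "$(pidof Xorg)"
    for n, trace in enumerate(sample_backtraces(int(sys.argv[1])), start=1):
        print(f"--- sample {n} ---")
        print(trace)
```

Frames that show up in most of the samples are likely where Xorg is spending its time.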

Andreas Hasenack (ahasenack) wrote :

> "Atomic update failure" may also be specific to the OLED panel that it
> looks like Andreas is using. I have one of those but have never run Xorg
> on it.

FWIW, I'm using two external 4k Dell monitors via usb-c, plus the laptop panel also at its max resolution (a bit lower than the monitors: 2880x1800 instead of 3840x2160), everything at 100% scaling.

But I have seen those "Atomic update failure" messages while at a sprint, where I only had my laptop and no external monitor. Back then, another symptom was the screen shifting a few pixels at random times, as if in an earthquake. I filed a kernel bug, and even found what looked like a patch we didn't have yet, but failed to pursue it as back home with the external monitor, that screen shifting problem didn't appear anymore. I'm trying to find that bug at the moment.

Andreas Hasenack (ahasenack) wrote :

Found it, and I closed it because the screen flickering is gone:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2018448

But the "Atomic update" error was there, on pipe A if that makes any difference. I see it on A/B/C, very much like https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1806242

Andreas Hasenack (ahasenack) wrote :

I attached gdb to xorg while the bug was happening (100% cpu usage after closing chromium), and got the backtrace with symbols. I hope this helps, but maybe it missed the loop, I don't know.

Daniel van Vugt (vanvugt) wrote :

Wow, that doesn't look like a loop unless Chromium had leaked an out-of-control number of pixmaps. Please use 'xrestop' to see what the pixmap count and other resource usage of Chromium is before it's closed. If that's the issue then we might also want to check to see if the leak is due to our hwacc patches.

Changed in chromium-browser (Ubuntu):
status: New → Incomplete
Andreas Hasenack (ahasenack) wrote :

So I took some screenshots of `xrestop` while chromium was in the google meet.

There was an "unknown PID" listed, but given other characteristics of that line, I assume that it is related to chromium.

Here are some of the columns over time:

Time      Pxms    Misc  Pxm mem    Total
12:09:28  46749   9     18815550K  18815550K
12:10:38  55508   9     20869237K  20869237K
12:13:06  71707   10    24679650K  24679650K
12:17:06  93990   11    29960587K  29960587K
12:30:17  157769  11    45126262K  45126262K
12:35:06  175712  2     49397850K  49397850K

The last one, 12:35:06, is just before I closed chromium. It stayed frozen like that, the xrestop app froze just like the rest of the desktop (I was running it over ssh, so I could still ctrl-c it, but it was frozen).

Attached is the last screenshot.
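
To put a number on the growth in the xrestop samples above, the first and last readings can be compared. A small sketch, assuming the `K` suffix in xrestop means KiB and treating the growth as linear over the session; the `leak_rate` helper is illustrative:

```python
from datetime import datetime

# Samples copied from the xrestop readings above: (time, Pxms, Pxm mem in K).
SAMPLES = [
    ("12:09:28", 46749, 18815550),
    ("12:10:38", 55508, 20869237),
    ("12:13:06", 71707, 24679650),
    ("12:17:06", 93990, 29960587),
    ("12:30:17", 157769, 45126262),
    ("12:35:06", 175712, 49397850),
]


def leak_rate(samples):
    """Return (pixmaps per minute, GiB per minute) between the first and last sample."""
    fmt = "%H:%M:%S"
    t0, pxms0, kib0 = samples[0]
    t1, pxms1, kib1 = samples[-1]
    elapsed = datetime.strptime(t1, fmt) - datetime.strptime(t0, fmt)
    minutes = elapsed.total_seconds() / 60
    pixmaps_per_min = (pxms1 - pxms0) / minutes
    # Assumption: K is KiB, so 1024 * 1024 K make one GiB.
    gib_per_min = (kib1 - kib0) / (1024 * 1024) / minutes
    return pixmaps_per_min, gib_per_min


if __name__ == "__main__":
    ppm, gpm = leak_rate(SAMPLES)
    print(f"{ppm:.0f} pixmaps/min, {gpm:.2f} GiB/min")
```

For these samples that works out to roughly 5,000 new pixmaps and a bit over 1 GiB of reported pixmap memory per minute of the call.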

Changed in chromium-browser (Ubuntu):
status: Incomplete → Confirmed
status: Confirmed → New
Daniel van Vugt (vanvugt) wrote :

That's tens of gigabytes of pixmap memory so there's a major leak.

Let's say this is definitely a bug in Chromium, but arguably a bug in Xorg for not responding while it frees the massive amount of leaked memory.

Changed in xorg-server (Ubuntu):
importance: High → Undecided
status: Confirmed → Opinion
Changed in chromium-browser (Ubuntu):
importance: Undecided → Critical
status: New → Confirmed
summary: - Xorg 100% CPU and frozen desktop after closing chromium
+ Chromium leaks tens of gigabytes of pixmaps (in the Xorg process)
summary: - Chromium leaks tens of gigabytes of pixmaps (in the Xorg process)
+ Chromium leaks tens of gigabytes of pixmaps (in the Xorg process) after
+ using hardware accelerated video
Andreas Hasenack (ahasenack) wrote :

Thanks for guiding the troubleshooting on this one, I learned a new tool ;) (xrestop)

Daniel van Vugt (vanvugt) wrote :

Is there a way to disable VAAPI in Chromium to see if that stops the leak?

Nathan Teodosio (nteodosio) wrote : Re: [Bug 2033433] Re: Chromium leaks tens of gigabytes of pixmaps (in the Xorg process) after using hardware accelerated video

Daniel, thanks a lot for guiding the debugging on this, and Andreas for
carrying the debugging out.

I thought incorrectly that this was already stated: The leak does not
happen with VAAPI disabled

--->
chromium --disable-features=VaapiVideoDecoder,VaapiVideoEncoder,VaapiVideoDecodeLinuxGL
<---

(As such it does not occur in stable/candidate channels, that do not
have VAAPI merged in.)

We don't have much of a delta about hwacc with upstream right now, it's
mostly patches for Ozone (which doesn't get used in Xorg, correct me if
I'm wrong) and then the chromium.launcher that simply turns on some flags.

Patches:
https://git.launchpad.net/~chromium-team/chromium-browser/+git/snap-from-source/tree/build/chromium-patches/optimization?h=dev
Flags:
https://git.launchpad.net/~chromium-team/chromium-browser/+git/snap-from-source/tree/launcher/chromium.launcher?h=dev#n128

Daniel van Vugt (vanvugt) wrote :

Oh! Intel and Google already know about it: https://crbug.com/1467689

description: updated
tags: added: kivu
Jianhui Dai (jianhuidai) wrote :

I tried a self-built Chromium 118.0.5966.0 and ran the command below for h264 hw decoding.
After 30min of playback, chromium was still fine.

```shell
./src/out/Default/chrome --ignore-gpu-blocklist --disable-gpu-driver-bug-workaround --use-fake-ui-for-media-stream --vmodule=*/ozone/*=1,*/wayland/*=1,*/vaapi/*=1,*/viz/*=1,*/media/*=1,*/shared_image/*=1 --enable-logging=stderr --v=0 --enable-features=VaapiVideoDecodeLinuxGL,VaapiVideoDecoder,VaapiVideoEncoder,UseChromeOSDirectVideoDecoder --disable-features= --enable-hardware-overlays="" --ozone-platform=x11 --use-gl=angle --use-angle=gl
```

Jianhui Dai (jianhuidai) wrote :

I captured `top` and `xrestop` output as an attachment.

`xrestop` shows quite a lot of memory usage, but that is not consistent with `top`.
`top` shows no obvious memory leak.

Daniel van Vugt (vanvugt) wrote :

Still, making Xorg think there's a massive pixmap leak is causing it to become unresponsive while it frees them all when Chromium exits.

Also, would 'top' show graphics memory allocations at all?

Andreas Hasenack (ahasenack) wrote :

The output from `top` was always well behaved while chromium was open, until it was closed, at which point xorg's cpu usage went through the roof.

The test case here is indeed watching "Pxms" in xrestop while chromium is displaying hw accelerated video.
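
As a rough pass/fail version of that test case, one could note the Pxms value at intervals and check whether it keeps climbing. A sketch only: the `looks_like_pixmap_leak` name and the `min_growth` threshold are made up for illustration and are not something agreed in this bug:

```python
def looks_like_pixmap_leak(pxms_samples, min_growth=10000):
    """Heuristic: the count never drops between samples and grows by at least min_growth overall."""
    if len(pxms_samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(pxms_samples, pxms_samples[1:]))
    total_growth = pxms_samples[-1] - pxms_samples[0]
    return never_drops and total_growth >= min_growth


if __name__ == "__main__":
    # Pxms values reported earlier in this bug for the affected build.
    reported = [46749, 55508, 71707, 93990, 157769, 175712]
    print("leak suspected:", looks_like_pixmap_leak(reported))
```

A healthy session would be expected to show a count that fluctuates rather than only ever growing, so the threshold mostly guards against flagging small, steady increases.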

Jianhui Dai (jianhuidai) wrote :

My understanding is that 'top' is able to show graphics memory allocations as well.
There may also be DRM tools that can show the graphics memory usage.

I will try to reproduce this issue further.

Andreas Hasenack (ahasenack) wrote :

This bug is easy to reproduce, what we need now is a fix ;)

Jianhui Dai (jianhuidai) wrote :

A fix has been prepared and will be posted on the chromium issue.

Andreas Hasenack (ahasenack) wrote :
Nathan Teodosio (nteodosio) wrote :

I'm committing this to edge only as there are reports of H264 failure in the upstream tracker. Let's see if we can reproduce it.

Changed in chromium-browser (Ubuntu):
status: Confirmed → Fix Committed
Daniel van Vugt (vanvugt) wrote :

Sounds coincidental judging by the simplicity of the patch.

Andreas Hasenack (ahasenack) wrote :

Please ping here when a new chromium snap is available in latest/edge.

Nathan Teodosio (nteodosio) wrote :

I will, but just to update expectations: each revision of Chromium in beta and edge needs to be manually reviewed because of a new plug[1].

So expect a couple of days of delay. Or install it manually from [2].

[1] https://forum.snapcraft.io/t/request-for-personal-files-interface-in-chromium-for-local-share-applications/36629/14
[2] https://launchpad.net/~chromium-team/+snap/chromium-snap-from-source-dev/+build/2240399

Andreas Hasenack (ahasenack) wrote :

I used that snap in a few long calls already, and did not experience the Pxms leak. Closing the browser after such calls also did not leave xorg at 100% cpu like before.

Haven't checked H264 playback yet, but as Daniel said, I doubt it would regress because of this patch.

Andreas Hasenack (ahasenack) wrote :

Chromium played a local video just fine, and that video was recognized like this by mpv:

 (+) Video --vid=1 (*) (h264 1920x1080 60.000fps)

Nathan Teodosio (nteodosio) wrote :

Great, thanks for your continued testing on this bug report!

I'm pushing it to beta as well.

Nathan Teodosio (nteodosio) wrote :

This is now released to beta and edge.

And thank you, Jianhui, for contributing the fix!

Changed in chromium-browser (Ubuntu):
status: Fix Committed → Fix Released