Screen freeze and high CPU when a second monitor of different scaling factor is attached (but only if DING is active)

Bug #1976204 reported by Andy Chi
This bug affects 2 people
Affects                                          Status        Importance  Assigned to      Milestone
Mutter                                           New           Unknown
OEM Priority Project                             Fix Released  Critical    Andy Chi
gnome-shell-extension-desktop-icons-ng (Ubuntu)  Fix Released  High        Daniel van Vugt
  Jammy                                          Fix Released  High        Sergio Costas
mutter (Ubuntu)                                  Triaged       Medium      Unassigned

Bug Description

[Summary]
The system freezes when I connect an external monitor over DP/HDMI through a DA310 dongle or a WD19TB dock.

On further checking, it freezes only if the built-in and external displays have different scale factors.

[Steps to reproduce]
1. Attach the DA310 or WD19TB to the DUT
2. Cold boot into the OS
3. Plug the DP/HDMI cable into the DA310 or WD19TB
4. Check that the external monitor has a signal
5. Press a key or move the mouse to check that the system responds

[Expected result]
The system does not freeze with the external monitor attached.

[Actual result]
The screen freezes, but the DUT can still be reached over ssh. Keyboard and mouse
don't work during the screen freeze.
It can sometimes be recovered by unplugging the external monitor.

[gnome-shell stack trace]
 五 27 22:12:31 ubuntu-XPS-9320 gnome-shell[1998]: Window manager warning: Overwriting existing binding of keysym 37 with keysym 37 (keycode 10).
 五 27 22:12:31 ubuntu-XPS-9320 gnome-shell[1998]: Window manager warning: Overwriting existing binding of keysym 35 with keysym 35 (keycode e).
 五 27 22:12:31 ubuntu-XPS-9320 gnome-shell[1998]: Window manager warning: Overwriting existing binding of keysym 33 with keysym 33 (keycode c).
 五 27 22:12:31 ubuntu-XPS-9320 gnome-shell[1998]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
 五 27 22:12:55 ubuntu-XPS-9320 sudo[2916]: ubuntu : TTY=pts/2 ; PWD=/home/ubuntu ; USER=root ; COMMAND=/usr/bin/dmesg
 五 27 22:12:55 ubuntu-XPS-9320 sudo[2916]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1001)
 五 27 22:12:55 ubuntu-XPS-9320 sudo[2916]: pam_unix(sudo:session): session closed for user root
 五 27 22:12:56 ubuntu-XPS-9320 gnome-shell[1998]: Attempting to call back into JSAPI during the sweeping phase of GC. This is most likely caused by not destroying a Clutter actor or Gtk+ widget with ::destroy signals connected, but can also be caused by using the destroy(), dispose(), or remove() vfuncs. Because it would crash the application, it has been blocked and the JS callback not invoked.
 五 27 22:12:56 ubuntu-XPS-9320 gnome-shell[1998]: == Stack trace for context 0x56156f8f04a0 ==
 五 27 22:12:56 ubuntu-XPS-9320 gnome-shell[1998]: == Stack trace for context 0x56156f8f04a0 ==
 五 27 22:12:56 ubuntu-XPS-9320 gnome-shell[1998]: The offending signal was window-added on MetaWorkspace 0x56156f916640.
 五 27 22:12:56 ubuntu-XPS-9320 gnome-shell[1998]: Attempting to call back into JSAPI during the sweeping phase of GC. This is most likely caused by not destroying a Clutter actor or Gtk+ widget with ::destroy signals connected, but can also be caused by using the destroy(), dispose(), or remove() vfuncs. Because it would crash the application, it has been blocked and the JS callback not invoked.
 五 27 22:12:56 ubuntu-XPS-9320 gnome-shell[1998]: The offending signal was window-added on MetaWorkspace 0x56156f916640.
 五 27 22:12:56 ubuntu-XPS-9320 gnome-shell[1998]: Attempting to call back into JSAPI during the sweeping phase of GC. This is most likely caused by not destroying a Clutter actor or Gtk+ widget with ::destroy signals connected, but can also be caused by using the destroy(), dispose(), or remove() vfuncs. Because it would crash the application, it has been blocked and the JS callback not invoked.

This system uses an OLED (3456x2160) panel, and I can't reproduce this issue on an FHD panel.

[What could go wrong]
This two-line bugfix maximizes the desktop icons window, which avoids an infinite loop. It is a workaround for an apparent bug in mutter, but fixing mutter is much more complicated.

The Desktop Icons NG GNOME Shell extension is enabled by default on Ubuntu 22.04 LTS so a bug here could make desktops unusable.

---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
DisplayManager: gdm3
DistroRelease: Ubuntu 22.04
InstallationDate: Installed on 2022-03-17 (75 days ago)
InstallationMedia: Ubuntu 22.04 LTS "Jammy Jellyfish" - Alpha amd64 (20220313)
Package: gnome-shell 42.1-0ubuntu0.1
PackageArchitecture: amd64
ProcVersionSignature: Ubuntu 5.15.0-35.36-generic 5.15.35
RelatedPackageVersions: mutter-common 42.0-3ubuntu2
Tags: jammy wayland-session
Uname: Linux 5.15.0-35-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True

Revision history for this message
Andy Chi (andch) wrote :
Andy Chi (andch)
tags: added: oem-priority originate-from-1967530 somerville
Changed in oem-priority:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Andy Chi (andch)
tags: added: jellyfish-edge-staging
Revision history for this message
Andy Chi (andch) wrote :
Revision history for this message
Andy Chi (andch) wrote : Dependencies.txt

apport information

tags: added: apport-collected jammy wayland-session
description: updated
Revision history for this message
Andy Chi (andch) wrote : GsettingsChanges.txt

apport information

Revision history for this message
Andy Chi (andch) wrote : ProcCpuinfoMinimal.txt

apport information

Revision history for this message
Andy Chi (andch) wrote : ProcEnviron.txt

apport information

Revision history for this message
Andy Chi (andch) wrote : ShellJournal.txt

apport information

Revision history for this message
Andy Chi (andch) wrote : Re: system freeze and gnome-shell reports multiple stack trace

If I remove gnome-shell-extension-desktop-icons-ng, I can't reproduce this issue. I also can't reproduce it in an X session.

Revision history for this message
Marco Trevisan (Treviño) (3v1n0) wrote :

It would be relevant to understand where gnome-shell freezes.

So once you've installed the debug symbols of gnome-shell, mutter and glib, please run:

 sudo gdb -p $(pidof /usr/bin/gnome-shell) /usr/bin/gnome-shell

Then press Ctrl+C there and run `bt`.

Also do:

`call (void) gjs_dumpstack()`

And you should find where the shell is hanging on JS side reading the shell journal.

Revision history for this message
Andy Chi (andch) wrote :

Hi @3v1n0,

[steps]
1. Install gnome-shell-dbgsym_42.1-0ubuntu0.1_amd64.ddeb, libglib2.0-dev-bin-dbgsym_2.72.1-1_amd64.ddeb, libmutter-test-10-dbgsym_42.1-0ubuntu1_amd64.ddeb, libglib2.0-0-dbgsym_2.72.1-1_amd64.ddeb, libglib2.0-tests-dbgsym_2.72.1-1_amd64.ddeb, mutter-10-tests-dbgsym_42.1-0ubuntu1_amd64.ddeb, libglib2.0-bin-dbgsym_2.72.1-1_amd64.ddeb, libmutter-10-0-dbgsym_42.1-0ubuntu1_amd64.ddeb and mutter-dbgsym_42.1-0ubuntu1_amd64.ddeb
2. Plug external monitor
3. Execute `sudo gdb -p $(pidof /usr/bin/gnome-shell) /usr/bin/gnome-shell`
4. Ctrl+C and `bt`
5. `call (void) gjs_dumpstack()`

[journal log]
六 06 16:23:27 ubuntu-XPS-13-9310 gnome-shell[1071]: The offending callback was SourceFunc().
 六 06 16:23:27 ubuntu-XPS-13-9310 gnome-shell[1071]: Attempting to run a JS callback during garbage collection. This is most likely caused by destroying a Clutter actor or GTK widget with ::destroy signal connected, or using the destroy(), dispose(), or remove() vfuncs. Because it would crash the application, it has been blocked.

Revision history for this message
Marco Trevisan (Treviño) (3v1n0) wrote :

So, gdb should be attached while the shell is hanging. Assuming that was the case, I still don't see the output of `bt full` (or better, `thread apply all bt full`), which we need in order to figure out where the process is hanging.

As for the JS stack, the one I see in the logs is:

六 06 16:11:11 ubuntu-XPS-13-9310 gnome-shell[1071]: == Stack trace for context 0x5557c18144b0 ==
 六 06 16:11:11 ubuntu-XPS-13-9310 gnome-shell[1071]: #0 7fffd9616ef0 I resource:///org/gnome/shell/ui/workspace.js:820 (e45d6bfdbf0 @ 274)
 六 06 16:11:11 ubuntu-XPS-13-9310 gnome-shell[1071]: #1 7fffd9616f80 I resource:///org/gnome/shell/ui/workspace.js:1409 (47907707d80 @ 264)
 六 06 16:11:11 ubuntu-XPS-13-9310 gnome-shell[1071]: #2 7fffd9616ff0 I resource:///org/gnome/shell/ui/workspace.js:1274 (47907707830 @ 465)
 六 06 16:11:11 ubuntu-XPS-13-9310 gnome-shell[1071]: #3 7fffd9617030 I resource:///org/gnome/shell/ui/workspace.js:1298 (479077079c0 @ 45)
 六 06 16:11:11 ubuntu-XPS-13-9310 gnome-shell[1071]: #4 7fffd9617080 I self-hosted:1181 (3a3c201b0a10 @ 454)

Revision history for this message
Andy Chi (andch) wrote :

I use `thread apply all bt full` to capture the gdb debug message and use `journalctl -b 0 /usr/bin/gnome-shell` to capture journal log.
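
The two capture steps used in this exchange can be scripted so that the backtrace is taken quickly over ssh while the screen is frozen. The following is only a sketch: the helper names are made up for illustration, and it assumes gdb's standard `-batch` and `-ex` options are acceptable in place of the interactive Ctrl+C and `bt` sequence described above.

```python
import shlex


def gdb_backtrace_command(pid: int, binary: str = "/usr/bin/gnome-shell") -> list[str]:
    """Build a non-interactive gdb invocation (hypothetical helper).

    -batch makes gdb run the given commands and exit, and each -ex
    supplies one command, in the order they were requested in this thread.
    """
    return [
        "sudo", "gdb", "-batch",
        "-ex", "thread apply all bt full",
        "-ex", "call (void) gjs_dumpstack()",
        "-p", str(pid), binary,
    ]


def journal_command() -> list[str]:
    """journalctl invocation quoted in the comment above (current boot only)."""
    return ["journalctl", "-b", "0", "/usr/bin/gnome-shell"]


if __name__ == "__main__":
    # 1071 is the gnome-shell PID seen in the journal excerpts; use your own.
    print(shlex.join(gdb_backtrace_command(1071)))
    print(shlex.join(journal_command()))
```

The output of `gjs_dumpstack()` goes to the shell journal rather than to gdb, so both commands are needed to get the native and the JS side.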

Revision history for this message
Marco Trevisan (Treviño) (3v1n0) wrote (last edit ):

Some updates here:

The shell isn't really hanging. The input/output GLib sources that the shell
adds to the main context are starved and never processed, because we add thousands of timeouts and `Meta.Later` sources in response to a `window-removed` signal that is triggered thousands of times (the source is actually `[gnome-shell] this._queueCheckWorkspaces`).

This leads to the CPU being at 100%, spending its time processing such sources (https://i.imgur.com/Dlt5dFy.png), with thousands of GSources being added: https://i.imgur.com/QUuuZA1.png

And these are added via https://usercontent.irccloud-cdn.com/file/V4jfRY94/with%20laters%20added%20at that sends the window-removed signal (https://i.imgur.com/PNBn6Cv.png).

Now, it seems possible that the fixes in https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2413, which are part of 42.2, fix this case.

But we still need to test that.
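
To illustrate the starvation described in this comment, here is a toy model of a priority-based main loop. It is not GLib code: the priority values, the source names and the one-dispatch-per-iteration rule are assumptions made for the sketch. It only shows that a source which re-queues itself at a more urgent priority keeps a less urgent source from ever running.

```python
import heapq
from itertools import count


def run_loop(iterations: int, flood: bool) -> dict[str, int]:
    """Toy main loop: each iteration dispatches the most urgent queued source.

    A lower number means more urgent, following the GLib convention.
    The concrete values are illustrative, not taken from mutter or gnome-shell.
    """
    urgent, normal = 0, 10
    seq = count()  # tie-breaker so equal priorities stay in FIFO order
    queue = [(normal, next(seq), "input"), (urgent, next(seq), "check-workspaces")]
    heapq.heapify(queue)
    dispatched = {"input": 0, "check-workspaces": 0}

    for _ in range(iterations):
        if not queue:
            break
        _priority, _order, name = heapq.heappop(queue)
        dispatched[name] += 1
        if name == "check-workspaces" and flood:
            # Handling one signal causes the next one, so another urgent
            # source is queued before the loop can reach "input".
            heapq.heappush(queue, (urgent, next(seq), name))
        elif name == "input":
            # Input sources stay installed and become ready again.
            heapq.heappush(queue, (normal, next(seq), name))
    return dispatched


if __name__ == "__main__":
    print(run_loop(1000, flood=False))
    print(run_loop(1000, flood=True))
```

With `flood=False` the input source gets almost every iteration; with `flood=True` it gets none, which matches the symptom of a busy process that ignores keyboard and mouse.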

Revision history for this message
Kai-Chuan Hsieh (kchsieh) wrote :

I added the commit mentioned in comment #13 and built mutter packages:
https://launchpad.net/~kchsieh/+archive/ubuntu/verification/

It doesn't fix the issue.

Andy Chi (andch)
description: updated
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: System freeze

Since Javascript stack traces are hardly ever related to freezes, and particularly if there are multiple of them, I've removed mention of the stack traces.

summary: - system freeze and gnome-shell reports multiple stack trace
+ System freeze
Revision history for this message
Daniel van Vugt (vanvugt) wrote (last edit ):

Since you mention DA310 and WD19TB, I am reminded of bug 1970495.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Also the flood of log messages seen in comment #2 is tracked in bug 1908429.

Revision history for this message
Andy Chi (andch) wrote :

Sorry, connecting a type-C monitor directly, without a dock, also reproduces this issue.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

If your log still gets flooded with messages then there's a good chance that's the reason why gnome-shell is not responding. Only the "stack trace" info is usually irrelevant.

If you get a freeze without a flood of log messages then we should put mutter in debug mode via /etc/environment:

  MUTTER_DEBUG=kms

and then see what messages get logged at the time of the freeze.

Revision history for this message
Andy Chi (andch) wrote : Re: Screen freeze

Change the title to "Screen freeze" to avoid any confusion of "system hang" or "system freeze".

Sure, I'll add the debug in the /etc/environment. Thanks.

summary: - System freeze
+ Screen freeze
Revision history for this message
Andy Chi (andch) wrote :

I set the DUT back to X and plugged in an external 4K monitor. The screen does not freeze.

[DUT]
XPS 9310

[Monitor]
ASUS PA279CV

[kernel]
5.15.0-35-generic

[BIOS]
3.4.0

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

If and when the freeze happens again, please:

1. Check to see if gnome-shell's CPU usage is high or low.

2. Turn the freeze into a crash report so we can analyse the thread stack traces: kill -ABRT PID

3. Report the crash formally, like in https://wiki.ubuntu.com/Bugs/Responses#Missing_a_crash_report_or_having_a_.crash_attachment

4. Wait for the bots to process it.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Thanks for creating bug 1978764. The freeze does not appear to be one that is visible in stack traces so my next theory is a KMS/DRM screen update failure. To investigate that please try these separately:

 * Disable atomic KMS in /etc/environment:
   MUTTER_DEBUG_ENABLE_ATOMIC_KMS=0
   and reboot.

 * Follow the instructions in comment #19 so we can see all the interaction with the kernel DRM functions at the time of the freeze.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Per comment #18 please also avoid using a dock in testing. We would like to minimize the number of variables.

Revision history for this message
Andy Chi (andch) wrote :

@vanvugt,
Sure, but we don't have type-c monitor in Lab. I'm using a type-c to HDMI dongle.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I think a dongle is fine. I just wanted to avoid getting stuck on unrelated bugs that docking stations might trigger.

Revision history for this message
Andy Chi (andch) wrote :

Ubuntu-bug xxx.crash
Bug #1978764

Revision history for this message
Andy Chi (andch) wrote :

I followed comment #19 to enable MUTTER_DEBUG=kms and collected the sos report.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I can't seem to find the right log file in that tarball.

Please just use the journalctl command to generate a log file on the system from the period of time when MUTTER_DEBUG=kms was set and when the system was frozen. Ideally please tell us the timestamp at which it became frozen.

Revision history for this message
Andy Chi (andch) wrote :

timestamp of freeze time:
16:27:00 gnome-shell[1880]KMS: [atomic] Setting plane 31 (/dev/dri/card0) property 'rotation' (33) to 1

After unplugging the external monitor the screen is no longer frozen:
16:27:23 gnome-shell[1880]KMS: [atomic] Page flip callback for CRTC (98, /dev/dri/card0), data: 0x559af8d398890

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

This looks like the bug:

 六 16 16:27:00 ubuntu-XPS-13-9310 gnome-shell[1880]: KMS: Swap buffers: 2 frames pending (triple-buffering)

     (one new frame is rendered while the previous one hasn't finished yet. This is normal.)

 六 16 16:27:00 ubuntu-XPS-13-9310 gnome-shell[1880]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed

     (don't know if this is related but other people have reported bugs when this message occurs)

 六 16 16:27:00 ubuntu-XPS-13-9310 gnome-shell[1880]: KMS: [atomic] Page flip callback for CRTC (167, /dev/dri/card0), data: 0x559afbe6f410
 六 16 16:27:00 ubuntu-XPS-13-9310 gnome-shell[1880]: KMS: [atomic] Page flip callback for CRTC (98, /dev/dri/card0), data: 0x559af83bad60

     (both monitors completed their oldest frames and *should* be told to start the most recent frame now. Instead nothing happens for 21 seconds.)

 六 16 16:27:21 ubuntu-XPS-13-9310 gnome-shell[1880]: KMS: Updating device state for /dev/dri/card0

     (everything wakes up - the external monitor is removed)
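
The 21-second silence pointed out above can be found mechanically in a long journal dump. The sketch below is an illustration only: `largest_gap` is a hypothetical helper, it looks at the first `HH:MM:SS` field of each line, and it assumes all lines fall on the same day (no midnight wrap).

```python
import re
from datetime import datetime, timedelta

TIME_RE = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence in a log."""
    stamped = []
    for line in lines:
        match = TIME_RE.search(line)
        if match:
            stamped.append((datetime.strptime(match.group(1), "%H:%M:%S"), line))

    best = (timedelta(0), None, None)
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        if t1 - t0 > best[0]:
            best = (t1 - t0, before, after)
    return best


# Three of the journal lines quoted in the comment above.
SAMPLE = [
    "六 16 16:27:00 ubuntu-XPS-13-9310 gnome-shell[1880]: KMS: [atomic] Page flip callback for CRTC (167, /dev/dri/card0), data: 0x559afbe6f410",
    "六 16 16:27:00 ubuntu-XPS-13-9310 gnome-shell[1880]: KMS: [atomic] Page flip callback for CRTC (98, /dev/dri/card0), data: 0x559af83bad60",
    "六 16 16:27:21 ubuntu-XPS-13-9310 gnome-shell[1880]: KMS: Updating device state for /dev/dri/card0",
]

if __name__ == "__main__":
    gap, before, after = largest_gap(SAMPLE)
    print(gap.total_seconds(), "seconds of silence before:", after)
```

Feeding it the full `journalctl` output from a MUTTER_DEBUG=kms session should point at the last message logged before the freeze and the first one after recovery.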

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Please try adding MUTTER_DEBUG_DISABLE_TRIPLE_BUFFERING=1 to /etc/environment and tell us if that avoids the bug.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I think either:

  * "Page flip callback for CRTC" is following a path that never reaches try_post_latest_swap(); or

  * meta_monitor_manager_get_power_save_mode() is returning something other than ON while the external monitor is plugged in, causing try_post_latest_swap() to defer the next frame forever. Is the problem with only one specific model of monitor?

In both cases setting MUTTER_DEBUG_DISABLE_TRIPLE_BUFFERING=1 would probably avoid the bug.

Revision history for this message
Andy Chi (andch) wrote :

I tried adding MUTTER_DEBUG_DISABLE_TRIPLE_BUFFERING=1 and the screen still froze after plugging in the external monitor. I tested with two different 4K monitors, one from ASUS and one from BenQ. Both trigger this issue.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Presumably you're still using a type-C to HDMI dongle. Does the same dongle work on other laptops with jammy?

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I'm also wondering why the bug is specific to the OLED models. We patched the kernel specifically to support those systems in:

  * Screen sometimes can't update [Failed to post KMS update: CRTC property
    (GAMMA_LUT) not found] (LP: #1967274)
    - drm/i915/xelpd: Enable Pipe color support for D13 platform
    - drm/i915: Use unlocked register accesses for LUT loads
    - drm/i915/xelpd: Enable Pipe Degamma
    - drm/i915/xelpd: Add Pipe Color Lut caps to platform config

But then we also fixed mutter so the kernel fix isn't strictly needed for testing other than to support Night Light. I'm wondering if a mainline kernel behaves any differently?

Revision history for this message
Andy Chi (andch) wrote :

If the eDP panel is 4K or OLED (3456x2160), the screen freezes after plugging in a 4K external monitor.
I switched to a type-C to DP dongle, and the dongle works with another laptop that has an FHD eDP panel.

Revision history for this message
Kai-Chuan Hsieh (kchsieh) wrote (last edit ):

@vanvugt

Hello,

We've tried the mainline and drm-tip kernels, and the result is the same. https://gitlab.freedesktop.org/drm/intel/-/issues/6101

I found that disabling the gnome-shell extension DING avoids the issue. However, we think DING is what triggers the bug, not the root cause. It can be triggered by ./ding.js -E -D 0:0:3456:2160:2:0:0:0:0:0 -D 3456:0:3840:2160:1:0:0:0:0:0 when DING is disabled and there is a 4K external monitor connected.

We've tried the eDP FHD SKU but can't reproduce it there, so we assume that updating two large windows is required to trigger the issue.

Thanks,

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Two new pieces of information in comment #38:

 * The bug may not happen if DING is disabled - please confirm this!

 * The bug may not happen if MST is disabled in the kernel - please confirm this!

Revision history for this message
Kai-Chuan Hsieh (kchsieh) wrote :

reply #39

The bug does not happen if DING is disabled, confirmed.

The bug can still be reproduced even with MST disabled, confirmed.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

That's interesting. If triple buffering is not the cause of what we see in comment #31 then something higher level like DING definitely could be.

summary: - Screen freeze
+ Screen freeze if a 4K monitor is added to a 4K laptop while DING is
+ active
Changed in gnome-shell (Ubuntu):
importance: Undecided → High
Changed in mutter (Ubuntu):
importance: Undecided → High
Changed in gnome-shell-extension-desktop-icons-ng (Ubuntu):
importance: Undecided → High
Revision history for this message
Daniel van Vugt (vanvugt) wrote : Re: Screen freeze if a 4K monitor is added to a 4K laptop while DING is active

I've just finished testing 2x4K on i7-9750H with jammy and it works perfectly. All default settings with DING enabled. I tested both HDMI and USB-C.

So something about this bug seems to be hardware-specific as you keep mentioning. It would be really annoying if an OLED XPS was the minimum requirement to reproduce the bug.

Revision history for this message
Andy Chi (andch) wrote :

All laptops with a 4K eDP panel have this issue. I tried an HP ZBook (which has an NVIDIA card) and switched to Wayland. Plugging in an external 4K monitor reproduces it there too.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Good news: I can reproduce the bug now using an HP Skylake laptop with built-in 4K panel. Everything freezes if I add an external 4K display.

The freeze does NOT occur if I have DING disabled.

If I use both monitors without DING, then only enabling DING makes the freeze happen again.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

The bug doesn't happen if I build mutter/gnome-shell from git. The bug still doesn't happen if I build the Ubuntu 42.1 tags (jammy) from salsa, with all Ubuntu patches.

Using the jammy release binaries the bug still occurs consistently. I am not getting any results from gjs_dumpstack anymore (like above), but earlier in the day I did get:

  == Stack trace for context 0x56139f48b4a0 ==
  gnome-shell[10091]: #0 7ffc53926a00 I resource:///org/gnome/shell/ui/windowManager.js:314 (2b03e27293d0 @ 78)
  gnome-shell[10091]: #1 7ffc53926a50 I self-hosted:1181 (2e95f68b0a10 @ 454)

So actually that's the same as Marco mentioned in comment #13.

Revision history for this message
jeremyszu (os369510) wrote :

Daniel,

So, does the PPA from comment #14 fix your case?
If it doesn't, then there are probably two issues, or other patches are needed.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

mutter!2413 is in version 42.2 which I am using already, and the bug is still present, confirming comment #14.

Changed in gnome-shell (Ubuntu):
assignee: nobody → Daniel van Vugt (vanvugt)
Changed in mutter (Ubuntu):
assignee: nobody → Daniel van Vugt (vanvugt)
Changed in gnome-shell (Ubuntu):
status: New → In Progress
Changed in mutter (Ubuntu):
status: New → In Progress
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Small progress trying to trace the root cause today:

I don't yet see gnome-shell doing anything wrong. It is reacting to an infinite flood of workspace 'window-removed' signals. That signal can only come from one function in mutter, and during the infinite flood I see 'on_all_workspaces' is toggling between 0 and 1:

(gdb) bt
#0 meta_workspace_remove_window
    (workspace=0x55a010a2d1d0, window=0x55a01135ce20)
    at ../src/core/workspace.c:421
#1 0x00007fb5a3d5c40c in set_workspace_state
    (window=0x55a01135ce20, on_all_workspaces=0, workspace=0x55a00eac8ea0)
    at ../src/core/window.c:4635
#2 0x00007fb5a3d5f9b9 in meta_window_update_monitor
    (window=window@entry=0x55a01135ce20, flags=flags@entry=META_WINDOW_UPDATE_MONITOR_FLAGS_NONE) at ../src/core/window.c:3746
#3 0x00007fb5a3d5fcc6 in meta_window_move_resize_internal
    (window=0x55a01135ce20, flags=(META_MOVE_RESIZE_MOVE_ACTION | META_MOVE_RESIZE_RESIZE_ACTION), gravity=META_GRAVITY_NORTH_WEST, frame_rect=...)
    at ../src/core/window.c:3945
#4 0x00007fb5a49d3f60 in g_list_foreach ()
    at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5 0x00007fb5a3d32d39 in move_resize
    (display=<optimised out>, windows=0x55a00eb224e0 = {...})
    at ../src/core/display.c:4044
#6 0x00007fb5a3d32c7e in window_queue_run_later_func
    (user_data=<optimised out>) at ../src/core/display.c:4070
#7 0x00007fb5a3d24ff3 in invoke_later_idle (data=0x55a011660c00)
    at ../src/compositor/meta-later.c:199

I'm not saying gnome-shell isn't to blame, just that I've only worked back to mutter so far.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

There is a corresponding number of meta_workspace_add_window() calls, also from set_workspace_state().

Revision history for this message
Daniel van Vugt (vanvugt) wrote (last edit ):

More progress:

The cause is meta_window_wayland_update_main_monitor() constantly changing which monitor the window (presumably DING because there aren't any others) prefers. The toggle is happening on the last line of meta_window_wayland_update_main_monitor() and if you skip that line then the bug goes away.

The next step will be to figure out why mutter is switching the DING window between monitors constantly.

Revision history for this message
Andy Chi (andch) wrote :

I tried the newest mutter (42.2-0ubuntu1) and gnome-shell (42.2-0ubuntu0.2), and the screen still freezes after I plug in the external monitor.

Revision history for this message
Marco Trevisan (Treviño) (3v1n0) wrote :

Oh, I see you're taking the bug.

I should receive an affected laptop in a few days, so if you want I can continue with this. Debugging via SSH wasn't ideal.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I also have hardware that exhibits the bug. That's why I started digging into it this week. No guarantees I will have a solution any time soon, so don't assume I've totally taken over. I just assigned it to myself because Seb thought nobody was working on it when I already was.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I can reproduce the bug with a 1080p laptop + 1080p monitor now. The trick is to configure the laptop screen with a different scaling factor to whatever the external monitor will get by default. When the two screens have the same resolution but different scales, windows will jump between the monitors in an infinite loop.

So the bug isn't caused by 4K, just that 4K systems are more likely to have differing monitor scaling factors that trigger the bug.

Once again, removing the last line of meta_window_wayland_update_main_monitor() fixes the issue but we need to find a nicer fix.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

The same bug can occur with app windows if the scale is high enough to make them larger than the monitor. So DING really isn't to blame - it's just a likely trigger given the size of the window.

no longer affects: gnome-shell-extension-desktop-icons-ng (Ubuntu)
summary: - Screen freeze if a 4K monitor is added to a 4K laptop while DING is
- active
+ Screen freeze and high CPU when a second monitor of different scaling
+ factor is attached (but only if DING is active)
Changed in gnome-shell-extension-desktop-icons-ng (Ubuntu):
status: New → Opinion
assignee: nobody → Daniel van Vugt (vanvugt)
Revision history for this message
Daniel van Vugt (vanvugt) wrote (last edit ):

I have a workaround/fix for DING that works. But before I propose that I'm going to see if I can fix mutter still. It seems the problem is more fundamentally in mutter, though they are fighting.

Roughly the problem is:

1. DING creates a fixed size window the size of the monitor. But it always starts on the primary monitor because the extension hasn't noticed it yet so hasn't moved it to the correct monitor.
2. Mutter moves the window down below the panel because on startup it doesn't have any special window type just yet (the extension hasn't seen its own window yet).
3. Now the window extends below the bottom of the monitor.
4. Mutter moves the window right to the next monitor because there's no panel there and it can fit fully on screen.
5. Mutter mistakenly delays the logical window resize and halves the size of the window to match the OLD monitor scale it is no longer on.
6. Mutter decides this new smaller window is now so small it should move back to the old monitor.
7. Mutter moves the window down below the panel to fit in the work area. Everything fits.
8. Mutter resizes the window to match the scale of the OLD monitor it is no longer on.
9. Goto 3.

The infinite loop in mutter prevents the extension from completing proper startup. It never calls hide_from_window_list() that would allow us to use proper placement rules.
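
The cycle in steps 3 to 9 can be reproduced with a small toy model. This is not mutter's placement code: the heights are in arbitrary units, the rule that a window which fits the primary work area moves back to the primary is an assumption standing in for step 6, and the late resize is modelled as multiplying the height by the ratio of the two monitor scales.

```python
def simulate(primary_scale: float, secondary_scale: float, max_steps: int = 20) -> str:
    """Toy model of the window bouncing between two monitors.

    Returns "loop" if a (monitor, height) state repeats, "stable" if the
    window settles, and "undecided" if max_steps runs out first.
    """
    monitor_height = 100.0                     # both monitors, arbitrary units
    primary_work_area = monitor_height - 5.0   # the panel takes space on the primary

    monitor, height = "primary", monitor_height  # step 1: monitor-sized window
    seen = set()
    for _ in range(max_steps):
        state = (monitor, height)
        if state in seen:
            return "loop"
        seen.add(state)

        if monitor == "primary" and height > primary_work_area:
            # Steps 3-5: too tall below the panel, so move to the secondary,
            # then apply the late resize for the scale change.
            monitor = "secondary"
            height *= secondary_scale / primary_scale
        elif monitor == "secondary" and height <= primary_work_area:
            # Steps 6-8: now small enough for the primary, so move back,
            # then apply the late resize in the other direction.
            monitor = "primary"
            height *= primary_scale / secondary_scale
        else:
            return "stable"
    return "undecided"


if __name__ == "__main__":
    print("200% + 100%:", simulate(2.0, 1.0))
    print("100% + 100%:", simulate(1.0, 1.0))
```

In this model the 200% primary with a 100% secondary cycles forever, while equal scales settle after a single move, which is consistent with the observation that only differing scale factors trigger the freeze.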

Changed in gnome-shell-extension-desktop-icons-ng (Ubuntu):
status: Opinion → In Progress
importance: Undecided → High
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

DING fix proposed:
https://gitlab.com/rastersoft/desktop-icons-ng/-/merge_requests/356

I will keep working on a mutter fix, but we're no longer blocked waiting for it. Regular app windows can get stuck in the same kind of infinite loop so we should figure out a test case in that style before proposing anything to mutter.

no longer affects: gnome-shell (Ubuntu)
Revision history for this message
Andy Chi (andch) wrote :

Based on comment#57, I built a test deb for this issue. If anyone wants to test the patched gnome-shell-extension-desktop-icons-ng, please refer to https://launchpad.net/~andch/+archive/ubuntu/experimental-package.

@vanvugt, thanks a lot!

Revision history for this message
Zorro Zhang (zorro-zhang) wrote :

Tested the package from comment #58; the issue can't be reproduced on my Tributo with a UHD panel plus a 4K external monitor.
Image: OEM x19
BIOS: 1.4.0
Docking: WD19TB

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

General steps to reproduce the bug on any jammy system:

0. Find a single display machine that you can attach a second monitor to.

1. Disconnect all secondary monitors so you only have one screen.

2. Log into a Wayland session.

3. Open the Extensions app and disable Desktop Icons NG (DING).

4. Plug in the second monitor.

5. Open Settings and set both monitors to the SAME resolution with Fractional Scaling OFF. Set the primary display scale to 200%, and the secondary display scale to 100%. Click Apply, and Keep Changes.

6. If the Settings window is flickering between both monitors then that's an early sign of the bug happening. Try and move the window and it will then stop flickering.

7. In the Extensions app, enable Desktop Icons NG (DING).

Expected: Screen does not freeze.
Observed: Screen freezes.

Unplug the second monitor to unfreeze.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Also the bug seems to be GDM-specific. It never happens if you launch 'gnome-shell --wayland' manually from a VT. Even the same binaries.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Figured it out: The bug only happens with:

  org.gnome.mutter workspaces-only-on-primary true

which is only set in GNOME sessions because:

  /usr/share/glib-2.0/schemas/10_ubuntu-settings.gschema.override
  /usr/share/glib-2.0/schemas/00_org.gnome.shell.gschema.override

You can avoid the bug by using the upstream default:

  org.gnome.mutter workspaces-only-on-primary false

Revision history for this message
Andy Chi (andch) wrote :

Hello @vanvugt,
I manually set workspaces-only-on-primary to false in

  /usr/share/glib-2.0/schemas/10_ubuntu-settings.gschema.override
  /usr/share/glib-2.0/schemas/00_org.gnome.shell.gschema.override

Then I followed the steps in the bug description, and my screen still froze after the external monitor was plugged in.
Is my setting perhaps incorrect?

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Schema changes need to be manually "installed". I don't remember the syntax but it doesn't really matter. Just try changing the setting manually:

  gsettings set org.gnome.mutter workspaces-only-on-primary false

But that's not really important either right now, because changing the setting mid-LTS is not a fix we would consider anyway. Comment #62 is just a note to myself about where to continue debugging this week.
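
Before and after running the `gsettings set` command above, it helps to confirm which value the session actually sees, since editing the override files alone does not change it. The following sketch reads the key with `gsettings get`; the function names are invented for illustration, and it assumes the `gsettings` tool is on the PATH of the user session.

```python
import subprocess

SCHEMA = "org.gnome.mutter"
KEY = "workspaces-only-on-primary"


def parse_gsettings_bool(output: str) -> bool:
    """Turn the text printed by `gsettings get` for a boolean key into a bool."""
    value = output.strip()
    if value not in ("true", "false"):
        raise ValueError(f"unexpected gsettings output: {output!r}")
    return value == "true"


def workspaces_only_on_primary() -> bool:
    """Read the effective value for the current user session."""
    result = subprocess.run(
        ["gsettings", "get", SCHEMA, KEY],
        capture_output=True, text=True, check=True,
    )
    return parse_gsettings_bool(result.stdout)


if __name__ == "__main__":
    print(f"{SCHEMA} {KEY} = {workspaces_only_on_primary()}")
```

If this still prints `True` after editing the schema override files, the edit has not taken effect for the running session.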

tags: added: fixed-in-ding-47 fixed-upstream
Changed in gnome-shell-extension-desktop-icons-ng (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Daniel van Vugt (vanvugt) wrote (last edit ):

Upstream bug independent of Ubuntu customization:
https://gitlab.gnome.org/GNOME/mutter/-/issues/2333

Changed in mutter:
status: Unknown → New
Revision history for this message
Daniel van Vugt (vanvugt) wrote :
Jeremy Bícha (jbicha)
Changed in gnome-shell-extension-desktop-icons-ng (Ubuntu Jammy):
status: New → Triaged
importance: Undecided → High
assignee: nobody → Sergio Costas (rastersoft-gmail)
Jeremy Bícha (jbicha)
no longer affects: mutter (Ubuntu Jammy)
Changed in gnome-shell-extension-desktop-icons-ng (Ubuntu Jammy):
status: Triaged → In Progress
description: updated
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Andy, or anyone else affected,

Accepted gnome-shell-extension-desktop-icons-ng into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/gnome-shell-extension-desktop-icons-ng/43-2ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in gnome-shell-extension-desktop-icons-ng (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gnome-shell-extension-desktop-icons-ng - 46-2

---------------
gnome-shell-extension-desktop-icons-ng (46-2) unstable; urgency=medium

  [ Sergio Costas ]
  * Cherry-pick patch to fix hang when using 2 monitors with different zoom
    settings (LP: #1976204)

 -- Jeremy Bicha <email address hidden> Tue, 28 Jun 2022 08:54:11 -0400

Changed in gnome-shell-extension-desktop-icons-ng (Ubuntu):
status: Fix Committed → Fix Released
Andy Chi (andch) wrote :

Enabled -proposed on an XPS9310 (3456x2160) and installed gnome-shell-extension-desktop-icons-ng (43-2ubuntu1).

[steps]
1. Plug a Type-C monitor into the DUT
2. Cold boot into the OS
3. Check that the external monitor has a signal
4. Press a key or move the mouse to check that the system responds

The external and built-in monitors both work fine.
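When repeating this verification, it can help to confirm that the installed package is at least the fixed version before testing. The following is a rough sketch using dpkg and dpkg-query; the function names and the PKG and FIXED variables are illustrative assumptions, with the version string taken from the -proposed announcement above.

```shell
#!/bin/sh
# Sketch only: PKG, FIXED and the function names are assumptions for illustration.
PKG="gnome-shell-extension-desktop-icons-ng"
FIXED="43-2ubuntu1"

# Succeeds if Debian version $1 is greater than or equal to version $2.
version_at_least() {
    dpkg --compare-versions "$1" ge "$2"
}

# Prints the installed version of $PKG; fails if the package is not installed.
installed_version() {
    dpkg-query -W -f='${Version}' "$PKG" 2>/dev/null
}

# Reports whether the installed package is new enough to contain the fix.
check_fix_installed() {
    v=$(installed_version) || return 2
    [ -n "$v" ] || return 2
    if version_at_least "$v" "$FIXED"; then
        echo "$PKG $v is at or above $FIXED"
    else
        echo "$PKG $v predates $FIXED"
        return 1
    fi
}
```

Calling check_fix_installed after enabling -proposed and upgrading gives a quick sanity check that the test ran against the intended package.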

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Zorro Zhang (zorro-zhang) wrote :

Results on the Dell side:
Enabled -proposed on an XPS9320 (3456x2160) and installed gnome-shell-extension-desktop-icons-ng (43-2ubuntu1).
XPS 9320 UHD Panel
BIOS: 1.4.0
Dock: WD19 TBT
Monitors: 4K 60 Hz
Plug/Unplug: 10 times

The external and built-in monitors work fine in both mirror mode and join display mode.

description: updated
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gnome-shell-extension-desktop-icons-ng - 43-2ubuntu1

---------------
gnome-shell-extension-desktop-icons-ng (43-2ubuntu1) jammy; urgency=medium

  [ Sergio Costas ]
  * Ensure that the extension isn't loaded twice, fixing problems when other
    extensions are disabled (LP: #1978330)
  * Keep the application even if there are no connected monitors,
    fixing excessive journal error logging & CPU usage (LP: #1978331)
  * Fix hang when using 2 monitors with different zoom settings (LP: #1976204)

 -- Jeremy Bicha <email address hidden> Tue, 28 Jun 2022 08:57:13 -0400

Changed in gnome-shell-extension-desktop-icons-ng (Ubuntu Jammy):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for gnome-shell-extension-desktop-icons-ng has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Andy Chi (andch)
Changed in oem-priority:
status: Confirmed → Fix Committed
Changed in mutter (Ubuntu):
assignee: Daniel van Vugt (vanvugt) → nobody
importance: High → Medium
status: In Progress → Triaged
tags: removed: fixed-upstream
Andy Chi (andch)
Changed in oem-priority:
status: Fix Committed → Fix Released