Gnome Shell lock-up on GPU hotplug

Bug #1873401 reported by Aleksander Miera
This bug affects 1 person
Affects          Status  Importance  Assigned to  Milestone
mutter (Ubuntu)  New     Undecided   Unassigned

Bug Description

Discovered when testing EVDI-backed screen (https://github.com/DisplayLink/evdi).
While a new /dev/dri/cardX device is being added, Xorg locks up. This is visible as text cursors that stop blinking and menus that stop responding. Usually it is just a short pause (1-2 seconds), but sometimes (the exact reproduction rate is hard to state, but it is rare) the screen freezes completely.
Interestingly, when a HW cursor is in use, the pointer still moves fine. The keyboard is at least partially functional, since it is possible to "bail out" to a tty with Ctrl+Alt+Fx and kill the session. Anything else is difficult to check, because nothing is updated on the screen; it just remains frozen.

Reproduced when adding 4 GPUs at once (e.g. via a config file in /etc/modprobe.d, as described in https://support.displaylink.com/knowledgebase/articles/1843660-screen-freezes-after-opening-an-application-only). Other device counts have not been checked yet; to be done soon.
The problem occurs when the devices are added, not when their associated screens are enabled, EDID is read, or anything like that.
This also needs to be checked with UDL. Similar behaviour is expected, but because UDL adds only a single device at a time, reproducing a "complete" freeze might be difficult.
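
The modprobe.d configuration mentioned above can be sketched roughly as follows. This is a minimal sketch, not the exact setup used for this report: it assumes the evdi module parameter is named initial_device_count, and the file name evdi.conf and the helper write_evdi_conf are hypothetical.

```shell
# Hypothetical helper: write a modprobe options file that asks evdi to
# create several devices when the module loads.
# Assumptions: the evdi parameter is named initial_device_count, and
# /etc/modprobe.d/evdi.conf is only an illustrative file name.
write_evdi_conf() {
    count="$1"
    target="${2:-/etc/modprobe.d/evdi.conf}"
    printf 'options evdi initial_device_count=%s\n' "$count" > "$target"
}

# Example (needs root for the default path):
#   write_evdi_conf 4
#   modprobe -r evdi && modprobe evdi
```

After the module is reloaded, the additional nodes should show up under /dev/dri.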

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: xserver-xorg-core 2:1.20.8-2ubuntu2
ProcVersionSignature: Ubuntu 5.4.0-24.28-generic 5.4.30
Uname: Linux 5.4.0-24-generic x86_64
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: skip
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Fri Apr 17 09:24:35 2020
DistUpgraded: Fresh install
DistroCodename: focal
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, including running git bisection searches
GraphicsCard:
 Intel Corporation 4th Gen Core Processor Integrated Graphics Controller [8086:0416] (rev 06) (prog-if 00 [VGA controller])
   Subsystem: Dell 4th Gen Core Processor Integrated Graphics Controller [1028:060d]
   Subsystem: Dell GK107GLM [Quadro K1100M] [1028:060d]
InstallationDate: Installed on 2020-04-16 (0 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Beta amd64 (20200415)
MachineType: Dell Inc. Dell Precision M3800
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-24-generic root=UUID=e0a7aba8-1c09-4367-9d6b-c74b73017fb4 ro quiet splash vt.handoff=7
SourcePackage: xorg-server
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/25/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A12
dmi.board.name: Dell Precision M3800
dmi.board.vendor: Dell Inc.
dmi.board.version: A12
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.chassis.version: Not Specified
dmi.modalias: dmi:bvnDellInc.:bvrA12:bd05/25/2018:svnDellInc.:pnDellPrecisionM3800:pvrA12:rvnDellInc.:rnDellPrecisionM3800:rvrA12:cvnDellInc.:ct8:cvrNotSpecified:
dmi.product.name: Dell Precision M3800
dmi.product.sku: Dell Precision M3800
dmi.product.version: A12
dmi.sys.vendor: Dell Inc.
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.101-2
version.libgl1-mesa-dri: libgl1-mesa-dri 20.0.4-1ubuntu1
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.8-2ubuntu2
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200226-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.16-1

Aleksander Miera (amiera) wrote :
Daniel van Vugt (vanvugt) wrote :

If the HW cursor still moves then it's not Xorg that's locked/frozen/crashed, it is gnome-shell (via the mutter project).

affects: xorg-server (Ubuntu) → mutter (Ubuntu)
Daniel van Vugt (vanvugt) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:

1. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

3. If step 2 also failed then apply the workaround from bug 994921, reboot, reproduce the crash, and retry step 1.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.
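
Steps 1 and 2 above can be sketched as a small shell helper. The function names are hypothetical, the default paths are the ones quoted in the steps, and reading the whoopsie ID may require root.

```shell
# Hypothetical helper: list .crash files in a directory (default /var/crash).
# Prints nothing when there are none.
list_crash_files() {
    dir="${1:-/var/crash}"
    for f in "$dir"/*.crash; do
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Hypothetical helper: build the errors.ubuntu.com URL from the whoopsie ID file.
# Returns non-zero if the ID file cannot be read.
errors_url() {
    id_file="${1:-/var/lib/whoopsie/whoopsie-id}"
    [ -r "$id_file" ] || return 1
    printf 'https://errors.ubuntu.com/user/%s\n' "$(cat "$id_file")"
}

# Example:
#   list_crash_files        # then: ubuntu-bug YOURFILE.crash
#   errors_url              # open the printed URL in a browser
```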

Changed in mutter (Ubuntu):
status: New → Incomplete
tags: added: displaylink
Aleksander Miera (amiera) wrote :

The point is, there is no crash file. The graphical environment is not responsive, but nothing seems to have crashed.
If a stack trace or core file is needed, no problem: I can rebuild in debug mode, attach a debugger and gather those.

Daniel van Vugt (vanvugt) wrote :

Alright. Please first ensure you have followed all the steps in comment #3. Then if there's still no evidence of a crash we can assume it's a freeze. The next thing to try then is to force it to dump core during the freeze (from an ssh login) by sending some signal to the process like:

  kill -USR2 PID

Then we would be able to analyse it as a crash (assuming the automated crash reporting picks it up). But if you want to do the debugging locally yourself then feel free. Just paste the stack trace you find here so we can see what you see.
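
For the local-debugging route, one way to capture a stack trace during the freeze is to attach gdb from the ssh login and dump a backtrace of every thread. This is a minimal sketch, assuming gdb and the relevant debug symbols are installed; the helper name and output file are hypothetical, and on Ubuntu the default ptrace restrictions usually mean gdb has to run as root to attach.

```shell
# Hypothetical helper: dump a backtrace of all threads of a running process
# into a file without terminating it.
# GDB can be overridden (e.g. GDB=echo) to preview the command line.
dump_backtrace() {
    pid="$1"
    out="${2:-backtrace.txt}"
    "${GDB:-gdb}" -p "$pid" -batch -ex 'thread apply all bt' > "$out" 2>&1
}

# Example, run from ssh or a tty rather than from the frozen session:
#   dump_backtrace "$(pgrep -x gnome-shell | head -n 1)" gnome-shell-bt.txt
```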

Aleksander Miera (amiera) wrote :

OK, I'm totally sure it's a freeze.
Moreover, the gnome-shell stack trace looks normal and the event loop runs just fine.
I can generate and provide a core file if needed.

What I have managed to debug so far is that when multiple /dev/dri/cardX devices are created, frame_cb in meta-stage-x11.c:296 is not called, which simply results in the screen stalling.
Following this up the call chain leads to the conclusion that cogl_glib_source_dispatch in cogl-glib-source.c:133 is not called at all for some reason.

The natural next step would be to track that function down, unless there's something else you can recommend?
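
One possible way to check which of these functions stop being reached is to attach gdb with logging breakpoints while hotplugging the devices. The sketch below only generates a gdb command file; the helper name is hypothetical, and it assumes debug symbols for mutter are available so that the symbol and the file:line location quoted in this comment resolve (the line number is specific to the build being debugged).

```shell
# Hypothetical helper: write a gdb command file that logs every time the
# two functions named in this comment are reached, then keeps running.
write_gdb_script() {
    out="$1"
    cat > "$out" <<'EOF'
set pagination off
break cogl_glib_source_dispatch
commands
  silent
  printf "cogl_glib_source_dispatch reached\n"
  continue
end
break meta-stage-x11.c:296
commands
  silent
  printf "frame_cb reached\n"
  continue
end
continue
EOF
}

# Example (from ssh or a tty, typically as root):
#   write_gdb_script /tmp/hotplug.gdb
#   gdb -p "$(pgrep -x gnome-shell | head -n 1)" -x /tmp/hotplug.gdb
```

If the messages stop appearing once the devices are added, that would be consistent with the dispatch function no longer being invoked.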

summary: - xorg lock-up on GPU hotplug
+ Gnome Shell lock-up on GPU hotplug
Changed in mutter (Ubuntu):
status: Incomplete → New
Aleksander Miera (amiera) wrote :

Discussed the bug on the #gnome-shell IRC channel.
The next step is to track down the event that actually triggers execution of cogl_glib_source_dispatch. It's possible that it simply does not arrive. BTW, this does not rule out Xorg as a potential culprit: it might simply not be delivering the expected event. This is yet to be checked.
