Xorg crashes on the second and subsequent unbind operations of the Arc GPU

Bug #2012832 reported by Sergei
This bug affects 1 person
Affects               Status   Importance  Assigned to  Milestone
linux (Ubuntu)        Expired  Undecided   Unassigned
xorg-server (Ubuntu)  Expired  Undecided   Unassigned

Bug Description

Hello folks,

I'm experimenting with a new Lenovo Yoga i7 with an Intel Arc GPU on board and found that, although it's possible to disable the GPU to save power, the Xorg session crashes on the second unbind attempt (the first attempt after a reboot works just fine).

To reproduce: after powering the system off and on again, I execute the following commands:
1. First attempt:
   * DRI_PRIME=1 glxinfo | grep 'OpenGL renderer' # Shows: "OpenGL renderer string: Mesa Intel(R) Arc(tm) A370M Graphics (DG2)"
   * echo -n "0000:04:00.0" | sudo tee /sys/bus/pci/drivers/i915/unbind # Works great
   * DRI_PRIME=1 glxinfo | grep 'OpenGL renderer' # Shows: "OpenGL renderer string: Mesa Intel(R) Graphics (ADL GT2)", i.e. the integrated graphics
   * echo -n "0000:04:00.0" | sudo tee /sys/bus/pci/drivers/i915/bind
   * DRI_PRIME=1 glxinfo | grep 'OpenGL renderer' # Shows: "OpenGL renderer string: Mesa Intel(R) Arc(tm) A370M Graphics (DG2)"
2. Second attempt:
   * echo -n "0000:04:00.0" | sudo tee /sys/bus/pci/drivers/i915/unbind # Here Xorg crashes, but I can log in again
   * DRI_PRIME=1 glxinfo | grep 'OpenGL renderer' # Shows: "OpenGL renderer string: Mesa Intel(R) Graphics (ADL GT2)", i.e. the integrated graphics
   * echo -n "0000:04:00.0" | sudo tee /sys/bus/pci/drivers/i915/bind
   * DRI_PRIME=1 glxinfo | grep 'OpenGL renderer' # Shows: "OpenGL renderer string: Mesa Intel(R) Arc(tm) A370M Graphics (DG2)"
3. The third and all subsequent attempts behave the same as the second one
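
The steps above can be scripted for repeated runs. The following is a minimal sketch and not part of the original report: the function names and the DRIVER_DIR, PCI_DEVICES_DIR, and SUDO variables are assumptions introduced for illustration. Only the i915 bind/unbind paths, the PCI address, and the glxinfo call come from the steps above; /sys/bus/pci/devices is the standard sysfs location for PCI devices.

```shell
#!/bin/sh
# Sketch only: helper names and the three variables below are hypothetical.
DRIVER_DIR="${DRIVER_DIR:-/sys/bus/pci/drivers/i915}"
PCI_DEVICES_DIR="${PCI_DEVICES_DIR:-/sys/bus/pci/devices}"
SUDO="${SUDO-sudo}"

# Write a PCI address into the driver's "bind" or "unbind" control file.
# $1 = bind|unbind, $2 = PCI address such as 0000:04:00.0
gpu_write() {
    printf '%s' "$2" | $SUDO tee "$DRIVER_DIR/$1" > /dev/null
}

# Print the name of the driver currently bound to a device, or "none".
# $1 = PCI address
gpu_driver() {
    if [ -e "$PCI_DEVICES_DIR/$1/driver" ]; then
        basename "$(readlink "$PCI_DEVICES_DIR/$1/driver")"
    else
        echo "none"
    fi
}

# Print the renderer that DRI_PRIME=1 currently resolves to.
prime_renderer() {
    DRI_PRIME=1 glxinfo | grep 'OpenGL renderer'
}

# Run N unbind/bind cycles, printing the renderer after each step.
# $1 = PCI address, $2 = number of cycles
gpu_cycle() {
    i=1
    while [ "$i" -le "$2" ]; do
        echo "cycle $i: unbind"
        gpu_write unbind "$1" || return 1
        prime_renderer
        echo "cycle $i: bind"
        gpu_write bind "$1" || return 1
        prime_renderer
        i=$((i + 1))
    done
}
```

For example, `gpu_cycle 0000:04:00.0 2` would repeat the two attempts listed above. Since the second unbind is where Xorg crashes on this machine, running it from outside the X session (a VT or SSH) is probably the safer choice.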

I collected the information and can reproduce this state after each reboot, so hopefully it will help you find what goes wrong during the second and subsequent unbind operations. I am happy to provide additional information if you need any.

Thank you

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: xorg 1:7.7+23ubuntu2
Uname: Linux 6.2.8-060208-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.3
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: pass
CompositorRunning: None
CurrentDesktop: XFCE
Date: Sun Mar 26 12:25:03 2023
DistUpgraded: Fresh install
DistroCodename: jammy
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, including running git bisection searches
GraphicsCard:
 Intel Corporation Alder Lake-P Integrated Graphics Controller [8086:46a6] (rev 0c) (prog-if 00 [VGA controller])
   Subsystem: Lenovo Device [17aa:3ae3]
InstallationDate: Installed on 2023-02-21 (33 days ago)
InstallationMedia: Xubuntu 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809.1)
MachineType: LENOVO 82UF
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.8-060208-generic root=UUID=52e0f674-3384-4128-973b-d049407f961a ro quiet splash vt.handoff=7
SourcePackage: xorg
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/16/2022
dmi.bios.release: 1.35
dmi.bios.vendor: LENOVO
dmi.bios.version: J1CN35WW
dmi.board.asset.tag: NO Asset Tag
dmi.board.name: LNVNB161216
dmi.board.vendor: LENOVO
dmi.board.version: SDK0T76461 WIN
dmi.chassis.asset.tag: NO Asset Tag
dmi.chassis.type: 31
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Yoga 7 16IAH7
dmi.ec.firmware.release: 1.35
dmi.modalias: dmi:bvnLENOVO:bvrJ1CN35WW:bd11/16/2022:br1.35:efr1.35:svnLENOVO:pn82UF:pvrYoga716IAH7:rvnLENOVO:rnLNVNB161216:rvrSDK0T76461WIN:cvnLENOVO:ct31:cvrYoga716IAH7:skuLENOVO_MT_82UF_BU_idea_FM_Yoga716IAH7:
dmi.product.family: Yoga 7 16IAH7
dmi.product.name: 82UF
dmi.product.sku: LENOVO_MT_82UF_BU_idea_FM_Yoga 7 16IAH7
dmi.product.version: Yoga 7 16IAH7
dmi.sys.vendor: LENOVO
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.113-2~ubuntu0.22.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 23.0.0.20221126.1+2050~u22.04
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.xserver-xorg-core: xserver-xorg-core 2:21.1.3-2ubuntu2.8
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-2build1

Revision history for this message
Sergei (sergei0) wrote :
Sergei (sergei0)
description: updated
Daniel van Vugt (vanvugt) wrote :

Definitely a crash, but not yet actionable:

[ 220.992] (II) config/udev: removing GPU device /sys/devices/pci0000:00/0000:00:06.2/0000:02:00.0/0000:03:01.0/0000:04:00.0/drm/card1 /dev/dri/card1
[ 220.992] xf86: remove device 1 /sys/devices/pci0000:00/0000:00:06.2/0000:02:00.0/0000:03:01.0/0000:04:00.0/drm/card1
[ 220.992] (EE)
[ 220.992] (EE) Backtrace:
[ 220.995] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x139) [0x561dfa9a3809]
[ 220.996] (EE) 1: /lib/x86_64-linux-gnu/libc.so.6 (__sigaction+0x50) [0x7f0ed6842520]
[ 220.997] (EE) 2: ? (?+0x0) [0x0]
[ 220.997] (EE)
[ 220.997] (EE) Segmentation fault at address 0x0
[ 220.997] (EE)
Fatal server error:
[ 220.997] (EE) Caught signal 11 (Segmentation fault). Server aborting
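
To pull just this portion out of a larger Xorg log, something like the following could be used. It is a sketch: the function name xorg_backtrace is hypothetical, and the log location varies between setups (commonly /var/log/Xorg.0.log, or ~/.local/share/xorg/Xorg.0.log for rootless sessions).

```shell
#!/bin/sh
# Hypothetical helper: print the lines from "(EE) Backtrace:" through
# "Segmentation fault at address" in the given Xorg log.
# $1 = path to an Xorg log file
xorg_backtrace() {
    sed -n '/(EE) Backtrace:/,/Segmentation fault at address/p' "$1"
}
```

Usage would look like `xorg_backtrace /var/log/Xorg.0.log.old`, assuming the crashed session's log was rotated to that name.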

affects: xorg (Ubuntu) → xorg-server (Ubuntu)
Daniel van Vugt (vanvugt) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:

1. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.
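
The two steps above could be sketched in shell as follows. The function names and the CRASH_DIR and WHOOPSIE_ID_FILE variables are assumptions made for illustration; the paths and the URL pattern are the ones given in the steps. Reading the whoopsie-id file may require root.

```shell
#!/bin/sh
# Sketch only: helper and variable names are hypothetical.
CRASH_DIR="${CRASH_DIR:-/var/crash}"
WHOOPSIE_ID_FILE="${WHOOPSIE_ID_FILE:-/var/lib/whoopsie/whoopsie-id}"

# Step 1: list crash files, one per line (prints nothing if there are none).
list_crash_files() {
    for f in "$CRASH_DIR"/*.crash; do
        [ -e "$f" ] && echo "$f"
    done
    return 0
}

# Step 2: print the errors.ubuntu.com page for this machine.
errors_url() {
    whoopsie_id=$(cat "$WHOOPSIE_ID_FILE") || return 1
    echo "https://errors.ubuntu.com/user/$whoopsie_id"
}
```

Each path printed by `list_crash_files` can then be passed to `ubuntu-bug` as described in step 1.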

Changed in xorg-server (Ubuntu):
status: New → Incomplete
Sergei (sergei0) wrote :

Hi Daniel,

Here is what I found: https://errors.ubuntu.com/oops/0e9e0a38-cc6e-11ed-b99a-fa163ef35206. It seems to be the relevant report, but I did not find the same backtrace inside. If it does not help, I can try to reproduce the problem and see whether it generates the crash report again.

Thank you

Daniel van Vugt (vanvugt) wrote :

I can't see where Xorg crashed, but it looks like your kernel graphics driver also crashed at about the same time. Can you collect more from around this time in your system log?

Mar 26 12:21:08 anna-yoga kernel: RDX: 0000000000000000 RSI: 00007fdb0f0f8441 RDI: 0000000000000018
Mar 26 12:21:08 anna-yoga kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000011
Mar 26 12:21:08 anna-yoga kernel: R10: 0000000000000018 R11: 0000000000000246 R12: 00007fdb0f0f8441
Mar 26 12:21:08 anna-yoga kernel: R13: 00005642fef3dbe0 R14: 00005642fef3da80 R15: 00005642fef465b0
Mar 26 12:21:08 anna-yoga kernel: </TASK>
Mar 26 12:21:08 anna-yoga kernel: ---[ end trace 0000000000000000 ]---
Mar 26 12:21:08 anna-yoga kernel: i915 0000:04:00.0: [drm] *ERROR* Writing dc state to 0x1 failed, now 0x0
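
One way to collect the kernel messages around that time is journalctl with a time window. This is a sketch under assumptions: the function name kernel_log_window and the JOURNALCTL variable are hypothetical, and the two-minute window in the usage note is only a guess based on the 12:21:08 timestamps above.

```shell
#!/bin/sh
# Sketch only: helper and variable names are hypothetical.
JOURNALCTL="${JOURNALCTL:-journalctl}"

# Print kernel messages between two timestamps.
# -k restricts output to kernel messages (and implies the current boot).
# $1 = start time, $2 = end time, e.g. "2023-03-26 12:20:00"
kernel_log_window() {
    $JOURNALCTL -k --no-pager --since "$1" --until "$2"
}
```

For instance, `kernel_log_window "2023-03-26 12:20:00" "2023-03-26 12:22:00" > kernel.log` would save the window to a file for attaching to the bug.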

Sergei (sergei0) wrote :

OK, I think I see what is happening:

1. I boot the laptop after a cold reboot; it crashes during Xorg startup and shows the following on the VT:
i915 0000:04:00.0: [drm] *ERROR* Writing dc state to 0x1 failed, now 0x0
i915 0000:04:00.0: [drm] *ERROR* DC state mismatch (0x1 -> 0x0)

2. After somewhere between 5 seconds and 1 minute it comes up and shows the login screen. I then log in and submit this first crash: https://errors.ubuntu.com/oops/ea15e7a0-cd06-11ed-9fcb-fa163e993415 (~300KB)

3. Next I run the steps in sequence (unbind, then bind, then unbind) and it crashes again.

4. I log back in and submit this larger crash log: https://errors.ubuntu.com/oops/a797a552-cd07-11ed-b9a4-fa163ef35206 (~6MB)
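
To check whether the DC state errors from step 1 show up in a given log, a simple filter along these lines could help. It is a sketch: the function name i915_drm_errors is hypothetical, and it only matches the "[drm] *ERROR*" marker seen in the messages above.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and keep only i915 DRM
# error lines. Fixed-string matching avoids escaping "[" and "*".
i915_drm_errors() {
    grep -F 'i915' | grep -F '[drm] *ERROR*'
}
```

Typical use would be `sudo dmesg | i915_drm_errors`; the filter exits non-zero when nothing matches.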

Now I can find the X crash "Backtrace" in the last crash:
 [ 98.987] (EE) Backtrace:
 [ 98.987] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x139) [0x5610ac7d2809]
 [ 98.988] (EE) 1: /lib/x86_64-linux-gnu/libc.so.6 (__sigaction+0x50) [0x7f36ec042520]
 [ 98.990] (EE) 2: ? (?+0x0) [0x0]
 [ 98.990] (EE)
 [ 98.990] (EE) Segmentation fault at address 0x0

I hope this is the one you are looking for. If not, I can try to find something else; just tell me which additional information you need and I will collect it. Maybe journalctl logs or something else?

Thank you

Sergei (sergei0) wrote :

Or maybe I just have to wait until Linux 6.4: https://www.phoronix.com/news/Intel-Linux-6.4-More-GT-Next

Daniel van Vugt (vanvugt) wrote :

Please try a kernel version that is supported on Ubuntu 22.04, like 5.15.0-69.76, 5.19.0-38.39~22.04.1 or 6.1.0-1008.8

Changed in linux (Ubuntu):
status: New → Incomplete
Sergei (sergei0) wrote :

Yes, I tried to reproduce the same on the 5.15 kernel (it is supported by intel-i915-dkms; 5.19 and 6.1 are not supported and fail during i915 module compilation).

The results are in general the same: https://errors.ubuntu.com/oops/f8d291da-cd9c-11ed-b9ad-fa163ef35206

It's most likely the i915 driver, but I'm not sure why Xorg fails in OsLookupColor when the device is disconnected...

Daniel van Vugt (vanvugt) wrote :

Ignore "OsLookupColor" -- all Xorg crashes without debug symbols are reported as being in that function.

Launchpad Janitor (janitor) wrote :

[Expired for xorg-server (Ubuntu) because there has been no activity for 60 days.]

Changed in xorg-server (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired