amdgpu-pro xorg crash

Bug #1820323 reported by Jill Manfield
This bug affects 1 person
Affects: xorg-server (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

This is on Precision 3620, Precision 5820, and Precision 7820. Systems shipped with the OEM image and a bundled AMD GPU fail to boot properly after an update/upgrade. I believe this is related to this issue:

https://www.amd.com/en/support/kb/faq/amdgpupro-ubuntu-advisory

Since our image contains the amdgpu-pro package, we are impacted. The Canonical image is not impacted.

Various systems (Precision 3420, 3620, 5820, and 7820) are having login issues with AMD GPUs after running system updates on the OEM Ubuntu image.
After running sudo apt-get update && sudo apt-get upgrade, the system exhibits one of the following behaviors:
• Boots to a black screen
• Loops at login
• Boots to a screen indicating the system has been forced to low graphics mode and hard locks.
I have a system in the lab that exhibits this issue and I don’t know how to get around it.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: xorg 1:7.7+13ubuntu3.1
ProcVersionSignature: Ubuntu 4.4.0-143.169-generic 4.4.170
Uname: Linux 4.4.0-143-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.18
Architecture: amd64
BootLog:
 Scanning for Btrfs filesystems
 UBUNTU: clean, 267691/28237824 files, 3796147/112924672 blocks
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
Date: Fri Mar 15 13:22:36 2019
DistUpgraded: Fresh install
DistributionChannelDescriptor:
 # This is a distribution channel descriptor
 # For more information see http://wiki.ubuntu.com/DistributionChannelDescriptor
 canonical-oem-somerville-xenial-amd64-20160624-2
DistroCodename: xenial
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
GraphicsCard:
 Subsystem: Dell Device [1028:06c7]
 Advanced Micro Devices, Inc. [AMD/ATI] Lexa XT [Radeon PRO WX 2100] [1002:6995] (prog-if 00 [VGA controller])
   Subsystem: Dell Device [1028:0b0c]
InstallationDate: Installed on 2019-03-14 (0 days ago)
InstallationMedia: Ubuntu 16.04 "Xenial" - Build amd64 LIVE Binary 20160624-10:47
MachineType: Dell Inc. Precision Tower 3420
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-143-generic root=UUID=ced18cfc-68aa-461d-a69d-d482f5880d16 ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg crash
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 11/07/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.11.2
dmi.board.name: 02K9CR
dmi.board.vendor: Dell Inc.
dmi.board.version: A02
dmi.chassis.type: 3
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.11.2:bd11/07/2018:svnDellInc.:pnPrecisionTower3420:pvr:rvnDellInc.:rn02K9CR:rvrA02:cvnDellInc.:ct3:cvr:
dmi.product.name: Precision Tower 3420
dmi.sys.vendor: Dell Inc.
version.compiz: compiz 1:0.9.12.3+16.04.20180221-0ubuntu1
version.ia32-libs: ia32-libs N/A
version.libdrm2: libdrm2 2.4.91-2~16.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 18.0.5-0ubuntu0~16.04.1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 18.0.5-0ubuntu0~16.04.1
version.xserver-xorg-core: xserver-xorg-core 2:1.18.4-0ubuntu0.8
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.10.1-1ubuntu2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:7.7.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20160325-1ubuntu1.2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.12-1build2
xserver.bootTime: Fri Mar 15 13:21:28 2019
xserver.configfile: default
xserver.devices:

xserver.logfile: /var/log/Xorg.0.log
xserver.outputs:

xserver.version: 2:1.18.4-0ubuntu0.8

Jill Manfield (jlmanfield) wrote :
Timo Aaltonen (tjaalton) wrote :

This bug report was filed with the stock 16.04(.1) stack though, with kernel 4.4 and xserver 1.18 but with an updated mesa. The failure here is that some symlinks to amdgpu-pro got overwritten (since they were owned by mesa, not amdgpu-pro), and those need to be fixed manually.

To do that I think this should be enough:
apt --reinstall install libgl1-amdgpu-mesa-dri

That should restore the symlinks. AMD has fixed this bug in newer releases, so that if the system mesa is upgraded, the symlinks restoration is triggered after the upgrade.
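
To confirm whether the links were in fact overwritten, a check along the following lines may help. This is a minimal Python sketch, not part of the amdgpu-pro tooling: the function name `broken_links` is hypothetical, and the caller has to supply the real library paths and the vendor install prefix, neither of which is stated in this report. The demo runs in a temporary directory so that no driver paths are assumed.

```python
import os
import tempfile

def broken_links(paths, expected_prefix):
    """Return the paths that are not symlinks resolving under expected_prefix."""
    prefix = os.path.realpath(expected_prefix)
    bad = []
    for path in paths:
        target = os.path.realpath(path)
        under_prefix = os.path.commonpath([prefix, target]) == prefix
        if not (os.path.islink(path) and under_prefix):
            bad.append(path)
    return bad

# Self-contained demo in a temporary directory; no real driver paths are assumed.
with tempfile.TemporaryDirectory() as tmp:
    vendor = os.path.join(tmp, "vendor")      # stands in for the vendor driver tree
    os.mkdir(vendor)
    real_lib = os.path.join(vendor, "driver.so")
    open(real_lib, "w").close()

    intact = os.path.join(tmp, "intact.so")
    os.symlink(real_lib, intact)              # link still points at the vendor file

    clobbered = os.path.join(tmp, "clobbered.so")
    open(clobbered, "w").close()              # link replaced by a regular file

    demo_result = broken_links([intact, clobbered], vendor)
    print([os.path.basename(p) for p in demo_result])
```

Any path the function returns is either no longer a symlink or points outside the given prefix, which is the condition the reinstall is meant to repair.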

Changed in xorg (Ubuntu):
status: New → Incomplete
Daniel van Vugt (vanvugt) wrote :

Also, to help us find crashes from other people that are the same as this one, please follow these instructions:

https://wiki.ubuntu.com/Bugs/Responses#Missing_a_crash_report_or_having_a_.crash_attachment

affects: xorg (Ubuntu) → xorg-server (Ubuntu)
Timo Aaltonen (tjaalton) wrote :

No need for crash debugging, since this is a known issue in the amdgpu-pro package. This should help:

https://askubuntu.com/questions/1104379/system-running-in-low-graphics-mode-on-dell-system-after-software-update/1104384#1104384

Timo Aaltonen (tjaalton) wrote :

From xserver.errors.txt:

"module ABI major version (10) doesn't match the server's version (9)
Failed to load module "glx" (module requirement mismatch, 0)
Screen 0 deleted because of no matching config section."

So I think you need to reinstall libgl1-amdgpu-mesa-glx as well. The askubuntu suggestion to remove amdgpu-pro is mainly for after an upgrade to 18.04. In this case 16.04 is still in use and amdgpu-pro is probably still needed, so reinstalling amdgpu-pro (or upgrading to the latest release available for 16.04) is likely the only supported way.
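
The ABI mismatch quoted from xserver.errors.txt can be picked out of a log mechanically. The following is a minimal Python sketch, assuming the log wording matches the excerpt quoted in this comment; the name `find_abi_mismatches` is hypothetical, and the sample text is that excerpt, not a full Xorg log.

```python
import re

# Pattern for the mismatch line quoted above; the wording is assumed from that excerpt.
# Both straight and curly apostrophes are accepted.
ABI_MISMATCH = re.compile(
    r"module ABI major version \((\d+)\) doesn['’]t match "
    r"the server['’]s version \((\d+)\)"
)

def find_abi_mismatches(log_text):
    """Return (module_abi, server_abi) pairs for every mismatch line in log_text."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in ABI_MISMATCH.finditer(log_text)]

sample = (
    "module ABI major version (10) doesn't match the server's version (9)\n"
    'Failed to load module "glx" (module requirement mismatch, 0)\n'
)

for module_abi, server_abi in find_abi_mismatches(sample):
    print(f"module built for ABI {module_abi}, server provides ABI {server_abi}")
```

A module ABI higher than the server's suggests the module on disk was built for a newer X server than the one running, which is consistent with the wrong glx module being loaded.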

summary: - xorg crash
+ amdgpu-pro xorg crash
tags: added: amdgpu-pro
Jill Manfield (jlmanfield) wrote :

The uninstall and reinstall of amdgpu-pro has not worked for me. I have also tried to purge amdgpu-core and amdgpu-dkms, which others told me got around this issue. I have not been successful thus far. Are there any other suggestions, or logs I can provide, to help me find a suitable workaround until this is resolved?

Jill Manfield (jlmanfield) wrote :

Any updates on this bug? Is there anything else I need to provide you?

Jill Manfield (jlmanfield) wrote :

https://www.amd.com/en/support/kb/faq/amdgpupro-ubuntu-advisory Is there a solution or workaround for this?

Launchpad Janitor (janitor) wrote :

[Expired for xorg-server (Ubuntu) because there has been no activity for 60 days.]

Changed in xorg-server (Ubuntu):
status: Incomplete → Expired
Changed in xorg-server (Ubuntu):
status: Expired → New
Timo Aaltonen (tjaalton) wrote :

The only solution is to install a newer amdgpu-pro that supports 16.04.3; there's nothing we can do on the Ubuntu side.

Changed in xorg-server (Ubuntu):
status: New → Invalid
Chih-Hsyuan Ho (chih) wrote :

- Original shipping condition:
* Dell Farallon + WX2100
* linux-image-4.4.0-36-generic + amdgpu-17.10-429170

- Test procedure:
1. sudo apt install linux-image-4.4.0-xxx-generic linux-headers-4.4.0-xxx-generic
2. reboot

Expected behavior
The system boots into the desktop successfully on the updated kernel

Actual behavior
The system loops at the login screen

- Findings:
1. The amdgpu DKMS build breaks between kernels 4.4.0-142 (OK) and 4.4.0-143 (NG)
2. However, starting with kernel 4.4.0-116, the amdgpu module fails to load with the following error message:
amdkcl: version magic '4.4.0-xxx-generic SMP mod_unload modversions' should be '4.4.0-xxx-generic SMP mod_unload modversions retpoline'

- In summary, the issue has two causes: an amdgpu module loading error and an amdgpu DKMS build error. The former appears in earlier kernel revisions than the latter.
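
The "version magic" message in finding 2 can be compared programmatically to see which flag differs. In this kernel message the first quoted string is the module's vermagic and the second is what the running kernel expects. The following is a minimal Python sketch; the helper names `parse_vermagic_error` and `missing_flags` are hypothetical, and the sample message is the one quoted above, with its `xxx` placeholders left as-is.

```python
import re

def parse_vermagic_error(message):
    """Split a "version magic 'A' should be 'B'" message into (module, kernel) strings."""
    m = re.search(r"version magic '([^']*)' should be '([^']*)'", message)
    if m is None:
        return None
    return m.group(1), m.group(2)

def missing_flags(module_magic, kernel_magic):
    """Flags the running kernel expects that the module was not built with."""
    module_flags = set(module_magic.split())
    return [flag for flag in kernel_magic.split() if flag not in module_flags]

msg = ("amdkcl: version magic '4.4.0-xxx-generic SMP mod_unload modversions' "
       "should be '4.4.0-xxx-generic SMP mod_unload modversions retpoline'")

module_magic, kernel_magic = parse_vermagic_error(msg)
print(missing_flags(module_magic, kernel_magic))
```

For the message above the only difference is `retpoline`, which suggests the module was built without retpoline support while the kernel expects it.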

Chih-Hsyuan Ho (chih) wrote :