mesa gpu lockup
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Mesa | Fix Released | Medium | | |
mesa (Ubuntu) | Fix Released | Medium | Unassigned | |
Xenial | Fix Released | Medium | Unassigned | |
Yakkety | Fix Released | Medium | Unassigned | |
Bug Description
package: mesa
version: 11.2.0-1ubuntu2.2
release: 16.04.1 / "Xenial"
references:
* https:/
* https:/
symptoms:
1. kernel reports "GPU lockup"
2. kernel reports "GPU softreset"
3. immediately frozen display
4. eventual "INFO: task (kworker|
5. computer hanging on system reboot requiring a forced power off
Symptoms occur when playing Team Fortress 2.
Attachments:
1. mesa source package patch
2. lspci output for video card (ie Radeon HD 7750)
3. kernel log
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #17 |
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #18 |
Created attachment 120926
Strace of Xorg up to X freezing
FD 20 is the drm device node, and it freezes on ioctl 0xc020645d.
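For anyone wanting to identify the ioctl from the number alone, here is a minimal sketch that splits 0xc020645d into its fields. It assumes the generic Linux `_IOC` bit layout (as used on x86) and the DRM convention that driver-private commands start at 0x40; the helper name `decode_ioctl` is made up for illustration.

```python
# Decode a Linux ioctl request number into its _IOC fields.
# Assumed bit layout (generic, as on x86):
#   nr: bits 0-7, type: bits 8-15, size: bits 16-29, dir: bits 30-31.
DRM_IOCTL_BASE = ord("d")   # 0x64, the ioctl "type" used by DRM
DRM_COMMAND_BASE = 0x40     # driver-private DRM commands start here

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(request):
    """Split an ioctl request number into direction, size, type and nr."""
    return {
        "dir": DIRECTIONS[(request >> 30) & 0x3],
        "size": (request >> 16) & 0x3FFF,
        "type": (request >> 8) & 0xFF,
        "nr": request & 0xFF,
    }


fields = decode_ioctl(0xC020645D)
print(fields)
if fields["type"] == DRM_IOCTL_BASE and fields["nr"] >= DRM_COMMAND_BASE:
    # Offset of the command within the driver's private range.
    print("driver-private DRM command 0x%02x" % (fields["nr"] - DRM_COMMAND_BASE))
```

Under those assumptions the request is a read/write DRM ioctl with a 32-byte argument, and the driver-private offset can be looked up in the radeon UAPI header to see which call X was stuck in.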
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #19 |
Created attachment 120927
Radeon blocked locks
Since X seemed blocked on an ioctl, I managed to get a list of all the blocked locks. Most of the held locks belong to GUI-related programs that would be doing GL things, and they are all blocked on a lock, including one that is currently trying to reset my GPU.
I'm guessing there is a lock that is being grabbed twice, once when userspace makes an ioctl, and again during the reset. I'll keep digging.
Also, I think this may be a duplicate of #90217, as both involve Source games. I'll leave this open for now, in case TF2 has a different trigger.
In freedesktop.org Bugzilla #93649, RussianNeuroMancer (russianneuromancer) wrote : | #20 |
There are other logs: https:/
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #21 |
Created attachment 121242
This helps avoid a complete crash when a lockup occurs.
Note this doesn't solve this bug, it just helps manage it.
In freedesktop.org Bugzilla #93649, Paul Jago (pc-jago1337) wrote : | #22 |
Can confirm, I have either the same or a similar problem on my R9 390 (using radeon, with DPM disabled). It doesn't just crash X though, it completely locks up and I have to reboot to even use TTY. Happens after 10-20 mins of TF2.
Running Arch Linux with everything up to date but no AUR packages, will post specifics later.
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #23 |
Created attachment 121293
Second patch to fix system lockup after gpu reset
This has already been accepted from the mailing list; I'm including it here for completeness.
If anyone is experiencing this issue, can you please try with all of these patches applied? For now, X should die and restart without acceleration, but getting a dmesg out or restarting should be fine.
In freedesktop.org Bugzilla #93649, Paul Jago (pc-jago1337) wrote : | #24 |
CPU: FX 8350
GPU: R9 390
MB: Asrock 970 Extreme4
Software:
Kernel: 4.3.3-3-ARCH x86_64
Mesa: 11.1.1
DRM: 2.43.0
LLVM: 3.7.0
X: 1.18.0
As mentioned above, I get the crash with TF2, but *NOT* CS:GO.
In freedesktop.org Bugzilla #93649, Paul Jago (pc-jago1337) wrote : | #25 |
Also, this could be a duplicate of bug #92912 - random lockups in TF2, all with radeon.
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #26 |
(In reply to pc.jago1337 from comment #8)
> Also, this could be a duplicate of bug #92912 - random lockups in TF2, all
> with radeon.
I was asked to file this bug separately. Also, that bug covers R600, a different GPU than GCN.
In freedesktop.org Bugzilla #93649, Roscofdporg (roscofdporg) wrote : | #27 |
Same problem here on Fedora 23
GPU: HD 7970
CPU: Intel Core i7 950
Mesa 11.1.0
DRM 2.43.0
LLVM 3.7.0
kernel: 4.3.4
The logs are filled with "ring stalled" and GPU lockup messages. I can send more logs if needed.
radeon 0000:02:00.0: ring 3 stalled for more than 10249msec
radeon 0000:02:00.0: GPU lockup (current fence id 0x000000000001e5f1 last fence id 0x000000000001e5f2 on ring 3)
I've tried a different firmware (http://
Does it make sense to try rolling back to an older kernel?
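To compare lockups across reports, the "GPU lockup" line can be parsed mechanically. This is a rough sketch that only assumes the message format quoted above; the function name is hypothetical, and reading the difference between the two fence ids as the number of fences still outstanding on the ring is my interpretation, not something confirmed in this thread.

```python
import re

# Pattern for the radeon "GPU lockup" kernel message as quoted in this report.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+) on ring (\d+)\)"
)


def parse_lockup(line):
    """Return (ring, current_fence, last_fence), or None if the line does not match."""
    match = LOCKUP_RE.search(line)
    if match is None:
        return None
    current, last, ring = match.groups()
    return int(ring), int(current, 16), int(last, 16)


sample = (
    "radeon 0000:02:00.0: GPU lockup (current fence id 0x000000000001e5f1 "
    "last fence id 0x000000000001e5f2 on ring 3)"
)
ring, current, last = parse_lockup(sample)
# Assumed meaning: fences emitted on this ring that have not completed yet.
print("ring %d: %d fence(s) outstanding" % (ring, last - current))
```

Feeding it the output of `dmesg` or `journalctl -k` line by line would give a quick list of which ring stalled in each incident.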
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #28 |
Created attachment 121578
New avoid lockup patch
Latest version as posted to dri-devel. With these two patches, your system should no longer lock up forever. It will freeze the game for a moment, and X may die for other reasons.
Now the underlying tf2 issue needs investigation.
In freedesktop.org Bugzilla #93649, Luca Osvaldo Mastromatteo (lukycrociato) wrote : | #29 |
I can say that it also affects me. I'm using the AMDGPU drivers with powerplay enabled, on a custom Linux 4.5 kernel.
AMD R9 380 video card.
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #30 |
*** Bug 95308 has been marked as a duplicate of this bug. ***
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #31 |
Any chance VALVe introduced this? They won't admit it. https:/
The patches attached here are present in Linux 4.6. I tested linux-git-4.7-rc7 with mesa-git-12.1 compiled against llvm-svn-3.9, and TF2 still crashes.
Setting every graphical option to Low doesn't help.
In freedesktop.org Bugzilla #93649, Nicolai Hähnle (nha) wrote : | #32 |
This is certainly a bug in our driver (unlike what was written on the Github tracker, a game *can* cause a hang e.g. by writing an infinite loop in a shader, but that seems exceedingly unlikely in the case of TF2). The problem with this particular bug is that it seems non-deterministic (i.e. not reliably reproducible), and that makes it hard to debug.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #33 |
So there's a chance it won't be fixed at all?
I was thinking about bisecting from version 3.16 (where I know it worked for me, on Debian Jessie) until ~4.1, but I don't have that kind of time right now.
In freedesktop.org Bugzilla #93649, Nicolai Hähnle (nha) wrote : | #34 |
Actually, if you could find a clear bisection result, that would be tremendously helpful and would probably lead to a fix.
However, with this kind of bug you need to be extremely sure about what you're doing when bisecting. For example, if you know that the hang typically occurs after 10 minutes, then you should play for at least one hour (perhaps even longer) with each kernel. Otherwise, you might have just gotten lucky, and the bisect result would be worse than useless.
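To put rough numbers on that advice, here is a back-of-the-envelope sketch. It assumes the time until a hang on a bad kernel is exponentially distributed with a mean of 10 minutes, and that a bisect visits 6 bad kernels; both figures and the function names are illustrative assumptions, not measurements from this bug.

```python
import math


def p_false_good(test_minutes, mean_minutes_to_hang):
    """Probability that a bad kernel survives one test run without hanging."""
    return math.exp(-test_minutes / mean_minutes_to_hang)


def p_bisect_misled(test_minutes, mean_minutes_to_hang, bad_steps):
    """Probability that at least one of `bad_steps` bad kernels is marked good."""
    p_caught = 1.0 - p_false_good(test_minutes, mean_minutes_to_hang)
    return 1.0 - p_caught ** bad_steps


# Compare short, medium and long test sessions per bisect step.
for minutes in (10, 30, 60):
    print(minutes, round(p_bisect_misled(minutes, 10.0, 6), 4))
```

Under this model, testing each kernel for only 10 minutes misleads the bisect more than 90% of the time, while one hour per kernel brings the risk below 2%.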
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #35 |
Yes, I would definitely test it for a long period, something like 16 hours hehehe.
However, I can't do any bisecting right now; I'm tremendously busy at the moment. Too bad there aren't many Linux players with this problem, otherwise someone would have figured this out already.
Cheers.
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #36 |
happens with stellaris as well.
In freedesktop.org Bugzilla #93649, Marek Olšák (maraeo) wrote : | #37 |
Does this fix it?
https:/
In other words, does mesa/master work?
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #38 |
I can confirm latest git head (50b49d242d702e
Also seems to have a regression regarding lighting, I'll see about bisecting that in a separate report.
LLVM: 3.8.0
DRM: 2.43.0
Linux: 4.6.3-gentoo
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #39 |
I'll test this weekend with stellaris and let you know.
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #40 |
sad to say it did not fix the issue for me. it ran longer than usual though prior to the crash. I suspect you nixed one issue but multiple are going on.
I'm happy to run any debugging/patches you wish to try.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #41 |
Didn't fix for me either, on Arch Linux.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #42 |
Marek, since you work for AMD, I wonder if you could get a few hints for the fix on Catalyst's sources?
In freedesktop.org Bugzilla #93649, Marek Olšák (maraeo) wrote : | #43 |
(In reply to AmarildoJr from comment #25)
> Marek, since you work for AMD, I wonder if you could get a few hints for the
> fix on Catalyst's sources?
It's not so simple. This is a bug somewhere in the Mesa driver such that looking at other drivers won't likely help.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #44 |
(In reply to Marek Olšák from comment #26)
> (In reply to AmarildoJr from comment #25)
> > Marek, since you work for AMD, I wonder if you could get a few hints for the
> > fix on Catalyst's sources?
>
> It's not so simple. This is a bug somewhere in the Mesa driver such that
> looking at other drivers won't likely help.
This is a very weird issue. I think it may not be in Mesa, and here's why:
* On Debian Jessie with kernel 3.16 and Mesa 10.3, the problem doesn't happen;
* On the same Debian, but with mesa backported, the problem also doesn't happen;
* On the same Debian with Mesa backported and the Kernel backported, the problem still doesn't happen;
* On Arch Linux with Mesa downgraded to 10.3, the problem happens;
* On the same Arch Linux with Mesa and Kernel downgraded (Kernel to version 3.16 and even 3.10), the problem still happens;
* I'm not 100% sure I downgraded the Firmware on Arch, but I'll try today since I'm testing a few drivers in Linux;
* On vanilla Arch with Catalyst/FGLRX, the problem doesn't happen;
So I do think this issue is much bigger than everybody thinks and only happens with a certain combination of Mesa, Kernel, Firmware, and possibly libdrm, llvm, and other pieces of software as well.
What I really think is that VALVe should investigate this since this problem started happening after they introduced mandatory Texture Streaming.
In freedesktop.org Bugzilla #93649, Vedran-f (vedran-f) wrote : | #45 |
(In reply to AmarildoJr from comment #27)
> (In reply to Marek Olšák from comment #26)
> > (In reply to AmarildoJr from comment #25)
> > > Marek, since you work for AMD, I wonder if you could get a few hints for the
> > > fix on Catalyst's sources?
> >
> > It's not so simple. This is a bug somewhere in the Mesa driver such that
> > looking at other drivers won't likely help.
>
> This is a very weird issue. I think it may not be in Mesa, and here's why:
>
> * On Debian Jessie with kernel 3.16 and Mesa 10.3, the problem doesn't
> happen;
> * On the same Debian, but with mesa backported, the problem also doesn't
> happen;
> * On the same Debian with Mesa backported and the Kernel backported, the
> problem still doesn't happen;
> * On Arch Linux with Mesa downgraded to 10.3, the problem happens;
> * On the same Arch Linux with Mesa and Kernel downgraded (Kernel to version
> 3.16 and even 3.10), the problem still happens;
> * I'm not 100% sure I downgraded the Firmware on Arch, but I'll try today
> since I'm testing a few drivers in Linux;
> * On vanilla Arch with Catalyst/FGLRX, the problem doesn't happen;
>
> So I do think this issue is much bigger than everybody thinks and only
> happens with a certain combination of Mesa, Kernel, Firmware, and possibly
> libdrm, llvm, and other pieces of software as well.
>
> What I really think is that VALVe should investigate this since this problem
> started happening after they introduced mandatory Texture Streaming.
Is the elephant in the room in this case the LLVM version difference between the two setups?
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #46 |
I just tested the oldest firmware available in the Arch Linux Archive, namely linux-firmware 20130725-1, and the crashes don't happen. This is with current Arch, not a single package is old and all packages are up-to-date according to the repos.
I'm hitting 10 to 30 FPS in-game, but at least the crashes don't happen which IMO is a very good sign of where the problem might be.
I'll report the firmware problem to AMD.
In the mean time, does anyone know how I can try running the firmware from Catalyst?
@Marek, where is the best place to report this?
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #47 |
(In reply to Vedran Miletić from comment #28)
> Is the elephant in the room in this case the LLVM version difference between
> the two setups?
According to a Gentoo user who compiled llvm 3.5 and an older version of mesa against it, the problem still occurs.
In freedesktop.org Bugzilla #93649, Marek Olšák (maraeo) wrote : | #48 |
(In reply to AmarildoJr from comment #29)
> I just tested the oldest firmware available in the Arch Linux Archive,
> namely linux-firmware 20130725-1, and the crashes don't happen. This is with
> current Arch, not a single package is old and all packages are up-to-date
> according to the repos.
>
> I'm hitting 10 to 30 FPS in-game, but at least the crashes don't happen
> which IMO is a very good sign of where the problem might be.
>
> I'll report the firmware problem to AMD.
>
> In the mean time, does anyone know how I can try running the firmware from
> Catalyst?
>
> @Marek, where is the best place to report this?
So are we certain the hangs are caused by firmware? Bisecting the firmware would help a lot.
What's your GPU?
In freedesktop.org Bugzilla #93649, Roscofdporg (roscofdporg) wrote : | #49 |
I tested today 3 different firmwares on manjaro (HD7970)
linux-firmware-
This allowed me to play TF2 without bugs for ~30 min. Then I had the bug (screen freeze, sound loop), but the system recovered fine after 20 sec with no loss of performance. Both before and after the bug, I still had a problem with the mouse pointer not being visible at all times.
linux-firmware-
This allowed me to play for a good hour, then: bug + recovery after 20 sec. At the fifth bug the screen simply hung, and TF2 and Steam crashed (I had to ctrl+alt+f2). This one didn't have the mouse bug. This is the most stable TF2 experience I can get.
linux-firmware-
This one crashed after 2 seconds loading the first map.
The first two firmwares also seem to have fixed the same bug which was present in "Victor Vran" (same symptoms, screen freeze + sound loop).
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #50 |
not certain but assuming I ran the test correctly, I experienced a crash using the oldest linux firmware I had linux-firmware-
commands run to downgrade to linux-firmware-
sudo pacman -U /var/cache/
sudo pacman -S linux
after downgrade I had the following error on boot, so I'm assuming it worked:
Sep 04 09:53:14 jambli kernel: radeon 0000:01:00.0: Direct firmware load for radeon/
Sep 04 09:53:14 jambli kernel: radeon 0000:01:00.0: radeon_vce: Can't load firmware "radeon/
Sep 04 09:53:14 jambli kernel: radeon 0000:01:00.0: failed VCE (-2) init.
other info:
Name : llvm-libs
Version : 3.8.1-1
Name : linux
Version : 4.7.2-1
Name : mesa-git
Version : 84594.98f734e-1
Extended renderer info (GLX_MESA_
Vendor: X.Org (0x1002)
Device: AMD OLAND (DRM 2.45.0 / 4.7.2-1-ARCH, LLVM 4.0.0) (0x6610)
Version: 12.1.0
Accelerated: yes
Video memory: 2048MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.3
Max compat profile version: 3.0
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.1
I forget the exact card off the top of my head but here is the output of lspci, if you need more precise card information let me know how to get it from the cli =):
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland XT [Radeon HD 8670 / R7 250/350]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #51 |
I should note I was testing against stellaris.
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #52 |
The game froze again after ~20 minutes using the 20130725 firmware. So if downgrading to 20130725 fixes TF2, my issue likely isn't the same one as the TF2 issue.
game: stellaris
commands run to downgrade to linux-firmware-
sudo pacman -U /var/cache/
sudo pacman -S linux
other info:
Name : llvm-libs
Version : 3.8.1-1
Name : linux
Version : 4.7.2-1
Name : mesa-git
Version : 84594.98f734e-1
Name : linux-firmware
Version : 20130725-1
lspci:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland XT [Radeon HD 8670 / R7 250/350]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
boot logs:
Sep 04 11:12:28 jambli kernel: [drm] initializing kernel modesetting (OLAND 0x1002:0x6610 0x174B:0xE269 0x00).
Sep 04 11:12:28 jambli kernel: [drm] register mmio base: 0xFDD80000
Sep 04 11:12:28 jambli kernel: [drm] register mmio size: 262144
Sep 04 11:12:28 jambli kernel: ATOM BIOS: C66201
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: VRAM: 2048M 0x0000000000000000 - 0x000000007FFFFFFF (2048M used)
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: GTT: 2048M 0x0000000080000000 - 0x00000000FFFFFFFF
Sep 04 11:12:28 jambli kernel: [drm] Detected VRAM RAM=2048M, BAR=256M
Sep 04 11:12:28 jambli kernel: [drm] RAM width 128bits DDR
Sep 04 11:12:28 jambli kernel: [TTM] Zone kernel: Available graphics memory: 8209378 kiB
Sep 04 11:12:28 jambli kernel: [TTM] Zone dma32: Available graphics memory: 2097152 kiB
Sep 04 11:12:28 jambli kernel: [TTM] Initializing pool allocator
Sep 04 11:12:28 jambli kernel: [TTM] Initializing DMA pool allocator
Sep 04 11:12:28 jambli kernel: [drm] radeon: 2048M of VRAM memory ready
Sep 04 11:12:28 jambli kernel: [drm] radeon: 2048M of GTT memory ready.
Sep 04 11:12:28 jambli kernel: [drm] Loading oland Microcode
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: Direct firmware load for radeon/
Sep 04 11:12:28 jambli systemd[1]: Created slice system-
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: Direct firmware load for radeon/oland_me.bin failed with error -2
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: Direct firmware load for radeon/oland_ce.bin failed with error -2
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: Direct firmware load for radeon/
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: Direct firmware load for radeon/oland_mc.bin failed with error -2
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: Direct firmware load for radeon/
Sep 04 11:12:28 jambli kernel: [drm] radeon/
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: Direct firmware load for radeon/
Sep 04 11:12:28 jambli kernel: radeon 0000:01:00.0: Direct firmware load for radeon/
Sep 04 11:12:28 jambli kernel: smc: error loading firmware "radeon/
Sep 04 11:12:28 jambli kernel: [drm] Internal...
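When checking whether a firmware downgrade actually took effect, it can help to list exactly which firmware files the kernel failed to load. The sketch below only assumes the "Direct firmware load ... failed with error N" message format seen in the boot log above; the function name and the shortened sample lines are for illustration. Error -2 is -ENOENT, meaning the file was not found.

```python
import re

# Matches the kernel's "Direct firmware load ... failed" message.
FW_FAIL_RE = re.compile(
    r"Direct firmware load for (\S+) failed with error (-?\d+)"
)


def failed_firmware(log_lines):
    """Return a list of (firmware_path, error_code) for every failed load."""
    failures = []
    for line in log_lines:
        match = FW_FAIL_RE.search(line)
        if match:
            failures.append((match.group(1), int(match.group(2))))
    return failures


# Shortened sample lines based on the boot log in this comment.
sample_log = [
    "kernel: radeon 0000:01:00.0: Direct firmware load for radeon/oland_me.bin failed with error -2",
    "kernel: radeon 0000:01:00.0: Direct firmware load for radeon/oland_ce.bin failed with error -2",
    "kernel: [drm] radeon: 2048M of VRAM memory ready",
]
for path, err in failed_firmware(sample_log):
    print(path, err)
```

A failed load of one file does not by itself show which firmware ended up running; as far as I know the driver may fall back to differently named files, so the rest of the log is still worth reading.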
In freedesktop.org Bugzilla #93649, Marek Olšák (maraeo) wrote : | #53 |
If you're testing Mesa git, would you please set GALLIUM_
Though I've got a hunch that we're just running around in circles.
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #54 |
Created attachment 126454
stellaris run via steam: GALLIUM_
here are the dumps generated.
It seems to be hit or miss whether anything was actually written into the files.
The computer completely locks up when it encounters the freeze in stellaris.
stellaris was even more unstable with GALLIUM_DDEBUG, often failing to even start up.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #55 |
Does anyone have a little bit of free time to extract the files from "lib32-
In freedesktop.org Bugzilla #93649, Hofmann-zachary (hofmann-zachary) wrote : | #56 |
I'm also having this problem with Radeon R7 250 (radeonsi), Mesa 12.0.2, LLVM 3.8.1 and kernel version 4.6.0.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #57 |
If disabling DPM fixed the issue, shouldn't developers study its code a little bit? I'm 99.99% positive the issue is in there somewhere, even for AMDGPU (since RadeonSI and AMDGPU drivers share a lot of code).
In freedesktop.org Bugzilla #93649, Hofmann-zachary (hofmann-zachary) wrote : | #58 |
(In reply to Amarildo from comment #40)
> If disabling DPM fixed the issue, shouldn't developers study it's code a
> little bit? I'm 99.99% positive the issue is in there somewhere, even for
> AMDGPU (since RadeonSI and AMDGPU drivers share a lot of code).
Another user previously stated in the thread that they were experiencing the issues and had DPM disabled.
@Marek Olšák
Please let me know if there's anything I can do to help hunt this bug down.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #59 |
(In reply to hofmann.zachary from comment #41)
> (In reply to Amarildo from comment #40)
> > If disabling DPM fixed the issue, shouldn't developers study it's code a
> > little bit? I'm 99.99% positive the issue is in there somewhere, even for
> > AMDGPU (since RadeonSI and AMDGPU drivers share a lot of code).
>
> Another user previously stated in the thread that they were experiencing the
> issues and had DPM disabled.
>
> @Marek Olšák
> Please let me know if there's anything I can do to help hunt this bug down.
But that's one user's word against at least 5. Do we even know if the user actually disabled DPM or has the capacity to do so? Because I'm sure me and others (like Gentoo users) did in fact disable DPM and the hang didn't happen. So I don't think our word is less valid just because *one* user claimed he/she disabled DPM and the hang still happened.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #60 |
Just tried Mesa-Git (13.1) with the AMDGPU driver on R9 270X. The crash happens here as well.
However, looking at journalctl I can see new errors from the AMDGPU driver, and a bit of research suggests it could be some TF2 texturing problem.
The error: GPU fault detected: 147 0x000ac802
Similar bugs have been resolved already:
https:/
https:/
LLVM seems to be related too.
In freedesktop.org Bugzilla #93649, Roscofdporg (roscofdporg) wrote : | #61 |
I don't know if it can be of any help, but I've been playing "7 days to die" during the last weeks, regularly for the last days, and I didn't encounter any kind of bug.
Until yesterday evening when, to my great surprise, I had the same bug (freeze, sound loop), which totally crashed my machine once and only froze it (with a recovery after a few seconds) twice.
I checked that no update occurred on the game files, on the Steam runtime, or on my OS between the days when it worked flawlessly and yesterday, when it crashed 3 times in 15 minutes.
So if it's not only related to files, could it be related to the hardware? Could it be a faulty card (HD7970), or maybe a mix of faulty hardware and some software instruction?
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #62 |
Faulty hardware doesn't make any sense, because:
- It only happens on Linux;
- It only happens with specific combinations of Mesa/LLVM/
- It doesn't happen with the proprietary drivers
In freedesktop.org Bugzilla #93649, Hofmann-zachary (hofmann-zachary) wrote : | #63 |
(In reply to Amarildo from comment #45)
> Faulty hardware doesn't make any sense, because:
>
> - It only happens on Linux;
> - It only happens with specific combinations of Mesa/LLVM/
> - It doesn't happen with the proprietary drivers
It's probably not the exact same crash, but FWIW I also get crashes with the proprietary driver and TF2 when I tested it last. I just don't want people to get their hopes up only to have them let down.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #64 |
In all honesty, this is one of the most interesting bugs I know. Among all the people who have it, there are variations in what causes it in the first place.
What works for me (Debian Jessie with Mesa/libc6 from Backports, for example) might still cause the crash for some people.
What I do know is that it's not caused by faulty hardware. It could be for some, but I seriously doubt it's the cause for 99.99% of people experiencing the issue.
In freedesktop.org Bugzilla #93649, Marek Olšák (maraeo) wrote : | #65 |
Does this fix the hangs?
https:/
It changes the HTILE (HyperZ) allocation function to r600_aligned_
In freedesktop.org Bugzilla #93649, Marek Olšák (maraeo) wrote : | #66 |
(In reply to Marek Olšák from comment #48)
> Does this fix the hangs?
> https:/
> ?id=d4d9ec55c58
>
> It changes the HTILE (HyperZ) allocation function to
> r600_aligned_
> (Tahiti/
> happens when TTM decides to move HTILE to a different location with an
> unaligned physical address (which is pretty random). The hardware tries to
> access the unaligned address and boom.
Actually, I think that commit only affects Hawaii and Fiji. Other GPUs might be unaffected, which means the Tahiti hangs are due to a different bug.
In freedesktop.org Bugzilla #93649, Nhdls-matthew-8d0ze (nhdls-matthew-8d0ze) wrote : | #67 |
(In reply to Marek Olšák from comment #49)
> (In reply to Marek Olšák from comment #48)
> > Does this fix the hangs?
> > https:/
> > ?id=d4d9ec55c58
> >
> > It changes the HTILE (HyperZ) allocation function to
> > r600_aligned_
> > (Tahiti/
> > happens when TTM decides to move HTILE to a different location with an
> > unaligned physical address (which is pretty random). The hardware tries to
> > access the unaligned address and boom.
>
> Actually, I think that commit only affects Hawaii and Fiji. Other GPUs might
> be unaffected, which means the Tahiti hangs are due to a different bug.
I've previously tried disabling hyperz on Tahiti with no luck in side stepping this bug, so I don't think this is the issue.
Could there be other buffers that need similar treatment that are being ignored? Is there an easy way to test this locally?
In freedesktop.org Bugzilla #93649, Marek Olšák (maraeo) wrote : | #68 |
You can try this:
diff --git a/src/gallium/
index a15d559..ab95bae 100644
--- a/src/gallium/
+++ b/src/gallium/
@@ -939,7 +939,7 @@ radeon_
struct radeon_drm_winsys *ws = radeon_
struct radeon_bo *bo;
unsigned usage = 0, pb_cache_bucket;
-
+alignment *= 2;
/* Only 32-bit sizes are supported. */
if (size > UINT_MAX)
return NULL;
It will only affect radeon, not amdgpu.
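The reasoning behind doubling the alignment can be checked with a little arithmetic: for power-of-two alignments, any address aligned to twice the requirement is also aligned to the requirement itself, so the experiment can only over-align buffers, never under-align them. A minimal sketch, where the `REQUIRED` value and the helper names are hypothetical and not taken from the driver:

```python
def is_aligned(address, alignment):
    """True if `address` is a multiple of `alignment` (a power of two)."""
    return address & (alignment - 1) == 0


def align_up(address, alignment):
    """Round `address` up to the next multiple of `alignment` (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


REQUIRED = 0x8000  # hypothetical hardware alignment requirement, illustration only

# Place a buffer using the doubled alignment, then check both requirements.
addr = align_up(0x12345, 2 * REQUIRED)
print(hex(addr), is_aligned(addr, REQUIRED), is_aligned(addr, 2 * REQUIRED))
```

The converse does not hold: an address that meets the original requirement may miss the doubled one. That is why, if a buffer really needed the larger alignment and only got the smaller, the failure would look random from run to run.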
In freedesktop.org Bugzilla #93649, Hofmann-zachary (hofmann-zachary) wrote : | #69 |
Unless the changed code works independently of the nohyperz option I don't think it will help, since disabling hyperz on verde doesn't help either.
In freedesktop.org Bugzilla #93649, dungeon (smoki00790) wrote : | #70 |
It might be possible that the game itself fixes something, as I see there was a game update 3 days ago with the following mentioned in the changelog:
"Improved several aspects of texture handling for OS X and Linux clients
This should reduce the rate of "Out of memory" errors for players on high texture settings, especially on level change
Players still encountering this error can reduce texture quality to medium or lower to greatly improve stability pending further improvements"
http://
Just wild guessing that this might change something, since game started to be unstable on radeonsi when streaming textures and reduction of mem was introduced last year.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #71 |
I remember disabling stream textures and still having the issue, as well as setting all graphic settings to minimal.
Can anyone confirm the status of this bug on Pitcairn + Mesa-git + amdgpu kernel driver?
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #72 |
Seems that hang handling wasn't implemented at all for some GPUs: https:/
I haven't yet tried playing TF2 with amd-staging-4.7 (though I have been using it for a few days). I'll try it this morning.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #73 |
Didn't work, hang is still there. I couldn't even go to tty2 this time.
amd-staging-4.7 compiled this morning
mesa-git
llvm-git
In freedesktop.org Bugzilla #93649, Hofmann-zachary (hofmann-zachary) wrote : | #74 |
As smoki mentioned, many of the troubles started after Valve's texture streaming changes to TF2. They'd certainly know what changed in their code, but for someone like me they're impossible to get a hold of.
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #75 |
Created attachment 127704
package update history that led to a change in behaviour
Last night the freezes I've been having changed their behaviour. They used to just cause the system to completely freeze up. Now my system does an immediate shutdown.
This is interesting because I had just updated linux and mesa-git, so I potentially have a commit range in mesa/llvm that contains code related to the problem. I'm going to roll back my kernel/headers tonight and reboot to rule that out. If that doesn't cause the hang to reappear, I'll roll back mesa tomorrow, and then llvm.
In the meantime I've attached the package update history for the last few days in case that helps any of the developers.
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #76 |
Sigh, turns out something else must have caused the shutdowns; the game is back to just freezing the system today. =/
In freedesktop.org Bugzilla #93649, Roscofdporg (roscofdporg) wrote : | #77 |
Some people are reporting that they can reproduce the bug on windows 7.
https:/
Are we absolutely sure that it is not a hardware problem?
In freedesktop.org Bugzilla #93649, Hofmann-zachary (hofmann-zachary) wrote : | #78 |
I haven't seen anything to rule out it being a hardware problem, but Valve's overwhelming silence on the matter isn't exactly helpful.
In freedesktop.org Bugzilla #93649, Pandiculationfinch (pandiculationfinch) wrote : | #79 |
I finally found the root cause for my problems.
Turns out my CPU was overheating. But I only stressed it enough when playing games and nothing showed up in the logs about a shutdown due to heat. Once i resolved the overheating all my games ran smoothly with no crashes. apologies for the noise.
Wish I had found it sooner.
In freedesktop.org Bugzilla #93649, The-analogkid (the-analogkid) wrote : | #80 |
I am also seeing my system completely crash after running Team Fortress 2 for typically 5-20 minutes. In the last three occurrences, I've seen the following:
1. Freeze and system reboot within 10 seconds. I did not see anything in the logs.
2. Successful playing for ~30 minutes without issue.
3. Freeze and sound loop. The screen resets and sound loop changes every 10-20 seconds, which I believe is when the system is trying to reset the GPU. However, it never succeeds, and the system becomes completely non-responsive. The keyboard does not seem to accept input (num lock is frozen, can't switch to console). The only thing I can do is a hard restart. This scenario happens almost every time.
Output from journalctl looks like this:
Nov 24 21:26:42 fedora kernel: radeon 0000:01:00.0: ring 3 stalled for more than 10181msec
Nov 24 21:26:42 fedora kernel: radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000075bec last fence id 0x0000000000075bf7 on ring 3)
Backtrace starts like this:
Nov 24 21:26:42 fedora /usr/libexec/
Nov 24 21:26:42 fedora /usr/libexec/
Nov 24 21:26:42 fedora /usr/libexec/
Nov 24 21:26:42 fedora /usr/libexec/
4ec0927739]
Nov 24 21:26:43 fedora /usr/libexec/
ons_virtio_
...
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
Nov 24 21:26:43 fedora /usr/libexec/
I am running Fedora 24 with the latest updates:
Hardware:
CPU: AMD Athlon II x3 450
GPU: Sapphire / AMD Radeon R7 350 w/ 2GB GDDR5
GPU chipset: Cape Verde
Kernel: 4.8.7-200.
Mesa: 12.0.3
LLVM: 3.8.0
DRM: 2.46.0
Driver: radeonsi
I have played a couple other Valve games for several hours with no problems: Portal,...
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #81 |
Have any of you tried this? https:/
In freedesktop.org Bugzilla #93649, Marek Olšák (maraeo) wrote : | #82 |
(In reply to Amarildo from comment #27)
> What I really think is that VALVe should investigate this since this problem
> started happening after they introduced mandatory Texture Streaming.
If you are right about texture streaming, the cso commit might fix it.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #83 |
OH MY LORD
Been playing for 25 minutes so far, no hangs at all.
I'll test more!
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #84 |
45 minutes, not a single crash. I believe it's fixed.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #85 |
Played 2 sessions of 1 hour each, no hangs at all.
To me, this is fixed.
"Thanks", I guess? 1 year is still better than nothing, AMD :P
In freedesktop.org Bugzilla #93649, Michel Dänzer (michel-daenzer) wrote : | #86 |
FWIW, the fundamental problem caught by Marek (good catch!) was there for almost 9 years. It just might not have had quite as severe consequences with other drivers.
In freedesktop.org Bugzilla #93649, Hofmann-zachary (hofmann-zachary) wrote : | #87 |
Well of course it needs more testing to be sure, but I'll probably be doing this soon.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #88 |
It would be really unfortunate if this didn't fix the issue for everybody.
In freedesktop.org Bugzilla #93649, U-null32 (u-null32) wrote : | #89 |
RX470 here, I've been playing for more than 1 hour and no crash so far. Thank you!
In freedesktop.org Bugzilla #93649, Hofmann-zachary (hofmann-zachary) wrote : | #90 |
One hour is not enough testing. I applied this patch to mesa 13.0.2 and the game still locks up.
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #91 |
(In reply to hofmann.zachary from comment #73)
> One hour is not enough testing. I applied this patch to mesa 13.0.2 and the
> game still locks up.
I believe you need mesa-git and llvm-svn for it to work.
In freedesktop.org Bugzilla #93649, U-null32 (u-null32) wrote : | #92 |
(In reply to hofmann.zachary from comment #73)
> One hour is not enough testing. I applied this patch to mesa 13.0.2 and the
> game still locks up.
Make sure you're using a patched version of the 32 bit libraries too. I managed to play almost 3 hours in a row in a full server and in different maps without issues at all.
These are the packages that I'm using:
* linux 4.8.12-2
* linux-firmware 20161005.9c71af9-1
* mesa-git 13.1.0_
* lib32-mesa-git 13.1.0_
* llvm-svn 4.0.0svn_r289147-1
* lib32-llvm-svn 4.0.0svn_r289117-1
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #93 |
(In reply to null32 from comment #75)
> (In reply to hofmann.zachary from comment #73)
> > One hour is not enough testing. I applied this patch to mesa 13.0.2 and the
> > game still locks up.
>
> Make sure you're using a patched version of the 32 bit libraries too. I
> managed to play almost 3 hours in a row in a full server and in different
> maps without issues at all.
>
> These are the packages that I'm using:
>
> * linux 4.8.12-2
> * linux-firmware 20161005.9c71af9-1
>
> * mesa-git 13.1.0_
> * lib32-mesa-git 13.1.0_
>
> * llvm-svn 4.0.0svn_r289147-1
> * lib32-llvm-svn 4.0.0svn_r289117-1
He confirmed it working :D
https:/
In freedesktop.org Bugzilla #93649, Hofmann-zachary (hofmann-zachary) wrote : | #94 |
Oops, forgot to confirm the patch working here too. Yes, the game works without crashing now.
In freedesktop.org Bugzilla #93649, Marek Olšák (maraeo) wrote : | #95 |
undefined (undefined) wrote : | #1 |
undefined (undefined) wrote : | #2 |
undefined (undefined) wrote : | #3 |
Ubuntu Foundations Team Bug Bot (crichton) wrote : | #4 |
The attachment "mesa source package patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]
tags: | added: patch |
undefined (undefined) wrote : | #5 |
after rebuilding mesa with the patch applied and installing the resulting packages (specifically libgl1-
Changed in mesa (Ubuntu): | |
importance: | Undecided → Medium |
Timo Aaltonen (tjaalton) wrote : | #6 |
fixed in zesty which has 13.0.3
Changed in mesa (Ubuntu): | |
status: | New → Fix Released |
Timo Aaltonen (tjaalton) wrote : | #7 |
there is another sru pending review for xenial and yakkety, but this could be added to them
Changed in mesa (Ubuntu Xenial): | |
importance: | Undecided → Medium |
Changed in mesa (Ubuntu Yakkety): | |
importance: | Undecided → Medium |
Timo Aaltonen (tjaalton) wrote : | #8 |
upstream 12.0.6-rc has this commit
Adam Conrad (adconrad) wrote : Please test proposed package | #9 |
Hello undefined, or anyone else affected,
Accepted mesa into yakkety-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
Changed in mesa (Ubuntu Yakkety): | |
status: | New → Fix Committed |
tags: | added: verification-needed |
Changed in mesa (Ubuntu Xenial): | |
status: | New → Fix Committed |
Adam Conrad (adconrad) wrote : | #10 |
Hello undefined, or anyone else affected,
Accepted mesa into xenial-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
Timo Jyrinki (timo-jyrinki) wrote : | #11 |
I happen to have a Radeon 7750 and I'm happy with the new mesa 12.0.6-0ubuntu0 from proposed. I played Team Fortress 2, among other things.
tags: |
added: verification-done removed: verification-needed |
Robie Basak (racb) wrote : | #12 |
12.0.6-0ubuntu0 doesn't exist. Which package versions were tested? 12.0.6-
Launchpad Janitor (janitor) wrote : | #13 |
This bug was fixed in the package mesa - 12.0.6-
---------------
mesa (12.0.6-
* New bugfix release. (LP: #1652564, #1652486)
* Backport to xenial. (LP: #1643789)
mesa (12.0.4-2ubuntu1) zesty; urgency=medium
* Merge from Debian
- New upstream bugfix release. (LP: #1641017)
* dri3-fix-
-- Timo Aaltonen <email address hidden> Fri, 20 Jan 2017 00:22:11 +0200
Changed in mesa (Ubuntu Xenial): | |
status: | Fix Committed → Fix Released |
Chris J Arges (arges) wrote : Update Released | #14 |
The verification of the Stable Release Update for mesa has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Robie Basak (racb) wrote : | #15 |
This has a couple of reverse dependency autopkgtest failures in Yakkety. I have requested retries as it isn't clear to me if they are intermittent or not.
Launchpad Janitor (janitor) wrote : | #16 |
This bug was fixed in the package mesa - 12.0.6-
---------------
mesa (12.0.6-
* New bugfix release. (LP: #1652486)
* Backport to yakkety.
mesa (12.0.4-2ubuntu1) zesty; urgency=medium
* Merge from Debian
- New upstream bugfix release. (LP: #1641017)
* dri3-fix-
mesa (12.0.4-2) unstable; urgency=medium
* Limit new glx symbols to !hurd-any. Should fix FTBFS on hurd.
mesa (12.0.4-1) unstable; urgency=medium
* New upstream release.
* not-installed: wglext.h got dropped from the tarball.
* mesa-common-dev: Remove mesa_glinterop.h, upstream doesn't install
it anymore.
* Update symbols of libegl1-mesa and libgl1-mesa-glx.
mesa (12.0.3-3) unstable; urgency=medium
* Limit libgbm1 dependency to !hurd-any (Closes: #841774). Thanks,
Samuel Thibault!
mesa (12.0.3-2) unstable; urgency=medium
* control: Add libtxc-dxtn-s2tc as an alternative in libgl1-mesa-dri's
Recommends (Closes: #839658).
* control: Add strictly versioned dependency on libgbm1 to libegl1-
mesa.
-- Timo Aaltonen <email address hidden> Wed, 18 Jan 2017 16:41:11 +0200
Changed in mesa (Ubuntu Yakkety): | |
status: | Fix Committed → Fix Released |
In freedesktop.org Bugzilla #93649, Timothy Arceri (t-fridey) wrote : | #96 |
*** Bug 95308 has been marked as a duplicate of this bug. ***
Changed in mesa: | |
importance: | Unknown → Medium |
status: | Unknown → Fix Released |
In freedesktop.org Bugzilla #93649, Amarildo-geral (amarildo-geral) wrote : | #97 |
Uh oh. This bug may be back.
I'm back on Linux. First time playing for more than 30 mins (my little sister was playing) PC hangs.
Will test it to see whether it's this hellish bug or not.
In freedesktop.org Bugzilla #93649, Alexdeucher (alexdeucher) wrote : | #98 |
(In reply to Amarildo from comment #80)
> Uh oh. This bug may be back.
>
> I'm back on Linux. First time playing for more than 30 mins (my little
> sister was playing) PC hangs.
>
> Will test it to see whether it's this hellish bug or not.
Not likely to be the same issue if there is a hang. Please file a new bug report.
Created attachment 120925
Kernel dmesg around the time of the lockup.
After a period of time playing the latest version of TF2, my GPU locks up. After the kernel tries to reset it, X becomes stuck and won't work. The rest of the system is fine, however. Sometimes the GPU will reset successfully and continue working, only to lock up again later, eventually freezing X.
Hardware:
GPU: Gigabyte Radeon HD 7970 GHz Edition OC
CPU: AMD Phenom II X6 1100T
MB: Asus Crosshair IV Formula
Software:
Mesa: 11.1.0
DRM: 2.4.65
LLVM: 3.7.0
X: 1.17.4
DDX: 7.6.1
Kernel: 4.3.3
I have a dmesg with debug turned on and a strace of X from around the time it crashes (attached). I reduced the log files to the relevant bits, as they are quite large. I'll retry with the latest git to see if it helps.