Xorg freeze (nouveau/ttm use-after-free with full-screen video)

Bug #1940154 reported by Oisín Mac Fhearaí
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

Since upgrading to Ubuntu 21.04 yesterday, I've noticed several hangs during full-screen video playback (watching YouTube videos in Chromium). The hang often happens during or just after the switch to full-screen.

The system stays running and I can connect via ssh, but restarting gdm3 has no effect and most of the time I've had to reboot. The latest hang recovered automatically when I left the machine alone for about 2 minutes, and produced the following messages in dmesg (I'll attach a more complete log):

```
[ 3520.856803] [TTM] Buffer eviction failed
[ 3520.856856] ------------[ cut here ]------------
[ 3520.856859] refcount_t: underflow; use-after-free.
[ 3520.856888] WARNING: CPU: 3 PID: 6842 at lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0
...
[ 3520.857292] Call Trace:
[ 3520.857305] ttm_bo_put+0x3f/0x50 [ttm]
[ 3520.857331] nouveau_gem_new+0xc4/0x100 [nouveau]
[ 3520.857611] ? nouveau_gem_new+0x100/0x100 [nouveau]
[ 3520.857871] nouveau_gem_ioctl_new+0x5b/0x100 [nouveau]
[ 3520.858133] ? nouveau_gem_new+0x100/0x100 [nouveau]
[ 3520.858398] drm_ioctl_kernel+0xae/0xf0 [drm]
[ 3520.858500] drm_ioctl+0x245/0x400 [drm]
[ 3520.858586] ? nouveau_gem_new+0x100/0x100 [nouveau]
[ 3520.858850] ? __fget_files+0x5f/0x90
[ 3520.858864] ? __fget_light+0x32/0x80
[ 3520.858879] nouveau_drm_ioctl+0x66/0xc0 [nouveau]
[ 3520.859145] __x64_sys_ioctl+0x91/0xc0
[ 3520.859158] do_syscall_64+0x38/0x90
[ 3520.859170] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3520.859182] RIP: 0033:0x7fce8ceb0317
```
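
For triage, it can help to reduce a trace like the one above to a list of frames and the modules they belong to. The following is a minimal Python sketch, not a tool used in this report: the function name `parse_call_trace` and the regular expression are assumptions, and it only handles the line shape seen in the excerpt above. Frames prefixed with `?` are ones the kernel's stack unwinder could not confirm as part of the call chain, so the sketch marks them as unreliable.

```python
import re

# Matches frame lines such as:
#   [ 3520.857305] ttm_bo_put+0x3f/0x50 [ttm]
#   [ 3520.857611] ? nouveau_gem_new+0x100/0x100 [nouveau]
# The pattern is an assumption based only on the excerpt in this report.
FRAME_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def parse_call_trace(text):
    """Return one dict per stack frame found in a dmesg excerpt."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a frame line (e.g. "Call Trace:" or "WARNING: ...")
        frames.append({
            "symbol": match.group("symbol"),
            # Frames without a [module] suffix belong to the core kernel image.
            "module": match.group("module") or "kernel",
            "reliable": match.group("unreliable") is None,
        })
    return frames


# A few lines copied from the trace in this report.
SAMPLE = """\
[ 3520.857292] Call Trace:
[ 3520.857305] ttm_bo_put+0x3f/0x50 [ttm]
[ 3520.857331] nouveau_gem_new+0xc4/0x100 [nouveau]
[ 3520.857611] ? nouveau_gem_new+0x100/0x100 [nouveau]
[ 3520.858398] drm_ioctl_kernel+0xae/0xf0 [drm]
[ 3520.859145] __x64_sys_ioctl+0x91/0xc0
"""

if __name__ == "__main__":
    for frame in parse_call_trace(SAMPLE):
        if frame["reliable"]:
            print(f'{frame["module"]:8} {frame["symbol"]}')
```

Filtering on `reliable` leaves the chain from the ioctl entry point through `nouveau_gem_new` to `ttm_bo_put`, which is the part worth reading first.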

ProblemType: Bug
DistroRelease: Ubuntu 21.04
Package: xorg 1:7.7+22ubuntu1
ProcVersionSignature: Ubuntu 5.11.0-31.33-generic 5.11.22
Uname: Linux 5.11.0-31-generic x86_64
NonfreeKernelModules: wl
.tmp.unity_support_test.0:

ApportVersion: 2.20.11-0ubuntu65.1
Architecture: amd64
CasperMD5CheckResult: unknown
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Mon Aug 16 23:36:04 2021
DistUpgraded: 2021-08-15 02:10:41,733 DEBUG Running PostInstallScript: '/usr/lib/ubuntu-advantage/upgrade_lts_contract.py'
DistroCodename: hirsute
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes, including running git bisection searches
GpuHangFrequency: Several times a day
GpuHangReproducibility: Occurs more often under certain circumstances
GpuHangStarted: Immediately after installing this version of Ubuntu
GraphicsCard:
 NVIDIA Corporation GT218M [GeForce 310M] [10de:0a70] (rev a2) (prog-if 00 [VGA controller])
   Subsystem: Samsung Electronics Co Ltd GT218M [GeForce 310M] [144d:c079]
InstallationDate: Installed on 2014-06-27 (2607 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
MachineType: SAMSUNG ELECTRONICS CO., LTD. Q430/Q530
ProcEnviron:
 LANGUAGE=en_IE:en
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_IE.UTF-8
 SHELL=/usr/bin/zsh
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.11.0-31-generic root=UUID=fc20db83-0138-4e0c-b7e9-04da38c1e9e9 ro quiet splash acpi_backlight=vendor crashkernel=512M-:192M acpi_enforce_resources=lax crashkernel=512M-:192M vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: Upgraded to hirsute on 2021-08-15 (1 days ago)
dmi.bios.date: 05/29/2010
dmi.bios.release: 2.0
dmi.bios.vendor: Phoenix Technologies Ltd.
dmi.bios.version: 02KF.M008.20100529.KSJ
dmi.board.asset.tag: Tag 12345
dmi.board.name: Q430/Q530
dmi.board.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 9
dmi.chassis.vendor: SAMSUNG ELECTRONICS CO., LTD.
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnPhoenixTechnologiesLtd.:bvr02KF.M008.20100529.KSJ:bd05/29/2010:br2.0:svnSAMSUNGELECTRONICSCO.,LTD.:pnQ430/Q530:pvrNotApplicable:rvnSAMSUNGELECTRONICSCO.,LTD.:rnQ430/Q530:rvrNotApplicable:cvnSAMSUNGELECTRONICSCO.,LTD.:ct9:cvrN/A:
dmi.product.name: Q430/Q530
dmi.product.version: Not Applicable
dmi.sys.vendor: SAMSUNG ELECTRONICS CO., LTD.
version.compiz: compiz 1:0.9.14.1+20.10.20200813-0ubuntu4
version.libdrm2: libdrm2 2.4.105-3~21.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 21.0.3-0ubuntu0.2
version.libgl1-mesa-glx: libgl1-mesa-glx 21.0.3-0ubuntu0.2
version.xserver-xorg-core: xserver-xorg-core 2:1.20.11-1ubuntu1.1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.10.6-2build1
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200714-1ubuntu1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-1
xserver.bootTime: Sat Aug 14 22:37:51 2021
xserver.configfile: default
xserver.errors:

xserver.logfile: /var/log/Xorg.0.log
xserver.outputs:

xserver.version: 2:1.20.11-1ubuntu1~20.04.2

Revision history for this message
Oisín Mac Fhearaí (denpashogai) wrote :
Daniel van Vugt (vanvugt) wrote :

Thanks for the bug report. That's a kernel crash in the nouveau driver. That driver is not well supported, so we would usually recommend the official Nvidia driver instead. Unfortunately:

1. Your GeForce 310M GPU is supported only by the Nvidia 340 driver, not by newer drivers; and

2. The Nvidia 340 driver is no longer supported by Nvidia and no longer shipped in Ubuntu 21.04.

I suggest using only Ubuntu 20.04 on this machine, which has a working version of the Nvidia 340 driver that you can install via the 'Additional Drivers' app.

affects: xorg (Ubuntu) → linux (Ubuntu)
Oisín Mac Fhearaí (denpashogai) wrote :

Hi Daniel, thanks for your quick reply. Do you mean the nouveau driver is abandoned, or just doesn't have much activity lately?

Rolling back to an old kernel version (and Ubuntu distro) really feels like a failure. I'd be happy to try to debug the nouveau driver, although it sounds like I'm on my own and I don't really know anything about graphics drivers. Do you know of a good resource for getting started?

Daniel van Vugt (vanvugt) wrote :

The nouveau driver has never been well supported. It has remained buggy for years, even on old hardware, so the outlook is not great there. If you would like to debug the nouveau driver, it lives in the Linux kernel at:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau

The other information I've been referring to comes from these pages:

https://www.nvidia.com/Download/index.aspx?lang=en-us
https://launchpad.net/ubuntu/+source/nvidia-graphics-drivers-340
https://launchpad.net/ubuntu/+source/linux
https://launchpad.net/ubuntu/+source/linux-hwe-5.11
https://nvidia.custhelp.com/app/answers/detail/a_id/3142/~/support-timeframes-for-unix-legacy-gpu-releases

Nvidia has never supported their own chips for more than a few years and your Samsung machine appears to be 11 years old. You would have less trouble if the machine had AMD or Intel graphics. Otherwise, if you use Nvidia, you will need either newer hardware or to stay on an older OS such as Ubuntu 20.04.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Oisín Mac Fhearaí (denpashogai) wrote :

Thanks Daniel. I've had on-and-off problems with both the nouveau and Nvidia closed-source drivers pretty much since I got this machine, and at some point I vowed that my next machine would use an AMD card, since I've heard the open-source AMD driver is good.

I'll take a look at those links and see if I can do some debugging, since there might be other people out there who are affected by this too, and it'd be a shame to have to stop upgrading their Linux systems as a result. Unless a machine breaks down completely, I don't see why it shouldn't run Linux for more than 11 years.

I've noticed at least two flavours of error stacktrace that happen, so there may be two separate bugs. The first (as in the opening report) starts with "refcount_t: underflow; use-after-free" in lib/refcount.c:28 refcount_warn_saturate+0xae/0xf0, and the second starts with "Trying to vfree() bad address" in mm/vmalloc.c:2263 __vunmap+0x263/0x280.

The rest of the stack trace is somewhat different, although they both pass through "ttm_bo_put". They both look to be memory management errors (use-after-free and possibly a double-free?).
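
For readers unfamiliar with the first message: "refcount_t: underflow; use-after-free" is the kernel's reference-count checker reporting that a count was decremented when no references were left, which usually means some path dropped a reference it did not own. The following Python sketch is a toy model of that check only. The class `ToyRefcount` and its methods are hypothetical; this is not the kernel's `refcount_t` implementation and not nouveau or ttm code.

```python
class ToyRefcount:
    """Toy reference counter that records an underflow instead of wrapping."""

    def __init__(self):
        self.count = 1        # the creator holds the first reference
        self.released = False
        self.warnings = []

    def get(self):
        """Take an additional reference."""
        self.count += 1

    def put(self):
        """Drop one reference; return True if this call released the object."""
        if self.count <= 0:
            # No references remain, so this caller is dropping one it never held.
            self.warnings.append("underflow; use-after-free")
            return False
        self.count -= 1
        if self.count == 0:
            self.released = True
            return True
        return False


if __name__ == "__main__":
    bo = ToyRefcount()
    print(bo.put())       # drops the last reference and releases the object
    print(bo.put())       # one put too many: recorded as an underflow
    print(bo.warnings)
```

The trace in this report only shows that the warning fired in `ttm_bo_put` when called from `nouveau_gem_new`; which code path dropped the extra reference is not established here.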

Oisín Mac Fhearaí (denpashogai) wrote :
Daniel van Vugt (vanvugt) wrote :

> I don't see why it shouldn't run Linux for more than 11 years.

Linux itself is not the problem. The problem is just with Nvidia GPUs more than a few years old. Nvidia stops supporting older GPUs in their current drivers after a few years. And the older drivers that do support the older GPUs only support older kernel versions. So you would need to be using Ubuntu 20.04 with kernel 5.4 on this machine and then you would have the ability to use the stable Nvidia-340 driver. Also Ubuntu 20.04 is supported for much longer than 21.04 (https://wiki.ubuntu.com/Releases).

The free nouveau driver that's built into the Linux kernel is frankly unfinished and causes a lot of bugs. Even when it's not causing bugs, the performance of that driver does not compare to Nvidia's own proprietary drivers.
