Nouveau TTM Failed to find memory space for buffer

Bug #1406401 reported by linas
This bug affects 2 people
Affects        Status   Importance  Assigned to  Milestone
xorg (Ubuntu)  Expired  Low         Unassigned

Bug Description

I've been wrestling with a 3D graphics driver problem for the last few months. Perhaps it's time to report it.

This appears to be some sort of bad interaction between Xorg and the nouveau kernel drivers. Similar bugs have been seen by others, on other OSes and specifically on other graphics cards (e.g. Radeon). There seem to be few or no reports for nouveau.

Symptoms: the X11 background fails to repaint. Eventually X11 becomes unresponsive. Sometimes restarting X is enough to defer the problem; sometimes a reboot is required. It usually takes a week or so for the problem to recur. dmesg and /var/log/syslog fill up with vast numbers of this:

Dec 29 14:30:17 fanny kernel: [258314.187016] [TTM] Failed to find memory space for buffer 0xffff880808666c00 eviction
Dec 29 14:30:17 fanny kernel: [258314.187022] [TTM] No space for ffff880808666c00 (884 pages, 3536K, 3M)
Dec 29 14:30:17 fanny kernel: [258314.187024] [TTM] placement[0]=0x00070002 (1)
Dec 29 14:30:17 fanny kernel: [258314.187026] [TTM] has_type: 1
Dec 29 14:30:17 fanny kernel: [258314.187027] [TTM] use_type: 1
Dec 29 14:30:17 fanny kernel: [258314.187028] [TTM] flags: 0x0000000A
Dec 29 14:30:17 fanny kernel: [258314.187030] [TTM] gpu_offset: 0x00000000
Dec 29 14:30:17 fanny kernel: [258314.187031] [TTM] size: 131072
Dec 29 14:30:17 fanny kernel: [258314.187032] [TTM] available_caching: 0x00070000
Dec 29 14:30:17 fanny kernel: [258314.187034] [TTM] default_caching: 0x00010000
Dec 29 14:30:17 fanny kernel: [258314.187037] nouveau E[Xorg[2474]] fail ttm_validate
Dec 29 14:30:17 fanny kernel: [258314.187039] nouveau E[Xorg[2474]] validate vram_list
Dec 29 14:30:17 fanny kernel: [258314.187042] nouveau E[Xorg[2474]] validate: -12
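
A note on reading the log above: the figures in the "No space for" line are consistent with 4 KiB pages (884 x 4 = 3536 KiB, which is 3 MiB after integer division), and the -12 on the final line matches the Linux errno value ENOMEM. The Python sketch below checks both. The regular expressions, function names and the 4 KiB page size are assumptions made for illustration; none of this is part of nouveau or TTM tooling.

```python
import errno
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the usual x86-64 default

# Patterns written against the log lines quoted in this report (illustrative).
NO_SPACE = re.compile(r"\[TTM\] No space for (\w+) \((\d+) pages, (\d+)K, (\d+)M\)")
VALIDATE = re.compile(r"validate: (-?\d+)")


def parse_no_space(line):
    """Return buffer address, page count, KiB and MiB from a 'No space for' line."""
    m = NO_SPACE.search(line)
    if m is None:
        return None
    return {
        "buffer": m.group(1),
        "pages": int(m.group(2)),
        "kib": int(m.group(3)),
        "mib": int(m.group(4)),
    }


def errno_name(line):
    """Map the negative return code in a 'validate: -N' line to its errno name."""
    m = VALIDATE.search(line)
    if m is None:
        return None
    return errno.errorcode.get(-int(m.group(1)))


no_space_line = ("Dec 29 14:30:17 fanny kernel: [258314.187022] "
                 "[TTM] No space for ffff880808666c00 (884 pages, 3536K, 3M)")
validate_line = ("Dec 29 14:30:17 fanny kernel: [258314.187042] "
                 "nouveau E[Xorg[2474]] validate: -12")

info = parse_no_space(no_space_line)
print(info["pages"] * PAGE_KIB == info["kib"])  # page count agrees with the KiB figure
print(errno_name(validate_line))
```

In other words, the kernel is refusing a buffer of roughly 3.5 MiB, and the error handed back to Xorg is an out-of-memory code rather than, say, a permissions or bad-address error.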

TTM appears to be a kernel module that manages the memory on the graphics card. Presumably X11 and/or Mesa is failing to release unused memory.
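
One way to see how the failures build up over time is to tally the TTM messages per day from the syslog. The sketch below is a hypothetical helper, not an existing tool; it assumes the traditional syslog timestamp prefix ("Dec 29 14:30:17 ...") shown in the log excerpt, and the function name is made up for illustration.

```python
from collections import Counter

# Substring taken from the kernel messages quoted in this report.
MARKER = "[TTM] Failed to find memory space for buffer"


def failures_per_day(lines):
    """Count TTM allocation failures, keyed by the 'Mon DD' syslog date prefix."""
    counts = Counter()
    for line in lines:
        if MARKER in line:
            month, day = line.split()[:2]
            counts[f"{month} {day}"] += 1
    return counts


sample = [
    "Dec 29 14:30:17 fanny kernel: [258314.187016] [TTM] Failed to find memory space for buffer 0x",
    "Dec 29 14:30:17 fanny kernel: [258314.187037] nouveau E[Xorg[2474]] fail ttm_validate",
    "Dec 29 14:30:18 fanny kernel: [258315.000000] [TTM] Failed to find memory space for buffer 0x",
]
print(failures_per_day(sample))
```

On a real system the input would be the lines of /var/log/syslog (reading it may require root). A count that stays at zero for days and then climbs sharply would fit the "works for about a week, then fails" pattern described above.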

My setup is unusual: I am using TWO graphics cards, not one:

01:03.0 VGA compatible controller: NVIDIA Corporation NV44A [GeForce 6200] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation G72 [GeForce 7300 LE] (rev a1)

It appears only the second one is affected, as that is where the visual symptoms show up.

The setup has three (!) monitors. Xrandr is NOT running, since Xrandr appears to be unable to deal with the monitor setup (two of the monitors are rotated, the third is not; this seems to drive xrandr crazy, so it does not come up). I mention this only because I'm wondering whether the graphics card RAM management is done differently if xrandr is present.

Kernel is the stock kernel:
Linux version 3.13.0-40-generic (buildd@comet) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #69-Ubuntu SMP Thu Nov 13 17:53:56 UTC 2014

System has 32GB RAM. Here is some more random info:
From the boot dmesg:
[ 29.506436] nouveau [ DEVICE][0000:02:00.0] BOOT0 : 0x046100a3
[ 29.506471] nouveau [ DEVICE][0000:02:00.0] Chipset: G72 (NV46)
[ 29.506503] nouveau [ DEVICE][0000:02:00.0] Family : NV40
[ 29.507685] nouveau [ VBIOS][0000:02:00.0] checking PRAMIN for image...
[ 29.602531] nouveau [ VBIOS][0000:02:00.0] ... appears to be valid
[ 29.602559] nouveau [ VBIOS][0000:02:00.0] using image from PRAMIN
[ 29.602696] nouveau [ VBIOS][0000:02:00.0] BIT signature found
[ 29.602721] nouveau [ VBIOS][0000:02:00.0] version 05.72.22.49.09
[ 29.602941] nouveau 0000:02:00.0: irq 79 for MSI/MSI-X
[ 29.602950] nouveau [ PMC][0000:02:00.0] MSI interrupts enabled
[ 29.603001] nouveau [ PFB][0000:02:00.0] RAM type: DDR2
[ 29.603024] nouveau [ PFB][0000:02:00.0] RAM size: 128 MiB
[ 29.603047] nouveau [ PFB][0000:02:00.0] ZCOMP: 0 tags
[ 29.643916] nouveau [ PTHERM][0000:02:00.0] FAN control: none / external
[ 29.643960] nouveau [ PTHERM][0000:02:00.0] fan management: automatic
[ 29.643987] nouveau [ PTHERM][0000:02:00.0] internal sensor: yes
[ 29.663876] nouveau [ CLK][0000:02:00.0] 20: core 450 MHz shader 450 MHz memory 648 MHz
[ 29.663916] nouveau [ CLK][0000:02:00.0] --: core 199 MHz memory 391 MHz
[ 29.663976] nouveau [ DRM] VRAM: 124 MiB
[ 29.663994] nouveau [ DRM] GART: 512 MiB
[ 29.664761] nouveau [ DRM] TMDS table version 1.1
[ 29.665472] nouveau W[ DRM] TMDS table script pointers not stubbed
[ 29.666179] nouveau [ DRM] DCB version 3.0
[ 29.666884] nouveau [ DRM] DCB outp 00: 01000300 00000028
[ 29.667588] nouveau [ DRM] DCB outp 01: 02011310 00000028
[ 29.668282] nouveau [ DRM] DCB outp 02: 01011312 00000000
[ 29.668963] nouveau [ DRM] DCB outp 03: 020223f1 00c0c080
[ 29.669627] nouveau [ DRM] DCB conn 00: 0000
[ 29.670273] nouveau [ DRM] DCB conn 01: 2130
[ 29.670904] nouveau [ DRM] DCB conn 02: 0210
[ 29.671529] nouveau [ DRM] DCB conn 03: 0211
[ 29.672151] nouveau [ DRM] DCB conn 04: 0213
[ 29.675135] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 29.675758] [drm] No driver support for vblank timestamp query.
[ 29.676441] nouveau [ DRM] 0xC633: Parsing digital output script table
[ 29.730082] nouveau [ DRM] MM: using M2MF for buffer copies
[ 29.730700] nouveau [ DRM] Setting dpms mode 3 on TV encoder (output 3)
[ 29.889953] nouveau [ DRM] allocated 1920x1080 fb: 0x9000, bo ffff88082a59f000
[ 29.890900] fbcon: nouveaufb (fb1) is primary device
[ 29.890901] fbcon: Remapping primary device, fb1, to tty 1-63
[ 29.931768] nouveau [ DRM] 0xC633: Parsing digital output script table
[ 30.336258] nouveau 0000:02:00.0: fb1: nouveaufb frame buffer device
[ 30.336286] [drm] Initialized nouveau 1.1.2 20120801 for 0000:02:00.0 on minor 1
[ 30.336366] nouveau 0000:01:03.0: enabling device (0004 -> 0006)
[ 30.336682] [drm] hdmi device not found 1 3 1
[ 30.336782] nouveau [ DEVICE][0000:01:03.0] BOOT0 : 0x04a100a1
[ 30.336789] nouveau [ DEVICE][0000:01:03.0] Chipset: NV44A (NV4A)
[ 30.336795] nouveau [ DEVICE][0000:01:03.0] Family : NV40
[ 30.337817] nouveau [ VBIOS][0000:01:03.0] checking PRAMIN for image...
[ 30.337826] nouveau [ VBIOS][0000:01:03.0] ... signature not found
[ 30.337833] nouveau [ VBIOS][0000:01:03.0] checking PROM for image...
[ 30.688047] nouveau [ VBIOS][0000:01:03.0] ... appears to be valid
[ 30.688054] nouveau [ VBIOS][0000:01:03.0] using image from PROM
[ 30.688179] nouveau [ VBIOS][0000:01:03.0] BIT signature found
[ 30.688185] nouveau [ VBIOS][0000:01:03.0] version 05.44.a2.10.01
[ 30.688304] nouveau [ DEVINIT][0000:01:03.0] adaptor not initialised
[ 30.688319] nouveau [ VBIOS][0000:01:03.0] running init tables
[ 30.705576] nouveau [ PFB][0000:01:03.0] RAM type: DDR2
[ 30.705902] nouveau [ PFB][0000:01:03.0] RAM size: 512 MiB
[ 30.706367] nouveau [ PFB][0000:01:03.0] ZCOMP: 0 tags
[ 30.797629] nouveau [ PTHERM][0000:01:03.0] FAN control: toggle
[ 30.797958] nouveau [ PTHERM][0000:01:03.0] fan management: automatic
[ 30.798278] nouveau [ PTHERM][0000:01:03.0] internal sensor: yes
[ 30.819351] nouveau [ CLK][0000:01:03.0] 20: core 300 MHz shader 300 MHz memory 532 MHz
[ 30.819696] nouveau [ CLK][0000:01:03.0] --: core 200 MHz memory 400 MHz
[ 30.820093] nouveau [ DRM] VRAM: 508 MiB
[ 30.821312] nouveau [ DRM] GART: 512 MiB
[ 30.822555] nouveau [ DRM] TMDS table version 1.1
[ 30.823779] nouveau [ DRM] DCB version 3.0
[ 30.825007] nouveau [ DRM] DCB outp 00: 02001310 00000028
[ 30.826249] nouveau [ DRM] DCB outp 01: 01001312 00000020
[ 30.827476] nouveau [ DRM] DCB outp 02: 01010300 00000028
[ 30.828699] nouveau [ DRM] DCB outp 03: 020223f1 00c0c030
[ 30.829921] nouveau [ DRM] DCB conn 00: 0000
[ 30.831127] nouveau [ DRM] DCB conn 01: 2230
[ 30.832195] nouveau [ DRM] DCB conn 02: 0110
[ 30.833386] nouveau [ DRM] DCB conn 03: 0111
[ 30.834483] nouveau [ DRM] DCB conn 04: 0113
[ 30.835700] nouveau [ DRM] Adaptor not initialised, running VBIOS init tables.
[ 30.837058] nouveau [ DRM] Saving VGA fonts
[ 30.975047] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 30.975372] [drm] No driver support for vblank timestamp query.
[ 30.975700] nouveau [ DRM] 0xD2AC: Parsing digital output script table
[ 31.036070] nouveau [ DRM] MM: using M2MF for buffer copies
[ 31.036405] nouveau [ DRM] Setting dpms mode 3 on TV encoder (output 3)
[ 31.130437] nouveau [ DRM] allocated 1920x1080 fb: 0x9000, bo ffff880829bb3000
[ 31.130948] nouveau 0000:01:03.0: fb2: nouveaufb frame buffer device
[ 31.131282] [drm] Initialized nouveau 1.1.2 20120801 for 0000:01:03.0 on minor 2
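
The boot log above reports quite different memory sizes for the two cards: 128 MiB on 0000:02:00.0 (the G72, where the visual symptoms appear) and 512 MiB on 0000:01:03.0. The Python sketch below pulls those figures out of dmesg-style lines. The regular expression and function name are illustrative assumptions based only on the line format quoted in this report.

```python
import re

# Matches lines such as:
#   [ 29.603024] nouveau [ PFB][0000:02:00.0] RAM size: 128 MiB
PFB_RAM = re.compile(r"\[\s*PFB\]\[([0-9a-fA-F:.]+)\]\s+RAM size:\s+(\d+)\s+MiB")


def ram_sizes(dmesg_lines):
    """Map PCI address -> RAM size in MiB, as reported by nouveau's PFB lines."""
    sizes = {}
    for line in dmesg_lines:
        m = PFB_RAM.search(line)
        if m:
            sizes[m.group(1)] = int(m.group(2))
    return sizes


sample = [
    "[ 29.603001] nouveau [ PFB][0000:02:00.0] RAM type: DDR2",
    "[ 29.603024] nouveau [ PFB][0000:02:00.0] RAM size: 128 MiB",
    "[ 30.705902] nouveau [ PFB][0000:01:03.0] RAM size: 512 MiB",
]
print(ram_sizes(sample))
```

If the failing allocations are all on the 128 MiB card, its smaller VRAM would be exhausted sooner by any leak or fragmentation. That is only a plausible reading of the logs, not something the report confirms.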

lspci shows an iommu:
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 Northbridge only dual slot (2x16) PCI-e GFX Hydra part (rev 02)
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD/ATI] RD990 I/O Memory Management Unit (IOMMU)
00:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (PCI express gpp port H)
00:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (external gfx1 port A)
00:0b.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RD890 PCI to PCI bridge (NB-SB link)

CPU is model name : AMD Opteron(tm) Processor 6344
with oodles of cores. 32GB system RAM

kernel commandline is
BOOT_IMAGE=/boot/vmlinuz-3.13.0-40-generic ro vram_pushbuf=1 nomdmonddf nomdmonisw

/var/log/Xorg.0.log doesn't have much interesting in it ..

Tags: bot-comment
Revision history for this message
linas (linasvepstas) wrote :

Also, about the IOMMU:

[ 2.112650] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 2.112653] AMD-Vi: Interrupt remapping enabled
[ 2.112673] pci 0000:00:00.2: irq 72 for MSI/MSI-X
[ 2.123906] AMD-Vi: Lazy IO/TLB flushing enabled
[ 2.206164] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 2.206170] software IO TLB [mem 0x9bf80000-0x9ff80000] (64MB) mapped at [ffff88009bf80000-ffff88009ff7ffff]

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1406401/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → xorg (Ubuntu)
penalvch (penalvch) wrote :

linas, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal as it will automatically gather and attach updated debug information to this report:

apport-collect -p xorg 1406401

Please ensure you have xdiagnose installed, and that you click the Yes button for attaching additional debugging information.

As well, given the information from the prior release is already available, testing a release prior to the development one would not be helpful.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

Changed in xorg (Ubuntu):
importance: Undecided → Low
status: New → Incomplete
linas (linasvepstas) wrote :

Hi Christopher,

I spent 2 or 3 16-hour days trying to bisect this bug every which way. I eventually punted and just bought a newer graphics card (they're cheap, a few hundred $$), and that "fixed" the problem.

Summary: the old card works great under Ubuntu 12.04 but has issues, as described above, under 14.04.

My debug efforts centered on doing a git-bisect of the kernel git sources. Somewhere early in the kernel-3.0 series, the card works fine. Somewhere a bit before Linux 3.13, the card either would not run X11 at all (and would hose the tty as well), or X11 would come up for a few hours and then hit the TTM messages.

I reported the git bisects (two of them, one for each bug) to the nouveau folks, but did not get the impression that they were able to do anything about it.

Launchpad Janitor (janitor) wrote :

[Expired for xorg (Ubuntu) because there has been no activity for 60 days.]

Changed in xorg (Ubuntu):
status: Incomplete → Expired