Bug #565981 “[KMS] gem objects not deallocated” : Bugs : xorg-server package : Ubuntu

Revision history for this message

Tormod Volden (tormodvolden) wrote on 2010-04-18:

#1

Dependencies.txt (3.6 KiB, text/plain; charset="utf-8")
RelatedPackageVersions.txt (285 bytes, text/plain; charset="utf-8")

affects:	xorg (Ubuntu) → linux (Ubuntu)
summary:	gem objects not deallocated → [KMS] gem objects not deallocated

Tormod Volden (tormodvolden) wrote on 2010-04-18:

#2

Restarting Xorg brings the gem object bytes down again, so I am not sure xorg-server (or driver) is not to blame.
$ cat /sys/kernel/debug/dri/0/gem_objects
156 objects
54063104 object bytes
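
Raw byte counts like the one above are easier to compare in MiB. A minimal sketch, assuming the two-line "objects" / "object bytes" layout shown in this comment; the function name gem_object_mib is made up for illustration and is not part of any package:

```shell
# Hypothetical helper: read gem_objects-style text on stdin and print
# the "object bytes" figure in whole MiB (rounded down).
gem_object_mib() {
    awk '$2 == "object" && $3 == "bytes" { printf "%d MiB\n", int($1 / 1048576) }'
}

# Usage on a live system (reading debugfs normally needs root):
#   gem_object_mib < /sys/kernel/debug/dri/0/gem_objects
```

Fed the figures from this comment, it reports 51 MiB.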

marcinq (marcinq) wrote on 2010-04-19:

#3

I can confirm this on my RS690 (Radeon X1200).

With Kubuntu and normal use, 5 hours are enough for RAM usage to skyrocket from an initial 300-400 MB to 1.2 GB. It all ends in heavy swapping or a lockup.

No such problems with UMS.

Bryce Harrington (bryce) wrote on 2010-04-20:

#4

Anyone happen to know when this issue first began?

Can someone try booting earlier kernels to find which are affected? (Assuming it to be a regression in the kernel)

If it sounds more like xorg-server, try booting earlier versions of it. There were a number of changes between April 15th and now which looked safe when we pulled them but would be worth ruling in or out as possibilities.

Changed in linux (Ubuntu Lucid):
importance:	Undecided → Critical

William Grant (wgrant) wrote on 2010-04-20:

#5

xorg-server -2ubuntu1 is good, -2ubuntu2 is bad. 114_dri2_make_sure_x_drawable_exists.patch is probably to blame.

Tormod Volden (tormodvolden) wrote on 2010-04-20:

#6

In that case I think the discussion here is relevant: https://bugs.freedesktop.org/show_bug.cgi?id=26394 and the "Track DRI2 drawables as resources, not privates" thread on xorg-devel ML, with four glx commits on xserver master 2010-04-16.

Tormod Volden (tormodvolden) wrote on 2010-04-20:

#7

A link to the ML thread: http://thread.gmane.org/gmane.comp.freedesktop.xorg.devel/6829/focus=7229

William Grant (wgrant) on 2010-04-20

affects:	linux (Ubuntu Lucid) → xorg-server (Ubuntu Lucid)

Vish (vish) wrote on 2010-04-20:

#8

Downgrading to xorg-server -2ubuntu1 does prevent the memory problems.

Changed in xorg-server (Ubuntu Lucid):
status:	New → Confirmed

Sebastian Martinez (tychocity) wrote on 2010-04-20:

#9

Perhaps this is my problem too:

cat /sys/kernel/debug/dri/0/gem_objects
3640 objects
1269358592 object bytes
4 pinned
13766656 pin bytes
111054848 gtt bytes
234881024 gtt total

Tournier Mathieu (mathieutournier) wrote on 2010-04-20:

#10

I think bug https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/564636 is a duplicate of this one.

Tormod Volden (tormodvolden) wrote on 2010-04-20:

#11

Running xorg-edgers xserver 1.8 now with 114_dri2_make_sure_x_drawable_exists.patch dropped (thanks Sarvatt!), and the problem cannot be reproduced.

Robert Hooker (sarvatt) wrote on 2010-04-20:

#12

Well, there are two options here: either backport the changes from xserver master that fix this instead of using the 114 patch, or drop the 2 glx 1.4 enablement patches and 114 completely. Just dropping 114 is not an option because it will regress things horribly, to the point where closing clutter apps crashes the server.

I tried my hand at backporting the 2 commits here (xorg-server - 2:1.7.6-2ubuntu7.5) -
https://edge.launchpad.net/~sarvatt/+archive/bugs/+packages

but the patches need some *serious* review and it is only compile tested at the moment. This only affects people using the glx 1.4 enablement backports to xserver 1.7.x so it's not really upstream material.

The two patches:
http://sarvatt.com/downloads/patches/119_dri2_drawables.patch
http://sarvatt.com/downloads/patches/120_glx_drop_destroywindow.patch

The slightly more sane option I see at the moment is to revert the 2 glx 1.4 enablement patches as well as the 114 patch that only mattered for things using that. I have uploaded that combination to x-updates here -

https://edge.launchpad.net/~ubuntu-x-swat/+archive/x-updates

Robert Hooker (sarvatt) wrote on 2010-04-20:

#13

By the way, this is the upstream bug where the 114 patch originated:

https://bugs.freedesktop.org/show_bug.cgi?id=26394

Robert Hooker (sarvatt) wrote on 2010-04-20:

#14

And this is the bug that the 114 patch fixed, which will regress if 114 is just dropped without also dropping 03_fedora_glx_versioning.diff and 04_fedora_glx14-swrast.diff:

https://bugs.edge.launchpad.net/ubuntu/+bug/550218

Vish (vish) wrote on 2010-04-20:

#15

I tried both PPAs:
https://edge.launchpad.net/~ubuntu-x-swat/+archive/x-updates (xorg-server - 2:1.7.6-2ubuntu7~xup)
https://edge.launchpad.net/~sarvatt/+archive/bugs/+packages (xorg-server - 2:1.7.6-2ubuntu7.5)

Both caused severe problems for me, and I couldn't use the system for more than 5 minutes.
Everything would lock up at some compiz use and I could not do anything; the screen just froze, I couldn't switch to a VT, and Alt+SysRq+K did not help either.
I had to hard shutdown the system.
Probably not of much use, but from /var/log/messages: http://paste.ubuntu.com/419425/ http://paste.ubuntu.com/419432/
[I had to hard shutdown nearly 10 times; it seems the SAK worked a couple of times in the background, but the screen was frozen.]

Robert Hooker (sarvatt) wrote on 2010-04-20:

#16

Yeah sorry about that, the one I uploaded to ubuntu-x-swat had local changes in it by mistake that would have made it make no difference. The fixed one is in there now (2ubuntu7~xup2)

Adam Lyall (magicmyth) wrote on 2010-04-21:

#17

Just confirming that Robert's xorg packages (2ubuntu7~xup2) have fixed the issue I mentioned in #564636. So far no stability issues. I will report back on whether it fixes the ATI GPU issue when I get the chance to test that.

Petar Velkovski (pvelkovski) wrote on 2010-04-21:

#18

Preliminary testing shows that Robert's xorg packages from https://launchpad.net/~ubuntu-x-swat/+archive/x-updates fix this bug for me too. (Intel graphics)

Martin Pitt (pitti) wrote on 2010-04-21:

#19

Robert, from the current discussion it seems that it's quite safe to roll back the two glx 1.4 and the 114 patch. Personally I would rather like to see this fixed in final, since it's such a notable regression and the 114 patch was just introduced a few days ago.

I heard that the rdepends were tested for how they behave when GLX is rolled back from 1.4 to 1.2. From a more theoretical standpoint, what does that change entail? Does it drop a few GLX features that would help performance in some cases? What do client apps do if those functions are suddenly not available any more?

Martin Pitt (pitti) wrote on 2010-04-21:

#20

I started a testing wiki page at https://wiki.ubuntu.com/X/Testing/GEMLeak

I'll send a call for testing to ubuntu-devel@.

Alessandro Ghersi (alessandro-ghersi) wrote on 2010-04-21:

#21

I added the PPA, upgraded, and rebooted, but glxinfo | grep "GLX version" still says 1.4.
Is that right? I ask because the wiki says "Please verify that glxinfo | grep "GLX version" says "1.2", not "1.4"."
I have xserver-common and xserver-xorg-core 2:1.7.6-2ubuntu7~xup2 installed.
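
The check the wiki asks for can be scripted. A minimal sketch, assuming glxinfo prints a line of the form "GLX version: 1.2" as quoted in this thread; the function name glx_version is hypothetical:

```shell
# Hypothetical helper: read glxinfo output on stdin and print only the
# value of the "GLX version" line (not the server/client version strings).
glx_version() {
    awk -F': *' '/^GLX version/ { print $2; exit }'
}

# Usage on a live system:
#   if [ "$(glxinfo | glx_version)" = "1.2" ]; then
#       echo "GLX 1.2: the rolled-back X server is in use"
#   else
#       echo "GLX is not 1.2: the rollback may not be active"
#   fi
```

If the value is still 1.4 after installing the PPA packages, it may be worth confirming that the X server was actually restarted, since the running server keeps its old GLX version until then.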

Timo Jyrinki (timo-jyrinki) wrote on 2010-04-21:

#22

As a small note, applications requiring GLX 1.4 generally do not start if the 1.4-specific extensions are not available. I never got to studying which are such applications, even though it crossed my mind. Probably games, some professional proprietary applications etc. Quite a few applications depend on those, since GLX 1.4 is ca. 10 years old, even though the free software 3D stack hasn't supported it.

Erick Brunzell (lbsolost) wrote on 2010-04-21:

#23

I caught wind of this at the forums:

http://ubuntuforums.org/showthread.php?p=9154355&posted=1#post9154355

And as I said there, "Please excuse me for being a pain, but being visually impaired I can sometimes overlook the obvious; what my blind old self can't see is how to add myself and my machine to "testing" here":

https://wiki.ubuntu.com/X/Testing/GEMLeak

So I'll just be using this bug report in the interim.

I can tell you that I'm using Intel 82945G/GZ Integrated Graphics and <glxinfo | grep "GLX version"> produces
"GLX version: 1.4" in an UNAFFECTED fresh install from 04/01/2010, but it DOES affect an upgrade from Karmic to Lucid that was performed less than 48 hours ago!

I'll leave this well working install alone and start testing the other.

Uptime on the unaffected install previously mentioned:

lance@lance-desktop:~$ uptime
13:02:16 up 4:34, 2 users, load average: 0.34, 0.60, 0.49

13+ hours UP and no problem with "GLX version: 1.4"!

The other had been UP not nearly as long!

I'll stay available!

Erick Brunzell (lbsolost) wrote on 2010-04-21:

#24

Something possibly helpful, or not: since I'm visually impaired, one of the first things I do is right-click the desktop, adjust fonts, etc., and in particular DISABLE the 3D stuff. That is, even if it is enabled by default and Visual Effects shows "Normal", I change it to "None".

That's how the unaffected Lucid desktop was prior to booting into the affected one. I tried to change it back to "Normal" and it wouldn't!

I booted into the affected one and checked. It is also set to "None" and I think I'll leave it alone and boot into one of the installs I did during iso testing to try the "patch/revert". That way I'll still be able to gather info from the one that slowed to a crawl and the one that seems unaffected.

Someone much smarter than I am would have to tell me what info to gather, and how to gather it.

Vish (vish) wrote on 2010-04-21:

#25

Using 2ubuntu7~xup2, I'm no longer having the memory problems I reported earlier in Bug #563400 (thanks Sarvatt!).
This is on an ATI Mobility Radeon X1400 [RV515].

Luka (luka-mrovlje-gmail) wrote on 2010-04-21:

#26

I'd like to report that I do not notice this regression. My packages are up to date. I do not use the proposed PPA to downgrade GLX to 1.2.

My video adapter:
VGA compatible controller: Intel Corporation Mobile GME965/GLE960 Integrated Graphics Controller (rev 0c)

OS: Lucid Lynx 10.04 32-bit Desktop, upgraded from Karmic.

important parts of glxinfo report:
direct rendering: Yes
server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4

Running compiz, my uptime is about 4 days and I use suspend, which works great. There are no speed regressions to report. In fact, this is the first time that I am able to play OpenGL games like nexuiz and penumbra rather smoothly. I do experience an occasional Xorg restart, but only when playing penumbra, and that is probably not related to this bug.

I am sure that doesn't help those affected much, but I hope someone will have some use for a good report as well.

Erick Brunzell (lbsolost) wrote on 2010-04-21:

#27

OK, while installing packages on my "third" Lucid so I can just use it, I browsed the apt history files of the others to see if that might shed some light. It did to me, but it may not be helpful to you.

On my "main" Lucid (the one NOT affected) I'd let update mangler remove "compiz" and "compiz-gnome" on 04/09/2010 (obviously because I don't use them anyway) so that's why it was NOT affected.

Of course we don't want everyone to remove compiz so I'll keep running Lucid #3 with the reversion/patch + desktop effects enabled and report back tomorrow.

Andy (andy-xillean) wrote on 2010-04-21:

#28

I am running Lucid in Virtualbox with Compiz enabled. Host Machine is Nvidia. I just ran all the updates
then rebooted and checked my version

ubuntu@lucid-test:~$ glxinfo|grep "version"
server glx version string: 1.2 Chromium
client glx version string: 1.2 Chromium
GLX version: 1.3
OpenGL version string: 2.0 Chromium 1.9

I ran the update manager again and it said the system is up to date. I am not noticing any problems.
But shouldn't it be showing 1.4 instead of 1.3?

W00ster (svein-brostigen) wrote on 2010-04-21:

#29

Running Lenovo ThinkPad T400, model 6475ZN2 with the Xserver from https://launchpad.net/~ubuntu-x-swat/+archive/x-updates.

After rebooting and starting the new Xserver, running some 1080p video in full screen and some other applications that normally tax the Xserver, GLX and memory quite hard, the number of bytes used by GEM objects has dropped to a more normal level:
1198 objects
126042112 object bytes
6 pinned
16838656 pin bytes
79060992 gtt bytes
234881024 gtt total

The number of object bytes used to be larger by a factor of 10.

Bryce Harrington (bryce) on 2010-04-21

description:	updated

Kees Cook (kees) on 2010-04-21

description:	updated

Bryce Harrington (bryce) wrote on 2010-04-21: Re: [Bug 565981] Re: [KMS] gem objects not deallocated

#30

On Wed, Apr 21, 2010 at 09:19:49PM -0000, Andy wrote:
> I am running Lucid in Virtualbox with Compiz enabled. Host Machine is Nvidia. I just ran all the updates
> then rebooted and checked my version

The -nvidia binary driver includes its own GLX library, so this bug is
completely irrelevant in that case. Virtualbox also includes its own
video driver, although dunno what it does for glx. In any case, your
configuration does not sound like one that requires being tested, but
thanks for the feedback.

discord (colin.williams) wrote on 2010-04-22:

#31

I've been running the beta for a couple of months,

00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)

and I haven't been affected either. I tried the script above to test, but the script didn't work. Seems there's no bug on the 945 and 965 chipsets.

discord (colin.williams) wrote on 2010-04-22:

#32

Whoops, I thought I wasn't affected, but after running a clip through vlc for half an hour, my memory usage started to increase sharply. I'm not sure this has always been a problem, since I've been running testing for quite a while and have watched DVDs without issue a couple of times.

Conn O Griofa (psyke83) wrote on 2010-04-22:

#33

Unfortunately, this proposed X server update has not resolved the problem on my system. I'm using a stock Ubuntu Lucid installation, compiz enabled, and with the proposed X server update.

conn@nx9010:~$ lspci | grep VGA
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon IGP 330M/340M/350M

conn@nx9010:~$ glxinfo | grep "GLX version"
GLX version: 1.2

After approximately one hour uptime (watching a Flash video [1]), I noticed the system becoming sluggish, especially scrolling pages in Firefox. This is the output which I captured shortly after noticing the sluggishness had begun:

conn@nx9010:~$ pid=`pidof X` ; for t in `seq 1 10`; do eog /usr/share/backgrounds ; echo `grep "object bytes" /sys/kernel/debug/dri/0/gem_objects` `ps ocomm,vsz,rss $pid |grep X`; done
290811904 object bytes Xorg 32120 24264
292790272 object bytes Xorg 32120 24264
294481920 object bytes Xorg 32120 24264
295972864 object bytes Xorg 32120 24264
297984000 object bytes Xorg 32120 24264
302874624 object bytes Xorg 32120 24264
304979968 object bytes Xorg 32888 24840
306962432 object bytes Xorg 32120 24264
308514816 object bytes Xorg 32120 24264
310161408 object bytes Xorg 32120 24264
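
The growth across the samples above can be summarised with a short filter. A minimal sketch, assuming each sample line starts with "<bytes> object bytes" as in the captured output; the function name gem_growth is made up for illustration:

```shell
# Hypothetical helper: read "<bytes> object bytes ..." sample lines on
# stdin and print the total growth between the first and last sample.
gem_growth() {
    awk '$2 == "object" && $3 == "bytes" {
             if (n == 0) first = $1   # remember the first sample
             last = $1                # keep overwriting with the latest
             n++
         }
         END {
             if (n > 1)
                 printf "%d bytes over %d intervals\n", last - first, n - 1
         }'
}

# Usage: save the loop output to a file, then
#   gem_growth < samples.txt
```

For the ten samples above this gives 19349504 bytes over 9 intervals, roughly 2 MiB of additional GEM object memory per eog run, while the Xorg vsz/rss columns stay essentially flat.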

There also seems to be something strange with the GEM pinned/gtt counts:

conn@nx9010:~$ cat /sys/kernel/debug/dri/0/gem_objects
1212 objects
308838400 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total

I was aware of the memory leak in KMS for some time, but only discovered this bug report today. Aside from disabling KMS, the only way in which I was able to stop this memory leak was to use a combination of the xorg-edgers packages and the mainline kernel 2.6.34-rc5.

I am unsure which specific component eliminated the problem, but the problem disappeared only after installing kernel 2.6.34-rc5 [2] (which may or may not be coincidence, as an update in one of the xorg-edgers packages may have really solved the issue). I will now test the xorg-edgers packages against the official 2.6.32-21-generic kernel to try to isolate the problem.

If you require any more information or a separate bug report filed, please let me know.

[1] http://www.southparkstudios.com/episodes/103922/
[2] http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.34-rc5-lucid/