Switching to another user and then to anything else causes freeze in drm_intel_bo_unreference ()

Bug #348428 reported by ichudov on 2009-03-25
This bug affects 6 people
Affects Status Importance Assigned to Milestone
xf86-video-intel
Fix Released
Medium
xserver-xorg-video-intel (Ubuntu)
High
Unassigned
Jaunty
High
Unassigned

Bug Description

Binary package hint: gdm

I am using Jaunty Jackalope.

If I log on, then "switch user" to another user, and do anything to get out of that second session, the laptop freezes up. It does not respond to any keyboard input or mouse movement.

Steps to reproduce:

1) Log on as myself
2) Open a "Guest session" (but this will work with ANY second session)
3) Do anything to get out of the guest session, for example Ctrl-Alt-F1 or "Switch user"

The laptop freezes up and becomes unusable. However, the Linux kernel continues to run and I can SSH to the machine.

If I ssh, su to root, and do /etc/init.d/gdm restart, all sessions disappear and things return to normal as if no one was logged on.

This is a very obnoxious bug, as the laptop is used by my entire family and is almost unusable due to this bug.

This does NOT in any way depend on whether I run compositing window manager.

Searches in Xorg.*.log yield messages in the last session's log (which I am not sure have much to do with behavior):

[ 8.153714] (II) intel(0): Modeline "640x400"x85.1 31.50 640 672 736 832 400 401 404 445 -hsync +vsync (37.9 kHz)
[ 8.153743] (II) intel(0): Modeline "640x350"x85.1 31.50 640 672 736 832 350 382 385 445 +hsync -vsync (37.9 kHz)
[ 8.163164] (II) intel(0): EDID for output TMDS-1
[ 8.165385] (II) intel(0): xf86BindGARTMemory: bind key 7 at 0x0212c000 (pgoffset 8492)
[ 8.382084] (II) intel(0): xf86UnbindGARTMemory: unbind key 7
[ 8.383335] (II) intel(0): EDID for output TV
exaCopyDirty: Pending damage region empty!
[ 26.855631] (II) intel(0): xf86UnbindGARTMemory: unbind key 2
[ 26.855658] (II) intel(0): xf86UnbindGARTMemory: unbind key 3
[ 26.855717] (II) intel(0): xf86UnbindGARTMemory: unbind key 4
[ 26.855871] (II) intel(0): xf86UnbindGARTMemory: unbind key 5
[ 26.855880] (II) intel(0): xf86UnbindGARTMemory: unbind key 6
block already free
block already free
block already free

Also, there is nothing of interest in /var/log/messages. I did save straces of two gdms that were running, and did not find anything exciting there.

potato:root:~ ###apt-cache policy gdm
gdm:
  Installed: 2.20.10-0ubuntu1
  Candidate: 2.20.10-0ubuntu1
  Version table:
 *** 2.20.10-0ubuntu1 0
        500 http://us.archive.ubuntu.com jaunty/main Packages
        100 /var/lib/dpkg/status

[backtrace]
Program received signal SIGABRT, Aborted.
0xb800f430 in __kernel_vsyscall ()
(gdb) backtrace full
#0 0xb800f430 in __kernel_vsyscall ()
#1 0xb7bef6d0 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7bf1098 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7c2d24d in ?? () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7c33604 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7c355b6 in free () from /lib/tls/i686/cmov/libc.so.6
#6 0xb7871837 in ?? () from /usr/lib/libdrm_intel.so.1
#7 0xb7871bca in ?? () from /usr/lib/libdrm_intel.so.1
#8 0xb7871bf4 in ?? () from /usr/lib/libdrm_intel.so.1
#9 0xb7871cb2 in ?? () from /usr/lib/libdrm_intel.so.1
#10 0xb7870336 in drm_intel_bo_unreference () from /usr/lib/libdrm_intel.so.1
#11 0xb78e11dc in gen4_render_state_cleanup (pScrn=0xa026460) at ../../src/i965_render.c:1727
 render_state = (struct gen4_render_state *) 0xc
 i = 0
#12 0xb78b465d in I830LeaveVT (scrnIndex=0, flags=0) at ../../src/i830_driver.c:3624
 pScrn = (ScrnInfoPtr) 0xa026460
 pI830 = (I830Ptr) 0xa026a70
#13 0x080de20a in ?? ()
#14 0x080c82a7 in xf86Wakeup ()
#15 0x08091352 in WakeupHandler ()
#16 0x0813284b in WaitForSomething ()
#17 0x0808d2ee in Dispatch ()
#18 0x0807231d in main ()

ichudov (igor-chudov) wrote :

Sorry for two copies of :20.log.

ichudov (igor-chudov) wrote :
Sebastien Bacher (seb128) wrote :

the issue is a libdrm one and I can confirm it on jaunty intel

Changed in libdrm (Ubuntu):
importance: Undecided → High
assignee: nobody → canonical-desktop-team
Sebastien Bacher (seb128) wrote :

log snippet that shows the bug:

"exaCopyDirty: Pending damage region empty!
*** glibc detected *** /usr/bin/X: double free or corruption (out): 0x0d73de98 ***
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6[0xb7ba8604]
/lib/tls/i686/cmov/libc.so.6(cfree+0x96)[0xb7baa5b6]
/usr/lib/libdrm_intel.so.1[0xb77e6837]
/usr/lib/libdrm_intel.so.1[0xb77e6bca]
/usr/lib/libdrm_intel.so.1[0xb77e6bf4]
/usr/lib/libdrm_intel.so.1[0xb77e6cb2]
/usr/lib/libdrm_intel.so.1(drm_intel_bo_unreference+0x16)[0xb77e5336]
/usr/lib/xorg/modules/drivers//intel_drv.so(gen4_render_state_cleanup+0x5c)[0xb78561dc]
/usr/lib/xorg/modules/drivers//intel_drv.so[0xb782965d]
/usr/bin/X[0x80de20a]
/usr/bin/X(xf86Wakeup+0x287)[0x80c82a7]
/usr/bin/X(WakeupHandler+0x52)[0x8091352]
/usr/bin/X(WaitForSomething+0x1bb)[0x813284b]
/usr/bin/X(Dispatch+0x7e)[0x808d2ee]
/usr/bin/X(main+0x3bd)[0x807231d]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7b4f775]
/usr/bin/X[0x80717d1]"

Tormod Volden (tormodvolden) wrote :

"exaCopyDirty: Pending damage region empty!" is a harmless message. The corruption could be due to bug #328035 so please test with xorg-server packages from https://launchpad.net/~bryceharrington/+archive/ppa or my PPA.

Tormod Volden (tormodvolden) wrote :

Please also attach the full Xorg.0.log

Tormod Volden (tormodvolden) wrote :

That is not the gdm log?

ichudov (igor-chudov) wrote :

Ok, OK, here's the Xorg.0.log from another try (just now).

I reproduced it once more, to the same detrimental result. Logged on as self, switched to Guest, pressed Ctrl-Alt-F1.

ichudov (igor-chudov) wrote :

Actually this time it is weird. Instead of Ctrl-Alt-F1 locking the screen, after 2 seconds of blinking I return to the logon prompt of my main user. (instead of going to virtual terminal 1).

Sebastien Bacher (seb128) wrote :

the bug is different from the timestamp one: it doesn't involve any clock shift or long delay, i.e. you can get it directly after a fresh boot and user switching. The ppa xserver-xorg-core doesn't fix the bug.

Tormod Volden (tormodvolden) wrote :

Can you please install the -dbg packages and also run X through gdb over ssh? See https://wiki.ubuntu.com/X/Backtracing

Sebastien Bacher (seb128) wrote :

I only have my laptop with me this week so I can't get an ssh log; valgrind does not seem to be working correctly either.

Tormod Volden (tormodvolden) wrote :

Full logs (Xorg.0.log and Xorg.20.log, i.e. from both user sessions during the same crash) with all debug packages would be nice as well.

ichudov, thanks for the new log. Can you please attach the corresponding Xorg.20.log? BTW, your log snippet in the bug description, was it modified one way or another? Can you please attach that full file as-is?

Anyway, "block already free" most likely comes from libdrm/intel/mm.c and suggests heap corruption.
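
As a rough illustration of where a message like "block already free" can come from, here is a minimal sketch of a range allocator's free routine that refuses to release the same block twice. The names demo_block and demo_free_block are hypothetical and this is not the libdrm code; it only shows the kind of check that would print such a message.

```c
#include <stdio.h>

/* Hypothetical, simplified block descriptor; not the real libdrm struct. */
struct demo_block {
    unsigned offset;   /* start of the range this block covers */
    unsigned size;     /* length of the range */
    int      is_free;  /* nonzero once the block has been released */
};

/* Release a block. Returns 0 on success, -1 if it was NULL or already free. */
int demo_free_block(struct demo_block *b)
{
    if (b == NULL)
        return -1;
    if (b->is_free) {
        /* A second release of the same block points at a bookkeeping
         * problem in the caller: two owners, or a stale pointer. */
        fprintf(stderr, "block already free\n");
        return -1;
    }
    b->is_free = 1;
    return 0;
}
```

Seeing the message three times in a row, as in the log above, would be consistent with several blocks being released through two different paths, although the sketch cannot tell which paths those are.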

Not sure it is related, but on my machine (-ati) I get thrown back to the original user immediately, probably related to "error setting MTRR (base = 0xd0000000, size = 0x08000000, type = 1) Invalid argument (22)" in my Xorg.20.log.

Tormod Volden (tormodvolden) wrote :

No, my own problem was not related, sorry (local configuration conflict).

What you also can try is to switch to a virtual console, log in and start an X session manually:
 startx -- :19
Does this cause the same crash? (Log would be Xorg.19.log)

ichudov (igor-chudov) wrote :

So, guys, do you want me (the original submitter) to do anything further? I am a little lost in the conversation and who wants whom to try what.

Thanks

Igor

Tormod Volden (tormodvolden) wrote :

ichudov, I asked some questions specifically to you, and the other requests I had it would be nice if you can answer as well, since you are the original submitter and also are familiar with ssh.

On Thu, Mar 26, 2009 at 09:12:10PM -0000, Tormod Volden wrote:
> ichudov, I asked some questions specifically to you, and the other
> requests I had it would be nice if you can answer as well, since you are
> the original submitter and also are familiar with ssh.

OK, I will do it tonight. I believe that you are referring to a
request to run X from ssh.

Igor

(reassigning to -intel package, many similar issues have been identified in the driver)

It would be useful to know if this issue has already been solved upstream. There is a newer -intel package in the xorg-edgers PPA at https://launchpad.net/~xorg-edgers/+archive/ppa (you might need the libdrm2 package from the drm-snapshot source as well). Can you please test it? See https://wiki.ubuntu.com/XorgOnTheEdge for how to uninstall these afterwards.

You can also try Option "AccelMethod" "UXA" in your xorg.conf to see if it helps.

Changed in xserver-xorg-video-intel (Ubuntu Jaunty):
status: New → Confirmed
Tormod Volden (tormodvolden) wrote :

Please also try Option "NoAccel" "true" which was reported to help in similar bug 345714.

ichudov (igor-chudov) wrote :

I did run the X server that crashes (the guest one) under GDB, as outlined on https://wiki.ubuntu.com/X/Backtracing

I saved the output to a file (attached) and did say backtrace full. So here it is.

Looks like it occurs in drm_intel_bo_unreference, which, sadly, likely suggests a memory corruption elsewhere. drm_intel_bo_unreference is called by gen4_render_state_cleanup.

Changed in xserver-xorg-video-intel (Ubuntu Jaunty):
assignee: canonical-desktop-team → bryceharrington
Bryce Harrington (bryce) on 2009-03-27
description: updated
Tormod Volden (tormodvolden) wrote :

Yes, render_state = 0xc definitely looks wrong. Can you please install xserver-xorg-core-dbg and libdrm2-dbg and get a new full backtrace?

On Fri, Mar 27, 2009 at 06:45:28PM -0000, Tormod Volden wrote:
> Yes, render_state = 0xc definitely looks wrong. Can you please install
> xserver-xorg-core-dbg and libdrm2-dbg and get a new full backtrace?
>

Hi, I am doing an aptitude update now and do not see them
coming. Could you tell me which version I should wait for?

igor

"sudo apt-get install xserver-xorg-core-dbg libdrm2-dbg" should work, their versions must match the xserver-xorg-core and libdrm2. Did you also try "NoAccel" or UXA or the xorg-edgers packages?

ichudov (igor-chudov) wrote :

I have installed xserver-xorg-core-dbg libdrm2-dbg and rebooted.

Started a second session.

From SSH, I attached to the second session and pressed Ctrl-Alt-F1.

While the same result occurred, I captured output of GDB, attached here. Looks like there is a lot more information.

Let me know, guys, if you want me to contact someone. This bug, if not fixed, would be a showstopper for many users.

Steven Walter (stevenrwalter) wrote :

I'm also seeing this issue on Jaunty with a Dell D630 laptop. Would it be useful for me to provide -dbg backtraces from gdb?

Tormod Volden (tormodvolden) wrote :

Thanks ichudov. For completeness, install libdrm-intel1-dbg as well. Can you please run this in gdb before quitting it:
 frame 12
 print *pI830
(that is a capital i before the eight)

I think then we will have enough information to forward this bug upstream.

Steven, thanks, it would be nice if you attach your backtrace as well.

Created an attachment (id=24378)
gdb session with full backtrace

Forwarded from Ubuntu https://bugs.launchpad.net/bugs/348428

When running two X servers (fast user switching) the second session will crash when switching to another VT:

*** glibc detected *** /usr/bin/X: double free or corruption (out): 0x0d73de98 ***

Snippet from gdb (note the render_state value):

#10 0xb792d336 in drm_intel_bo_unreference () from /usr/lib/libdrm_intel.so.1
No symbol table info available.
#11 0xb799e1dc in gen4_render_state_cleanup (pScrn=0x98f8760)
    at ../../src/i965_render.c:1727
 render_state = (struct gen4_render_state *) 0xc
 i = 0
#12 0xb797165d in I830LeaveVT (scrnIndex=0, flags=0)
    at ../../src/i830_driver.c:3624
 pScrn = (ScrnInfoPtr) 0x98f8760
 pI830 = (I830Ptr) 0x98f8dd8

This happens with -intel from master, mesa 7.4 and xserver 1.6.

From another commenter:

Using UXA, Xorg crashes when opening the guest session:

Program received signal SIGABRT, Aborted.
0xb7ee7430 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7ee7430 in __kernel_vsyscall ()
#1 0xb7ac36d0 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7ac5098 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7abc5ce in __assert_fail () from /lib/tls/i686/cmov/libc.so.6
#4 0xb77b17da in I830EXACopy (pDstPixmap=0xc2854a0, src_x1=889, src_y1=0,
    dst_x1=887, dst_y1=0, w=72, h=24) at ../../src/i830_batchbuffer.h:107
#5 0xb77c9d5e in uxa_copy_n_to_n (pSrcDrawable=0xc2854a0,
    pDstDrawable=0xc2854a0, pGC=0x0, pbox=0xbfd03590, nbox=1, dx=2, dy=0,
    reverse=0, upsidedown=0, bitplane=0, closure=0x0)
    at ../../uxa/uxa-accel.c:458
#6 0xb7636721 in fbCopyRegion (pSrcDrawable=0xc2854a0,
    pDstDrawable=0xc2854a0, pGC=0x0, pDstRegion=0xbfd03590, dx=2, dy=0,
    copyProc=0xb77c9710 <uxa_copy_n_to_n>, bitPlane=0, closure=0x0)
    at ../../fb/fbcopy.c:396
#7 0xb77c95e0 in uxa_copy_window (pWin=0xc3c1ad0, ptOldOrg={x = 889, y = 0},
    prgnSrc=0xc3bfed0) at ../../uxa/uxa-accel.c:843
#8 0x08180670 in damageCopyWindow (pWindow=0xc3c1ad0, ptOldOrg=
      {x = 889, y = 0}, prgnSrc=0xc3bfed0)
    at ../../../miext/damage/damage.c:1774
#9 0x0812533d in miSpriteCopyWindow (pWindow=0xc3c1ad0, ptOldOrg=
      {x = 889, y = 0}, prgnSrc=0xc3bfed0) at ../../mi/misprite.c:480
#10 0x08146d50 in compCopyWindow (pWin=0xc3c1ad0, ptOldOrg={x = 889, y = 0},
    prgnSrc=0xc3bfed0) at ../../composite/compwindow.c:577
#11 0x0812cf33 in miSlideAndSizeWindow (pWin=0xc3c1ad0, x=887, y=0,
    w=<value optimized out>, h=<value optimized out>, pSib=0xc458150)
    at ../../mi/miwindow.c:633
#12 0x081472a8 in compResizeWindow (pWin=0xc3c1ad0, x=887, y=0, w=74, h=24,
    pSib=0xc458150) at ../../composite/compwindow.c:406
#13 0x08078d11 in ConfigureWindow (pWin=0xc3c1ad0, mask=15, vlist=0xc483fe4,
    client=0xc2d1188) at ../../dix/window.c:2403
#14 0x0808cb32 in ProcConfigureWindow (client=0xc2d1188)
    at ../../dix/dispatch.c:741
#15 0x0808d57f in Dispatch () at ../../dix/dispatch.c:437
#16 0x080722ed in main (argc=10, argv=0xbfd03a94, envp=Cannot access memory at address 0x1b0e
)
    at ../../dix/main.c:397

ichudov (igor-chudov) wrote :

This is a response to Tormod's message https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/348428/comments/30.

I have reupdated everything, installed libdrm-dbg, rebooted and reproduced the bug again.

Attached is the gdb-Xorg.txt file.

Sebastien Bacher (seb128) wrote :

the crash doesn't happen when using noaccel and for some reason the libdrm-intel-dbg package doesn't work correctly

stacktrace after doing a local debug rebuild:

"0xb80b7430 in __kernel_vsyscall ()
(gdb) bt
#0 0xb80b7430 in __kernel_vsyscall ()
#1 0xb7c936d0 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7c95098 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7cd124d in ?? () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7cd7604 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7cd95b6 in free () from /lib/tls/i686/cmov/libc.so.6
#6 0xb7914e25 in free_block (bufmgr_fake=0x9c6f0f0, block=0xd513498)
    at ../../../libdrm/intel/intel_bufmgr_fake.c:473
#7 0xb7915dd7 in drm_intel_fake_bo_unreference_locked (bo=0x9c78ac0)
    at ../../../libdrm/intel/intel_bufmgr_fake.c:875
#8 0xb7915e0a in drm_intel_fake_bo_unreference_locked (bo=0x9c78d80)
    at ../../../libdrm/intel/intel_bufmgr_fake.c:879
#9 0xb7915e98 in drm_intel_fake_bo_unreference (bo=0x9c78d80)
    at ../../../libdrm/intel/intel_bufmgr_fake.c:894
#10 0xb7914417 in drm_intel_bo_unreference (bo=0x9c78d80)
    at ../../../libdrm/intel/intel_bufmgr.c:73
#11 0xb798a1dc in gen4_render_state_cleanup (pScrn=0x9c22d80)
    at ../../src/i965_render.c:1727
#12 0xb795d65d in I830LeaveVT (scrnIndex=0, flags=0)
    at ../../src/i830_driver.c:3624
#13 0x080de1da in xf86XVLeaveVT (index=0, flags=0)
    at ../../../../hw/xfree86/common/xf86xv.c:1269
#14 0x080c8277 in xf86Wakeup (blockData=0x0, err=-1, pReadmask=0x81f72c0)
    at ../../../../hw/xfree86/common/xf86Events.c:551
#15 0x08091322 in WakeupHandler (result=-1, pReadmask=0x81f72c0)
    at ../../dix/dixutils.c:418
#16 0x081329eb in WaitForSomething (pClientsReady=0xd47e530)
    at ../../os/WaitFor.c:231
#17 0x0808d2be in Dispatch () at ../../dix/dispatch.c:367
#18 0x080722ed in main (argc=10, argv=0xbffd3d64, envp=Cannot access memory at address 0x51dd
)
    at ../../dix/main.c:397"

the server which crashes is the guest session one and it corrupts the screen

Sebastien Bacher (seb128) wrote :

the new stacktrace is using exa and current jaunty version

Sebastien Bacher (seb128) wrote :

the crash is still there using libdrm2 libdrm-intel1 xserver-xorg-video-intel from the ppa described there

ichudov (igor-chudov) wrote :

I just wanted to add that at home, counting laptops, I have six Ubuntu machines. Laptops, desktops, etc. Dells, nonames, Fujitsu and so on. A heterogeneous environment in terms of hardware. Most run Intrepid.

And, I am sad to add, fast user switching works very poorly on all of them. It is always one thing or another, that is screwed up, sharing of sound, switching, etc. It is also screwed up in a variety of ways, not just one.

I think that user switching is just something that is not properly tested because various developers sit in front of their accounts all day long and do not need to switch users. But I have a family at home and three people use these computers (me, my spouse and our 7 year old). And it is very frustrating to deal with fast user switching, especially for the 7 year old. It really is a big turn off.

This message is partly to vent, and partly to suggest that it is an area that needs a lot of testing.

Tormod Volden (tormodvolden) wrote :

Sebastien, can you please attach the full backtrace (bt full) from your own debug build?

Ichudov, yes this is a relatively new feature and it exposes old and new bugs in the system that were not exercised before. I hope it will quickly improve. Your testing and bug reports are very valuable.

From what I can see, in I830LeaveVT() pI830->gen4_render_state is 0x994ed80, which sounds like a reasonable pointer, but when it gets down to gen4_render_state_cleanup() what should have been the same pointer is now 0xc, which sounds corrupted. You can verify this by printing out *pI830 from frame 11 as well.

If you are experienced with gdb, you can set a watchpoint on this data location to see when it is being overwritten. But anyway I think we can forward this upstream. Maybe they will understand what goes wrong without digging further with gdb.

Sebastien, on which line of code does it crash? (Use the "list" command in gdb.) The trace says line 1727 but I cannot be sure that is the same as in my source. If it does not crash in the first unreference, I am a bit surprised. You can also print *pI830->gen4_render_state in both frame 11 and 12. Or set a breakpoint and check it before it calls gen4_render_state_cleanup(). Argh, remote gdb debugging via a bug tracker is just frustrating :)

ichudov (igor-chudov) wrote :

Tormod, thanks for understanding.

I will do what I can to help improve fast user switching.

I am downloading source of xserver-xorg-video-intel and will poke around the code a little bit. I will try to set watchpoints.
I am a C++ programmer, but I rarely use debuggers, that's just the way I am.

Nevertheless, I think that this situation is ripe for forwarding the info upstream.

I will post updates if I find anything or if anyone wants me to try new ideas.

Tormod Volden (tormodvolden) wrote :

I see that Sebastien's backtrace is a bit different. Sometimes optimised compilation makes it difficult for gdb to extract local variables, so my analysis above may not be correct. A debug build without optimisation is best. ichudov, if you have downloaded the source of -intel and libdrm, you can build them with:
 DEB_BUILD_OPTIONS="noopt debug nostrip" debuild -b -us -uc

ichudov (igor-chudov) wrote :

Tormod, when I try to do build after apt-get source ..., by going to that directory and typing the above command, I get:

dpkg-checkbuilddeps: Unmet build dependencies: quilt xserver-xorg-dev (>= 2:1.5.99.901) x11proto-gl-dev x11proto-video-dev libgl1-mesa-dev | libgl-dev libxvmc-dev (>= 1:1.0.1) x11proto-fonts-dev libdrm-dev (>= 2.4.5) x11proto-xf86dri-dev libpciaccess-dev (>= 0.8.0+git20071002)
dpkg-buildpackage: warning: Build dependencies/conflicts unsatisfied; aborting.

Is there some way to make it download its dependencies? I may have a chance to try it tonight; I need to go now and do something.

Changed in xserver-xorg-video-intel:
status: Unknown → Confirmed
Tormod Volden (tormodvolden) wrote :

sudo apt-get build-dep libdrm xserver-xorg-video-intel

I am the person who submitted this bug to ubuntu's Launchpad.

Tormod sent me a link to this bug, which is an upstream version of the original ubuntu bug report that I submitted.

I would like to state here that I am willing to be a guinea pig for any possible testing that is needed to fix this bug.

I am a computer programmer and write scripts also so I will be able to provide reasonable level of help.

Created an attachment (id=24408)
backtrace from debug build without optimisation

I could not build it due to lint warnings.

Anyway, has this bug been forwarded to upstream?

Let us know.

Thanks a lot.

Tormod Volden (tormodvolden) wrote :

The lint warnings should not be critical. Yes, I forwarded it upstream, I thought you all would receive the link per bug mail, but here it is:

I've forwarded this bug upstream for you to https://bugs.freedesktop.org/show_bug.cgi?id=20956 . Could you please subscribe yourself to this bug, in case they need further information or wish you to test something. Thanks ahead of time!

Sebastien Bacher (seb128) wrote :

the crash is on "free(block);" in free_block()

the full bt

0xb8005430 in __kernel_vsyscall ()
(gdb) bt full
#0 0xb8005430 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7be16d0 in raise () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7be3098 in abort () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb7c1f24d in ?? () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#4 0xb7c25604 in ?? () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#5 0xb7c275b6 in free () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#6 0xb7862e25 in free_block (bufmgr_fake=0x93730f0, block=0xcc956c8)
    at ../../../libdrm/intel/intel_bufmgr_fake.c:473
 bo_fake = (drm_intel_bo_fake *) 0x937cac0
#7 0xb7863dd7 in drm_intel_fake_bo_unreference_locked (bo=0x937cac0)
    at ../../../libdrm/intel/intel_bufmgr_fake.c:875
 bufmgr_fake = (drm_intel_bufmgr_fake *) 0x93730f0
 bo_fake = (drm_intel_bo_fake *) 0x937cac0
 i = 154652032
 __PRETTY_FUNCTION__ = "drm_intel_fake_bo_unreference_locked"
#8 0xb7863e0a in drm_intel_fake_bo_unreference_locked (bo=0x937cd80)
    at ../../../libdrm/intel/intel_bufmgr_fake.c:879
 bufmgr_fake = (drm_intel_bufmgr_fake *) 0x93730f0
 bo_fake = (drm_intel_bo_fake *) 0x937cd80
 i = 0
 __PRETTY_FUNCTION__ = "drm_intel_fake_bo_unreference_locked"
#9 0xb7863e98 in drm_intel_fake_bo_unreference (bo=0x937cd80)
    at ../../../libdrm/intel/intel_bufmgr_fake.c:894
 bufmgr_fake = (drm_intel_bufmgr_fake *) 0x93730f0
#10 0xb7862417 in drm_intel_bo_unreference (bo=0x937cd80)
    at ../../../libdrm/intel/intel_bufmgr.c:73
No locals.
#11 0xb78d81dc in gen4_render_state_cleanup (pScrn=0x9326d80)
    at ../../src/i965_render.c:1727
 render_state = (struct gen4_render_state *) 0x937c240
 i = 0
#12 0xb78ab65d in I830LeaveVT (scrnIndex=0, flags=0)
    at ../../src/i830_driver.c:3624
 pScrn = (ScrnInfoPtr) 0x9326d80
 pI830 = (I830Ptr) 0x93273e0
#13 0x080de1da in xf86XVLeaveVT (index=0, flags=0)
    at ../../../../hw/xfree86/common/xf86xv.c:1269
 pxvs = (XvScreenPtr) 0xc9ec648
 pAdaptor = (XvAdaptorPtr) 0xc9ed4ac
 pPriv = (XvPortRecPrivatePtr) 0xc9ee640
 i = 2
 j = 1
#14 0x080c8277 in xf86Wakeup (blockData=0x0, err=-1, pReadmask=0x81f72c0)
    at ../../../../hw/xfree86/common/xf86Events.c:551
 devicesWithInput = {fds_bits = {1, 213397752, -1208852492,
    -1081994872, -1209026216, 213397752, -1208852492, -1081994872,
    -1209028076, 11259375, -1081994876, 0, -1208937201, 213397752,
    -1208852492, -1081994824, -1209019522, 11259375, 7, 0, -1209018673,
    136208372, 128, 0, 136250328, 136278720, 1, -1081994792, 134921617,
    213397752, 0, 0}}
 pInfo = <value optimized out>
#15 0x08091322 in WakeupHandler (result=-1, pReadmask=0x81f72c0)
    at ../../dix/dixutils.c:418
 i = 0
#16 0x081329eb in WaitForSomething (pClientsReady=0xcb82530)
    at ../../os/WaitFor.c:231
 i = -1
 waittime = {tv_sec = 583, tv_usec = 350643}
 wt = (struct timeval *) 0xbf821460
 timeout = <value optimized out>
 clientsReadable = {fds_bits = {0 <repeats 3...

ichudov (igor-chudov) wrote :

I subscribed to the bug at freedesktop and volunteered to be a guinea pig.

Thanks.

Martin Pitt (pitti) on 2009-04-01
summary: - Swithing to another user and then to anything else, freezes laptop.
- Jaunty
+ Switching to another user and then to anything else freezes laptop
tags: added: jaunty regression-potential

seb128's backtraces do not match up to the original reported bug. Probably some unrelated problem that has similar symptoms.

summary: - Switching to another user and then to anything else freezes laptop
+ Switching to another user and then to anything else causes freeze in
+ drm_intel_bo_unreference ()
Bryce Harrington (bryce) wrote :

Actually I take it back, only his UXA backtraces are not matches. The ones with libdrm symbols look quite handy.

Bryce Harrington (bryce) wrote :

Okay so piecing everything together...

#5 0xb7c275b6 in free () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#6 0xb7862e25 in free_block (bufmgr_fake=0x93730f0, block=0xcc956c8)
    at ../../../libdrm/intel/intel_bufmgr_fake.c:473
 bo_fake = (drm_intel_bo_fake *) 0x937cac0

static void free_block(drm_intel_bufmgr_fake *bufmgr_fake, struct block *block)
{
   ...
      mmFreeMem(block->mem);
      free(block); /* <-- boom */
   }
}

*** glibc detected *** /usr/bin/X: double free or corruption (out): 0x0d73de98 ***

Sounds like a double free. Could 2 sessions cause 2 frees? Or is it something else? It probably needs additional debug statements to track the frees. Maybe upstream can provide better insights when they respond.
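
One way to picture the "debug statements to track the frees" idea is a small tracking table that remembers which caller released an allocation first and reports any later attempt. This is only a sketch under stated assumptions: track_alloc, track_free and the slot table are hypothetical helpers, not part of libdrm or the intel driver, and slots are deliberately never reused so that a late second release can still be reported.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 64

/* Hypothetical bookkeeping record for one allocation. */
struct tracked {
    void       *ptr;       /* live allocation, or NULL once released */
    const char *freed_by;  /* label of the caller that released it */
    int         used;      /* slot has been handed out */
};

static struct tracked table[MAX_TRACKED];

/* Allocate n bytes and return a slot id, or -1 on failure. */
int track_alloc(size_t n)
{
    for (int id = 0; id < MAX_TRACKED; id++) {
        if (!table[id].used) {
            void *p = malloc(n);
            if (p == NULL)
                return -1;
            table[id].ptr = p;
            table[id].freed_by = NULL;
            table[id].used = 1;
            return id;
        }
    }
    return -1;
}

/* Release a slot. Returns 0 on success, -1 on a bad id or a repeated release. */
int track_free(int id, const char *who)
{
    if (id < 0 || id >= MAX_TRACKED || !table[id].used)
        return -1;
    if (table[id].ptr == NULL) {
        /* Report both the current caller and the one that got there first. */
        fprintf(stderr, "%s: slot %d already freed by %s\n",
                who, id, table[id].freed_by);
        return -1;
    }
    free(table[id].ptr);
    table[id].ptr = NULL;
    table[id].freed_by = who;
    return 0;
}
```

Calling track_free(id, "evict") and then track_free(id, "unreference") on the same slot would make the second call fail and name the first caller, which is the information missing from the glibc abort message.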

ichudov (igor-chudov) wrote :

I am thinking, the second session did not need to allocate a shared resource, and so it did not allocate, but it tries to free it. This is my hypothesis.

Christoph (christoph-thomas) wrote :

Hi Ichudov,

in my environment (desktop) this bug has occurred since xserver-xorg-video-intel 2.6.3-0ubuntu2. After downgrading to xserver-xorg-video-intel 2.6.3-0ubuntu1 everything works fine. Maybe this helps your family ;-)

Chris

You're using the fake bufmgr, which means no GEM. I'll have to build a new kernel w/o GEM to test this... Given the backtrace it should be pretty easy to track down once I have that.

Christoph (christoph-thomas) wrote :

Hi Nicolo,

EXA and xserver-xorg-video-intel 2.6.3-0ubuntu1 work fine.

EXA with xserver-xorg-video-intel 2.6.3-0ubuntu2 to 2.6.3-0ubuntu4, and
UXA with xserver-xorg-video-intel 2.6.3-0ubuntu1 to 2.6.3-0ubuntu4, show the described bug.

Meanwhile, testers narrowed the regression to these two patches:

Fix Xv crash with overlay video :

http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=2026c57cf0a352d9e6f9d208cfb7d4d550614477

Fix XV with non-GEM kernels by allocating a larger fake bufmgr. :

http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=fb6e00f40f713a87c760fc7603159ed11ea9b0d5

These were cherrypicked for fixing the following bug, which I've reopened for Ubuntu:
[i855] xserver-xorg-video-intel-2.6.3 : Only green window when playing movies with XV extension
https://bugs.edge.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/344740

https://bugs.freedesktop.org/show_bug.cgi?id=21025 is most likely connected and has a complete backtrace.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xserver-xorg-video-intel - 2:2.6.3-0ubuntu5

---------------
xserver-xorg-video-intel (2:2.6.3-0ubuntu5) jaunty; urgency=low

  * Disable 114_fix_xv_with_non_gem.patch: At the time we accepted it, it
    sounded a little risky, so I took it on the condition that it didn't
    cause regressions, which apparently we have proof that it does.
    (LP: #348428) (Reopen 344740)

 -- Bryce Harrington <email address hidden> Fri, 03 Apr 2009 20:03:39 -0700

Changed in xserver-xorg-video-intel (Ubuntu Jaunty):
status: Confirmed → Fix Released
Bryce Harrington (bryce) wrote :

The change from 0u1 to 0u2 is the inclusion of these two patches cherrypicked from upstream, to solve bug #344740 which affects i855 chipsets:

114_fix_xv_with_non_gem.patch
115_fix_crash_xv_overlay.patch

Both are just tiny one-liners, but as Yoda says, "Size matters not!"

Anyway, I was a touch skeptical at the time that these patches could lead to regressions. Christoph's testing seems to confirm that is indeed the case here.

Of the two, 114 seems the most suspicious since it deals with the memory manager. 115 sounds like it corrects a crash situation, so if it is innocent here it would be nice to keep it.

I have uploaded a fix which disables 114. Please test and verify this resolves the user switching issue. If not, we can also try reverting 115.

unggnu (unggnu) wrote :

This is most likely a duplicate of Bug #345796.

Robbie Williamson (robbiew) wrote :

I can confirm that the fix resolved my issue reported in duplicate bug 345714.

Christoph (christoph-thomas) wrote :

Hi Nicolo,

EXA and xserver-xorg-video-intel 2.6.3-0ubuntu5 work fine.

With UXA and xserver-xorg-video-intel 2.6.3-0ubuntu5 I don't see exactly the described bug: if I open a new session, I get a black screen immediately. Previously I could open a new session, and the black screen occurred when switching back to the first session (I'm not sure if this was only with EXA; if necessary I can retest).

I'm not sure if this helps, but what works is to open the first session with UXA (then change xorg.conf to EXA) and open the second session with EXA.

ichudov (igor-chudov) wrote :

Seems to be good now, I am switching users like crazy, from my good personality user to my bad personality user. Just kidding. Thanks to all. I hope that it does not reoccur. Let's update the freedesktop bug now.

Hm, seems to work ok with a 2.6.29ish kernel... I'll try to get your package combo...

Ok reproduced it with 2.6.28... now to fix it...

Created an attachment (id=24654)
NULL fake bo block when freeing in evict_all

Can you give this patch a try? If the gen4 bo ends up on the LRU, we'll free it at evict_all time, but a later unref of the object will try to free it again unless we NULL the block pointer.
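The double free described above can be modelled roughly as follows. This is only a minimal sketch under stated assumptions, not the libdrm code: every name here (`sketch_bo`, `sketch_block`, `sketch_bufmgr`, `sketch_evict_all`, `sketch_bo_unreference`) is hypothetical, and the real fake bufmgr tracks far more state. The point is just that evict_all clears the bo's block pointer before freeing the block, so a later unref finds nothing left to free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the fake bufmgr structures. */
struct sketch_bo;

struct sketch_block {
    struct sketch_block *next; /* singly linked LRU list */
    struct sketch_bo *bo;      /* buffer object that owns this block */
};

struct sketch_bo {
    int refcount;
    struct sketch_block *block; /* NULL when the bo has no backing block */
};

struct sketch_bufmgr {
    struct sketch_block *lru;
    int blocks_freed; /* counts frees so the behaviour can be checked */
};

static void sketch_free_block(struct sketch_bufmgr *mgr,
                              struct sketch_block *blk)
{
    free(blk);
    mgr->blocks_freed++;
}

/* Give bo a backing block and put it on the LRU. Returns 0 on success. */
static int sketch_bo_alloc_block(struct sketch_bufmgr *mgr,
                                 struct sketch_bo *bo)
{
    struct sketch_block *blk = calloc(1, sizeof(*blk));
    if (blk == NULL)
        return -1;
    blk->bo = bo;
    blk->next = mgr->lru;
    mgr->lru = blk;
    bo->block = blk;
    return 0;
}

/* Free every block on the LRU, as happens when leaving the VT. */
static void sketch_evict_all(struct sketch_bufmgr *mgr)
{
    struct sketch_block *blk = mgr->lru;
    while (blk != NULL) {
        struct sketch_block *next = blk->next;
        /* The fix: without this line the bo keeps a dangling pointer
         * and sketch_bo_unreference() would free the block again. */
        blk->bo->block = NULL;
        sketch_free_block(mgr, blk);
        blk = next;
    }
    mgr->lru = NULL;
}

/* Drop one reference; on the last one, release the block if still present.
 * The bo itself is owned by the caller in this sketch. */
static void sketch_bo_unreference(struct sketch_bufmgr *mgr,
                                  struct sketch_bo *bo)
{
    assert(bo->refcount > 0);
    if (--bo->refcount > 0)
        return;
    if (bo->block != NULL) {
        /* Unlink the block from the LRU before freeing it. */
        struct sketch_block **pp = &mgr->lru;
        while (*pp != NULL && *pp != bo->block)
            pp = &(*pp)->next;
        if (*pp != NULL)
            *pp = bo->block->next;
        sketch_free_block(mgr, bo->block);
        bo->block = NULL;
    }
}
```

In this model, calling `sketch_evict_all()` and then `sketch_bo_unreference()` on the same bo frees its block exactly once. Removing the `blk->bo->block = NULL` line reintroduces the second free.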

unggnu (unggnu) wrote :

Upstream has published a patch. I have adapted it to apply against the Ubuntu libdrm package, so who wants to give it a try?
It compiles, but since I don't have this problem I can't confirm the fix myself.

Steven Walter (stevenrwalter) wrote :

I can verify that with xserver-xorg-video-intel 2:2.6.3-0ubuntu8 I'm not getting a crash on user-switching now.

commit 11b60973bca1bc9bbda44be4c695e22d28d8ca4a
Author: Jesse Barnes <email address hidden>
Date: Tue Apr 21 17:13:16 2009 -0700

    intel: NULL fake bo block when freeing in evict_all

    Fixes assertion failures on later use of the object.

MrAuer (mr-auer) wrote :

I don't know whether I should comment on https://bugs.launchpad.net/bugs/337116 or on this one...

I have now installed Xubuntu Jaunty on my Eee PC 901 after having had the crashes described in the bug linked above.
At first I thought everything was OK, but today X has crashed twice, seemingly at random: the screen goes black, gdm restarts and I'm back at the login screen.

Snippet from syslog:

Apr 27 23:59:44 Nokkalaama kernel: [ 4370.303321] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
Apr 27 23:59:44 Nokkalaama gdm[3011]: WARNING: gdm_slave_xioerror_handler: Fatal X error - Restarting :0
Apr 27 23:59:46 Nokkalaama acpid: client connected from 7350[0:0]
Apr 27 23:59:47 Nokkalaama kernel: [ 4373.932997] [drm:i915_setparam] *ERROR* unknown parameter 4
Apr 27 23:59:47 Nokkalaama kernel: [ 4373.934549] [drm:i915_getparam] *ERROR* Unknown parameter 6
Apr 27 23:59:49 Nokkalaama kernel: [ 4375.319637] [drm:i915_getparam] *ERROR* Unknown parameter 6

kern.log:
Apr 27 23:12:24 Nokkalaama kernel: [ 1530.143499] [drm:i915_getparam] *ERROR* Unknown parameter 6
Apr 27 23:34:13 Nokkalaama kernel: [ 2839.118634] EXT2-fs warning: mounting unchecked fs, running e2fsck is recommended
Apr 27 23:59:44 Nokkalaama kernel: [ 4370.303321] [drm:i915_get_vblank_counter] *ERROR* trying to get vblank count for disabled pipe 0
Apr 27 23:59:47 Nokkalaama kernel: [ 4373.932997] [drm:i915_setparam] *ERROR* unknown parameter 4
Apr 27 23:59:47 Nokkalaama kernel: [ 4373.934549] [drm:i915_getparam] *ERROR* Unknown parameter 6
Apr 27 23:59:49 Nokkalaama kernel: [ 4375.319637] [drm:i915_getparam] *ERROR* Unknown parameter 6

I'll see about doing a backtrace tomorrow. I haven't done that before and don't have the programs installed.

Changed in xserver-xorg-video-intel:
status: Confirmed → Fix Released
Hellbourne (amit.finkler) wrote :

I don't have an Intel video adapter but an Nvidia one and this bug affects me as well. The released fix obviously hasn't solved the bug on my system. I have a Quadro NVS 135M card using the non-free nvidia driver.

On Sun, May 17, 2009 at 04:13:21PM -0000, Hellbourne wrote:
> I don't have an Intel video adapter but an Nvidia one and this bug
> affects me as well. The released fix obviously hasn't solved the bug on
> my system. I have a Quadro NVS 135M card using the non-free nvidia
> driver.

Given that the backtrace for this bug involves routines specific to the
intel driver, actually if you're using the -nvidia driver there is zero
chance you have this bug. You probably just have a bug with similar
symptoms.

Changed in xserver-xorg-video-intel (Ubuntu Jaunty):
assignee: Bryce Harrington (bryceharrington) → nobody
Changed in xserver-xorg-video-intel (Ubuntu):
assignee: Bryce Harrington (bryceharrington) → nobody
Changed in xserver-xorg-video-intel:
importance: Unknown → Medium
Changed in xserver-xorg-video-intel:
importance: Medium → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → Medium