system locks up when running "strace gdmsetup"

Bug #336771 reported by Martin Olsson
Affects: linux (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

I found this really strange but reproducible bug:

1. Boot the live CD (I used jaunty x86 alpha5 on an x64 machine).
2. Open a terminal and run "sudo su".
3. Run "strace gdmsetup".
4. If it succeeds without locking up the system, close gdmsetup with the close button and then re-run the command. The bug triggers about 1 out of 5 times I run this command.

I can still ssh into the box, so the kernel seems reasonably undamaged. If I attach gdb to X, the stack looks like a normal, healthy X.org stack, so it's probably not a video driver lockup (I have an Intel X4500HD card, by the way, with the G45 chipset). I also attached to compiz.real, and that looks fine as well. However, even though X and compiz both look like they are working, the machine is completely unresponsive: I can't move any windows or start any program, and the mouse cursor doesn't even turn back to normal, it just stays stuck in its current shape (I can still move the mouse, though). Even the CPU graph in the GNOME system monitor applet, conky, etc. stop updating when I repro this on my installed system (without the live CD).

If I run "sudo killall -9 strace" from an ssh session over the network (connected from another working box), I can resume using the machine.
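
The reproduction loop and the kill-based recovery described above can be sketched as a small script. This is only a rough sketch under stated assumptions: the function name `run_until_hang` and its parameters are hypothetical, and a timeout is used as a stand-in for "the machine locked up", which it cannot really distinguish from a window that was simply left open too long. The child inherits the terminal on purpose, so strace still writes its trace where it did in the original steps.

```python
import subprocess


def run_until_hang(cmd, attempts=20, timeout=60.0):
    """Run cmd up to `attempts` times.

    Returns the 1-based attempt number that exceeded `timeout` seconds,
    or None if every run finished in time. The timeout is only a proxy
    for a hang (an assumption of this sketch).
    """
    for attempt in range(1, attempts + 1):
        try:
            # stdout/stderr are inherited, so a traced program's output
            # still goes to the same terminal as in the manual steps.
            # On timeout, subprocess.run() kills the child before raising,
            # which plays the role of "killall -9 strace".
            subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            return attempt
    return None
```

For example, `run_until_hang(["strace", "gdmsetup"], attempts=20, timeout=120)` would need to be started from somewhere that stays responsive (such as the ssh session), and the timeout has to be longer than the time it takes to close gdmsetup by hand.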

Martin Olsson (mnemo) wrote :

I've also been able to repro this on a much older desktop machine which has an ATI Radeon card attached over AGP. However, on this machine I had to re-run "strace gdmsetup" around 15 times before it froze the machine. I think it's a kernel issue.

When the machine was hung I attached gdb to strace and saw this stack:
#0 0xb7f9c430 in __kernel_vsyscall ()
#1 0xb7ef2e83 in write () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7e89cdc in _IO_file_write () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7e8ae57 in _IO_do_write () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7e8a6fa in _IO_file_sync () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7e7e4d9 in fflush () from /lib/tls/i686/cmov/libc.so.6
#6 0x0804d0fd in ?? ()
#7 0x0804c1c5 in ?? ()
#8 0xb7e35775 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#9 0x080497a1 in ?? ()
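
The stack shows strace sitting in write() called from fflush(). A quick way to see what that write is aimed at is to look at the hung process under /proc. The sketch below assumes a Linux /proc filesystem, permission to inspect the process, and that strace is writing its trace to stderr (fd 2), which is its default when no output file is given. The helper names `parse_state` and `describe_process` are hypothetical.

```python
import os


def parse_state(status_text):
    """Return the value of the 'State:' line from /proc/<pid>/status text,
    for example 'S (sleeping)', or None if the line is missing."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split(":", 1)[1].strip()
    return None


def describe_process(pid):
    """Collect the scheduler state and the target of stderr (fd 2) for pid.

    Assumes Linux /proc and sufficient privileges (e.g. root).
    """
    with open(f"/proc/{pid}/status") as f:
        state = parse_state(f.read())
    # /proc/<pid>/fd/2 is a symlink to whatever stderr is connected to,
    # e.g. a pseudo-terminal device or a regular file.
    stderr_target = os.readlink(f"/proc/{pid}/fd/2")
    return {"state": state, "stderr": stderr_target}
```

If stderr turns out to be a pseudo-terminal owned by a terminal emulator running under the frozen X session, re-running with `strace -o <file>` (so the trace goes to a file instead) would be one way to check whether the terminal is part of the problem. That is a diagnostic idea, not something confirmed in this report.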

Martin Olsson (mnemo) wrote :

I can also repro this bug on a third machine: an HP DV6000 laptop with Intel 965 and 2GB RAM, running intrepid with the .27-11 kernel. It was a lot harder to get the bug to trigger on this machine; I had to re-run the command at least 20 times. It's probably some race condition?

Martin Olsson (mnemo) wrote :

I booted Fedora 10 on the first machine and I can't repro the issue there. However, "gdmsetup" doesn't seem to be available on Fedora. After some fiddling I found that I can also repro the bug on the intrepid machine by repeatedly running "strace system-config-printer", and system-config-printer is available on Fedora. The intrepid machine froze after fewer than 20 runs of "strace system-config-printer" in a row, whereas the Fedora machine just keeps going even after more than 30 runs.
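
As a rough check on how much weight 30 clean runs carry, here is a back-of-the-envelope calculation. It assumes each run hangs independently with a fixed probability, which is only an assumption (a race condition may well not behave that way). The two rates used are the ones observed on the other machines in this report: about 1 in 5 and about 1 in 20. The function name `prob_no_hang` is hypothetical.

```python
def prob_no_hang(p_hang, runs):
    """Probability that `runs` independent runs all complete, given a
    fixed per-run hang probability `p_hang` (an assumed model)."""
    return (1.0 - p_hang) ** runs


# Per-run hang rates taken from the observations in this report.
for p in (1 / 5, 1 / 20):
    print(f"p_hang={p:.2f}: P(30 clean runs) = {prob_no_hang(p, 30):.3f}")
```

Under this model, 30 clean runs would be very unlikely at a 1-in-5 rate but would still happen roughly one time in five at a 1-in-20 rate, so the Fedora result is suggestive rather than conclusive.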

However, interestingly enough the Fedora machine reports that strace crashes due to some memory corruption detected by glibc:

lstat("/usr/share/icons/gnome/16x16/devices/printer.png", {st_mode=S_IFREG|0644, st_size=516, ...}) = 0
lstat("/usr/share/icons/gnome/32x32/devices/printer.png", {st_mode=S_IFREG|0644, st_size=1070, ...}) = 0
select(5, [4], [4], NULL, NULL) = 1 (out [4])
writev(4, [{"<\2\2\0001\0\0\3\225\4\5\0002\0\0\0030\0\0\3&\0\0\0\0\0\0\0006\4\2\0000"..., 4068}], 1) = 4068
read(4, 0x1882ef4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
select(5, [4], [4], NULL, NULL) = 1 (out [4])
writev(4, [{""..., 0}, {"\22\0004\0229\0\0\3\357\0\0\0\6\0\0\0 \1\310\0.\22\0\0\26\0\0\0\26\0\0\0\0"..., 16384}, {"\266\275\272\377\266\275\272\377\266\275\272\377\266\275\272\377\266\275\272\377\266\275\272\377\266\275\272\377\266\275\272\377\266"..., 2256}], 3) = 18640
shmget(IPC_PRIVATE, 393216, IPC_CREAT|0600) = 8749071
shmat(8749071, 0, 0) = ?
*** glibc detected *** strace: malloc(): memory corruption (fast): 0x0000000001864460 ***
======= Backtrace: =========
/lib64/libc.so.6[0x6cde98]
/lib64/libc.so.6[0x6d1531]
/lib64/libc.so.6(__libc_malloc+0x98)[0x6d2a08]
strace[0x408728]
strace[0x40598e]
strace[0x404696]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x674546]
strace[0x401e69]
======= Memory map: ========
00110000-00130000 r-xp 00000000 fd:00 32773 /lib64/ld-2.9.so
0032f000-00330000 r--p 0001f000 fd:00 32773 /lib64/ld-2.9.so
00330000-00331000 rw-p 00020000 fd:00 32773 /lib64/ld-2.9.so
00400000-00447000 r-xp 00000000 fd:00 163841 /usr/bin/strace
00647000-00648000 rw-p 00047000 fd:00 163841 /usr/bin/strace
00648000-00656000 rw-p 00648000 00:00 0
00656000-007be000 r-xp 00000000 fd:00 32780 /lib64/libc-2.9.so
007be000-009be000 ---p 00168000 fd:00 32780 /lib64/libc-2.9.so
009be000-009c2000 r--p 00168000 fd:00 32780 /lib64/libc-2.9.so
009c2000-009c3000 rw-p 0016c000 fd:00 32780 /lib64/libc-2.9.so
009c3000-009c8000 rw-p 009c3000 00:00 0
009c8000-009de000 r-xp 00000000 fd:00 32770 /lib64/libgcc_s-4.3.2-20081105.so.1
009de000-00bde000 ---p 00016000 fd:00 32770 /lib64/libgcc_s-4.3.2-20081105.so.1
00bde000-00bdf000 rw-p 00016000 fd:00 32770 /lib64/libgcc_s-4.3.2-20081105.so.1
01864000-01885000 rw-p 01864000 00:00 0 ...
