evolution-exchange hungs

Bug #385562 reported by Ivan Brezina on 2009-06-10
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
evolution-exchange (Ubuntu)
Medium
Unassigned

Bug Description

Description: Ubuntu 9.04
Release: 9.04
evolution-exchange: 2.26.0-0ubuntu3

This is probably not a bug on evolution-exchange, maybe root cause is insanity of glib itself.
Strace for evolution-exchange shows this:
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=23, events=POLLIN}, {fd=24, events=POLLIN}, {fd=22, events=POLLIN}], 7, -1) = 1 ([{fd=24, revents=POLLNVAL}])
read(3, 0x8725c60, 4096) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}, {fd=8, events=POLLIN|POLLPRI}, {fd=11, events=POLLIN|POLLPRI}, {fd=23, events=POLLIN}, {fd=24, events=POLLIN}, {fd=22, events=POLLIN}], 7, -1) = 1 ([{fd=24, revents=POLLNVAL}])
read(3, 0x8725c60, 4096) = -1 EAGAIN (Resource temporarily unavailable)

lsof for evolution-exchange is here:
evolution 31918 ibrezina 0u CHR 1,3 2977 /dev/null
evolution 31918 ibrezina 1u CHR 1,3 2977 /dev/null
evolution 31918 ibrezina 2u CHR 1,3 2977 /dev/null
evolution 31918 ibrezina 3u unix 0xf2295a80 2442614 socket
evolution 31918 ibrezina 4r FIFO 0,6 2442616 pipe
evolution 31918 ibrezina 5w FIFO 0,6 2442616 pipe
evolution 31918 ibrezina 6r FIFO 0,6 2442617 pipe
evolution 31918 ibrezina 7w FIFO 0,6 2442617 pipe
evolution 31918 ibrezina 8r FIFO 0,6 2442618 pipe
evolution 31918 ibrezina 9w FIFO 0,6 2442618 pipe
evolution 31918 ibrezina 10u unix 0xf2295e00 2442619 socket
evolution 31918 ibrezina 11u unix 0xf2154700 2442621 socket
evolution 31918 ibrezina 12u unix 0xf22b8380 2442623 socket
evolution 31918 ibrezina 13u unix 0xf22b81c0 2442626 socket
evolution 31918 ibrezina 14u unix 0xf5fb4540 2442628 /tmp/orbit-ibrezina/linc-7cae-0-69b61c444d053
evolution 31918 ibrezina 15u unix 0xf228d000 2442631 /tmp/orbit-ibrezina/linc-7cae-0-69b61c444d053
evolution 31918 ibrezina 16u unix 0xf2294e00 2442635 /tmp/orbit-ibrezina/linc-7cae-0-69b61c444d053
evolution 31918 ibrezina 17u unix 0xf228d8c0 2442632 socket
evolution 31918 ibrezina 18r FIFO 0,6 2442636 pipe
evolution 31918 ibrezina 19w FIFO 0,6 2442636 pipe
evolution 31918 ibrezina 20r FIFO 0,6 2442637 pipe
evolution 31918 ibrezina 21w FIFO 0,6 2442637 pipe
evolution 31918 ibrezina 22u IPv4 2444693 UDP *:31919
evolution 31918 ibrezina 23u unix 0xf2294380 2442642 /tmp/.exchange-ibrezina/ibrezina@XXX
evolution 31918 ibrezina 25u unix 0xf21391c0 2442666 /tmp/.exchange-ibrezina/ibrezina@XXX
evolution 31918 ibrezina 47u sock 0,4 2444620 can't identify protocol

As you see here, there is no socket no 24, but even of there is no socket no. 24 evolution process tries continuesly
loop through pool and read. I wander why is it trying to read from socket no. 3 if the pool syscall returned an error.

Here is stacktrace too:
(gdb) bt
#0 0xb7f1a430 in __kernel_vsyscall ()
#1 0xb6e54ae7 in poll () from /lib/tls/i686/cmov/libc.so.6
#2 0xb6f2874b in IA__g_poll (fds=0x87db858, nfds=7, timeout=-1) at /build/buildd/glib2.0-2.20.1/glib/gpoll.c:127
#3 0xb6f1af82 in g_main_context_iterate (context=0x8737a08, block=1, dispatch=1, self=0x87074d0) at /build/buildd/glib2.0-2.20.1/glib/gmain.c:2761
#4 0xb6f1b5ba in IA__g_main_loop_run (loop=0x878ada8) at /build/buildd/glib2.0-2.20.1/glib/gmain.c:2656
#5 0xb7168cc3 in bonobo_main () from /usr/lib/libbonobo-2.so.0
#6 0x0805bb9b in main (argc=3, argv=Cannot access memory at address 0xb
) at main.c:278

Ivan Brezina (ibrezina) on 2009-07-30
tags: added: glib2
Ivan Brezina (ibrezina) wrote :

I just went through glib2 sources and now I am quite sure, that this is a bug in glib2 not in evolution.

The man page for poll says: "On success, a positive number is returned; this is the number of structures which have non-zero revents fields (in other words, those descriptors with events or errors reported)". So if the peer closes a TCP connection while the thread is waiting for poll, then the poll call "succeeds" - it returns positive number 1. The errno is set on affected fd - POLLNVAL or other flag is set.

Unfortunately there is no code in glib2(gmain.c gpoll.c) that handles such a situation. I guess that the function g_main_context_check should if there is really something to read from the socket and possibly remove failed/closed TCP connection from the fd list.

Pedro Villavicencio (pedro) wrote :

Thanks for your bug report. Please try to obtain a backtrace http://wiki.ubuntu.com/DebuggingProgramCrash and attach the file to the bug report. This will greatly help us in tracking down your problem.

Changed in evolution-exchange (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Pedro Villavicencio (pedro) wrote :

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to New. Thanks again!.

Changed in evolution-exchange (Ubuntu):
status: Incomplete → Invalid
Ivan Brezina (ibrezina) wrote :

Hi I finally managed to create a stacktrace - the newer version of evolution-exchange does not hang so often.
Here is an output from "thread apply all bt full".
(gdb) info threads
  2 Thread 0xb5e0ab90 (LWP 13896) 0xb7fa7430 in __kernel_vsyscall ()
  1 Thread 0xb5ffd9a0 (LWP 13894) 0xb7c71a35 in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
(gdb) thread apply all bt full
Thread 2 (Thread 0xb5e0ab90 (LWP 13896)):
#0 0xb7fa7430 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb6ee1ae7 in poll () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb6fb574b in IA__g_poll (fds=0x8576a18, nfds=7, timeout=-1) at /build/buildd/glib2.0-2.20.1/glib/gpoll.c:127
No locals.
#3 0xb6fa7f82 in g_main_context_iterate (context=0x856a918, block=1, dispatch=1, self=0x856ab20)
    at /build/buildd/glib2.0-2.20.1/glib/gmain.c:2761
 max_priority = 2147483647
 timeout = -1
 some_ready = <value optimized out>
 nfds = 7
 allocated_nfds = <value optimized out>
 fds = (GPollFD *) 0x8576a18
 __PRETTY_FUNCTION__ = "g_main_context_iterate"
#4 0xb6fa85ba in IA__g_main_loop_run (loop=0x856a998) at /build/buildd/glib2.0-2.20.1/glib/gmain.c:2656
 self = (GThread *) 0x856ab20
 __PRETTY_FUNCTION__ = "IA__g_main_loop_run"
#5 0xb71a88c0 in ?? () from /usr/lib/libORBit-2.so.0
No symbol table info available.
#6 0xb6fcf7bf in g_thread_create_proxy (data=0x856ab20) at /build/buildd/glib2.0-2.20.1/glib/gthread.c:635
 __PRETTY_FUNCTION__ = "g_thread_create_proxy"
#7 0xb7c704ff in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#8 0xb6eec49e in clone () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.

Thread 1 (Thread 0xb5ffd9a0 (LWP 13894)):
#0 0xb7c71a35 in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#1 0xb6fa79fc in IA__g_main_context_prepare (context=0x8535cd0, priority=0xbfbc5308)
    at /build/buildd/glib2.0-2.20.1/glib/gmain.c:2136
 result = 0
 prepare = <value optimized out>
 source_timeout = -1
 i = <value optimized out>
 n_ready = 0
 current_priority = 2147483647
 source = (GSource *) 0x857c118
#2 0xb6fa7dda in g_main_context_iterate (context=0x8535cd0, block=1, dispatch=1, self=0x85064d0)
    at /build/buildd/glib2.0-2.20.1/glib/gmain.c:2428
 max_priority = <value optimized out>
 timeout = <value optimized out>
 some_ready = <value optimized out>
 nfds = <value optimized out>
 allocated_nfds = <value optimized out>
 fds = (GPollFD *) 0x8598a20
 __PRETTY_FUNCTION__ = "g_main_context_iterate"
#3 0xb6fa85ba in IA__g_main_loop_run (loop=0x856d4b0) at /build/buildd/glib2.0-2.20.1/glib/gmain.c:2656
 self = (GThread *) 0x85064d0
 __PRETTY_FUNCTION__ = "IA__g_main_loop_run"
#4 0xb71f5cc3 in bonobo_main () from /usr/lib/libbonobo-2.so.0
No symbol table info available.
#5 0x0805bb9b in main (argc=3, argv=Cannot access memory at address 0x5) at main.c:278
 path = <value optimized out>
#0 0xb7c71a35 in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0

This error(hung) usually occurs when connection to OWA is lost.

Denys Rtveliashvili (rtvd) wrote :

I believe the last comment has the information necessary to re-open the bug.

Changed in evolution-exchange (Ubuntu):
status: Invalid → New
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers