Ubuntu

"File Open" dialogues for gedit, etc., take 25 seconds to appear

Reported by Ralph Corderoy on 2009-05-18
46
This bug affects 5 people
Affects Status Importance Assigned to Milestone
tracker (Ubuntu)
Medium
Unassigned

Bug Description

Binary package hint: tracker

Ubuntu 9.04, tracker 0.6.93-0ubuntu1. trackerd and tracker-indexer
reached deadlock over a couple of ~/.cache/tracker files resulting in 25
second timeouts before the "File Open" dialogue of various programs
would appear.

A friend reported the "File Open" dialogues of various GUI programs,
e.g. gedit, Scite, ooffice, was very slow in opening, e.g. taking more
than 20 seconds. I strace'd gedit whilst he did a "File Open".

    15520 22.5956 gettimeofday({1242636766, 167011}, NULL) = 0
    15520 22.5963 writev(15, [{"l\1\0\1\34\0\0\0\r\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/free"..., 144}, {"\27\0\0\0org.freedesktop.Tracker\0"..., 28}], 2) = 172
    15520 22.5966 gettimeofday({1242636766, 167998}, NULL) = 0
    15520 22.5967 writev(15, [{"l\1\0\1\0\0\0\0\16\0\0\0{\0\0\0\1\1o\0\30\0\0\0/org/free"..., 144}, {""..., 0}], 2) = 144
    15520 22.5968 gettimeofday({1242636766, 168246}, NULL) = 0
    15520 22.5976 poll([{fd=15, events=POLLIN}], 1, 25000) = 1 ([{fd=15, revents=POLLIN}])
    15520 22.5977 read(15, "l\2\1\1\0\0\0\0\3\0\0\0005\0\0\0\6\1s\0\6\0\0\0:1.160\0\0\5"..., 2048) = 882
    15520 22.5981 read(15, 0xf82e20, 2048) = -1 EAGAIN (Resource temporarily unavailable)
    15520 22.5982 gettimeofday({1242636766, 169625}, NULL) = 0
    15520 22.5983 poll([{fd=15, events=POLLIN}], 1, 24999) = 0 (Timeout)
    15520 47.6211 gettimeofday({1242636791, 192561}, NULL) = 0
    15520 47.6215 writev(15, [{"l\1\1\1w\0\0\0\17\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/free"..., 144}, {"r\0\0\0type='signal',sender='org.fre"..., 119}], 2) = 263
    15520 47.6219 writev(15, [{"l\1\1\1\211\0\0\0\20\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/free"..., 144}, {"\204\0\0\0type='signal',sender='org.fre"..., 137}], 2) = 281
    15520 47.6221 writev(15, [{"l\1\1\1\211\0\0\0\21\0\0\0\177\0\0\0\1\1o\0\25\0\0\0/org/free"..., 144}, {"\204\0\0\0type='signal',sender='org.fre"..., 137}], 2) = 281

Note the 25.0228 second gap between the start of the poll() and the next
system call; this matches the 25 second delay the user sees. gedit's
file descriptor 15 is a socket.

    $ lsof -p 15520
    gedit 15520 peter 15u unix 0xffff88005317ec00 2003719 socket

After a bit of poking around, I realised it was trying to read from
trackerd.

    $ ps l $(pidof trackerd) $(pidof tracker-indexer)
    F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
    0 1000 5339 5196 39 19 310584 10184 hrtime SN ? 0:58 /usr/lib/tracker/trackerd
    0 1000 5370 1 39 19 175772 18392 hrtime SNl ? 1:02 /usr/lib/tracker/tracker-indexer
    $

strace showed trackerd was endlessly attempting to get a read lock on
its FD 12.

    $ strace -tt -p 5339
    10:16:42.095702 fcntl(12, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EAGAIN (Resource temporarily unavailable)
    10:16:42.096032 nanosleep({0, 100000000}, NULL) = 0
    10:16:42.196494 fcntl(12, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EAGAIN (Resource temporarily unavailable)
    10:16:42.196816 nanosleep({0, 100000000}, NULL) = 0

trackerd's FD 12 is file-meta.db.

    $ lsof -p 5339
    trackerd 5339 peter 12u REG 8,52 2077696 17186898 /home/peter/.cache/tracker/file-meta.db

And one other program, tracker-indexer, has that file open.

    $ lsof /home/peter/.cache/tracker/file-meta.db
    COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
    trackerd 5339 peter 12u REG 8,52 2077696 17186898 /home/peter/.cache/tracker/file-meta.db
    tracker-i 5370 peter 15uw REG 8,52 2077696 17186898 /home/peter/.cache/tracker/file-meta.db
    $

It's endlessly trying to get a write lock on its FD 13,

    $ strace -tt -p 5370
    10:18:46.653514 fcntl(13, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=1073741826, len=510}) = -1 EAGAIN (Resource temporarily unavailable)
    10:18:46.653655 nanosleep({0, 100000000}, NULL) = 0
    10:18:46.753864 fcntl(13, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=1073741826, len=510}) = -1 EAGAIN (Resource temporarily unavailable)
    10:18:46.754004 nanosleep({0, 100000000}, NULL) = 0

which is common.db,

    $ lsof -p 5370
    tracker-i 5370 peter 13ur REG 8,52 84992 17186897 /home/peter/.local/share/tracker/data/common.db

and trackerd is the other program that has that file open.

    $ lsof /home/peter/.local/share/tracker/data/common.db
    COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
    trackerd 5339 peter 10ur REG 8,52 84992 17186897 /home/peter/.local/share/tracker/data/common.db
    tracker-i 5370 peter 13ur REG 8,52 84992 17186897 /home/peter/.local/share/tracker/data/common.db
    $

The inode numbers of the two files,

    $ ls -i /home/peter/.cache/tracker/file-meta.db \
    > /home/peter/.local/share/tracker/data/common.db
    17186898 /home/peter/.cache/tracker/file-meta.db
    17186897 /home/peter/.local/share/tracker/data/common.db

which appear in the list of all the locks held.

    $ cat /proc/locks
    1: POSIX ADVISORY READ 8029 08:34:17187002 1073741826 1073742335
    2: POSIX ADVISORY READ 15938 08:32:2810628 0 EOF
    3: POSIX ADVISORY READ 15938 08:32:2771634 0 EOF
    4: POSIX ADVISORY READ 15938 08:32:2771633 0 EOF
    5: POSIX ADVISORY READ 15938 08:32:2810417 0 EOF
    6: POSIX ADVISORY READ 15938 08:32:2810616 0 EOF
    7: POSIX ADVISORY READ 15938 08:32:2810627 0 EOF
    8: POSIX ADVISORY READ 15938 08:32:2810416 0 EOF
    9: POSIX ADVISORY WRITE 8029 08:34:17186986 1073741824 1073742335
    10: POSIX ADVISORY WRITE 8029 08:34:17186981 0 EOF
    11: POSIX ADVISORY READ 5339 08:34:17186897 1073741826 1073742335
    12: POSIX ADVISORY WRITE 5370 08:34:17186897 1073741824 1073741825
    13: POSIX ADVISORY WRITE 5370 08:34:17186899 1073741824 1073742335
    14: POSIX ADVISORY WRITE 5370 08:34:17186898 1073741824 1073742335
    15: POSIX ADVISORY READ 5370 08:34:17186897 1073741826 1073742335
    16: FLOCK ADVISORY WRITE 4885 00:12:10848 0 EOF
    17: POSIX ADVISORY WRITE 4858 00:12:10817 0 EOF
    $

Re-ording that a bit and adding in the lock attempts from above, we have

    common.db 1073741824 len=2 WRITE tracker-indexer
    common.db 1073741826 len=510 READ trackerd
    common.db 1073741826 len=510 READ tracker-indexer
    common.db 1073741826 len=510 F_WRLCK tracker-indexer

    file-meta.db 1073741824 len=512 WRITE tracker-indexer
    file-meta.db 1073741824 len=1 F_RDLCK trackerd

It's clear that both are trying to get a lock that's blocked by an
existing lock the other has on the same file. This bug could be
confirmed based solely on the above investigation IMHO, it does not need
someone else to suffer the same bug *and* get to the point where they
realise it's the above causing it. This thread,
http://ubuntuforums.org/showthread.php?t=811795, has users suffering
similar problems. Their solution is to purge the tracker package!

Killing tracker-indexer frees some of the file locks, tracker-indexer gets restarted, and "File Open" dialogues then only take a few seconds to appear, like normal.

Chris Coulson (chrisccoulson) wrote :

I'm seeing this at the moment too.

Changed in tracker (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed

I'm running 0.7 from GIT here but the issue looks the same. When
trackerd hangs, I see this repeatedly when I strace the process:

20:54:49.814923 fcntl(14, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
start=1073741824, len=1}) = -1 EAGAIN (Resource temporarily unavailable)
20:54:49.815062 nanosleep({0, 100000000}, NULL) = 0
20:54:49.915266 fcntl(14, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
start=1073741824, len=1}) = -1 EAGAIN (Resource temporarily unavailable)
20:54:49.915390 nanosleep({0, 100000000}, NULL) = 0
20:54:50.015593 fcntl(14, F_SETLK, {type=F_RDLCK, whence=SEEK_SET,
start=1073741824, len=1}) = -1 EAGAIN (Resource temporarily unavailable)
20:54:50.015747 nanosleep({0, 100000000}, NULL) = 0

The attached trackerd-fd.log shows that fd14 is meta.db. The last few
entries in my trackerd.log shows this:

18 May 2009, 19:23:20: Tracker: Processed 30522/64458, indexed 24006,
module:'files', 05m 22s left, 04m 50s elapsed
18 May 2009, 19:23:23: Tracker: Processed 30742/64458, indexed 24006,
module:'files', 05m 21s left, 04m 53s elapsed
18 May 2009, 19:23:30: Tracker: Statistics cache has expired,
updating...
18 May 2009, 19:23:30: Tracker: New DBus request, pausing indexer
18 May 2009, 19:23:30: Tracker: State change from 'Indexing' -->
'Paused'
18 May 2009, 19:23:30: Tracker: Stopping disk space check
18 May 2009, 19:23:30: Tracker: Requesting statistics from database for
an accurate signal

This shows that the hang is triggered when the stats_cache_timeout ()
callback is called, which is called 300 seconds after starting the
indexer. The attached backtrace is confirming this too

Chris Coulson (chrisccoulson) wrote :

In addition to this, I ran tracker-indexer through GDB, and it doesn't actually seem to be doing anything with meta.db at this time.

Chris Coulson (chrisccoulson) wrote :

I got to the bottom of the issue in 0.7 and it is now fixed upstream. However, the issue in 0.6.93 seems to be a different one, so I need to debug this some more.

Changed in tracker (Ubuntu):
assignee: nobody → Chris Coulson (chrisccoulson)
status: Confirmed → In Progress
Calmarius (calmarius) wrote :

I experience this file open dialog hangings after I save a file on a pen drive and disconnect it. After this file open/save dialogs appear late.

Martin Pitt (pitti) on 2011-02-15
Changed in tracker (Ubuntu):
assignee: Chris Coulson (chrisccoulson) → nobody
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers