Last.fm plugin causes file descriptor starvation

Bug #1224410 reported by Jussi Pakkanen on 2013-09-12
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
grilo (Ubuntu)
Undecided
Unassigned

Bug Description

The last.fm plugin keeps a libsoup cache for its downloaded files. The default size is 10 MB. Since last.fm downloads small xml files and the cache keeps an open fd on every file, this uses up all available fds.

Attached is a patch. An alternative would be to disable the cache altogether.

Jussi Pakkanen (jpakkane) wrote :

When you say that cache keeps the FD open, I guess you mean libsoup.

Have you really checked that indeed libsoup is keeping the FD open? Or is just a theory? Which version of libsoup are you using? This shouldn't happen (I've asked a libsoup maintainer), so if libsoup keeps the descriptors open, the bug is in libsoup.

Though the patch is correct, it is just a workaround, and doesn't fix the real problem, which would be in the libsoup.

Jussi Pakkanen (jpakkane) wrote :

We have verified this to be the case. If you run many queries against lastfm plugin and look at the open file descriptor count, it keeps growing and growing. All of these open fds point to /tmp/grilo-plugin-cache-XXXX/ files. Libsoup will release all those fds when the cache is deallocated, though.

I'm assuming libsoup keeps a fd open so the files are not deleted by someone else as they are written in the /tmp directory.

Each time you start grilo, and thus lastfm album, it creates a cache on /tmp/grilo-plugin-cache-XXX, with each "XXX" different on each run.

When we exit, for different reasons, we don't remove those directories. So they remain there. But they shouldn't be keep open: the OS should free the descriptors one the application finalizes.

I understand your complain is that the files keep open, even when the application finalizes. Or, that each time a file is added in the cache, libsoup keeps it open, so, for instance, if the max. opened files is 100 and libsoup stores 100 files, if they are kept open by libsoup, it runs out of descriptors.

In one case or another, it seems the bug is beyond Grilo. Unless there's something I'm missing in the explanation.

The attachment "lastfm-cache-size.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Jussi Pakkanen (jpakkane) wrote :

The last one of these is correct. So what happens is that there is the directory /tmp/grilo-plugin-cache-XXXX. When new files are downloaded, libsoup puts them in that directory _and_ keeps an open file descriptor to each written file. So, for example, if there are three files in the directory, libsoup will have three open file descriptors. When there are too many files in the cache, the fd limit is reached and no new files can be opened.

There is no way to tell libsoup to close the fds it keeps open that I can see. Unless libsoup is fixed there are only two ways to work around this:

- do not use libsoup's cache at all
- make the cache small enough so that the fd limit is not reached

In the case of last.fm plugin the XML files it downloads are around 3 kB. Reducing the cache size to 1 MB means that libsoup will keep roughly ~300 open files. The fd limit on a standard install is 1024 so this will prevent fd starvation.

> There is no way to tell libsoup to close the fds it keeps open that I can see. Unless libsoup is fixed there are only two ways
> to work around this:

That's my point: it shouldn't be happening. I've asked a libsoup maintainer, and he told me that libsoup shouldn't keep the files opened. Hence, I think the bug is in libsoup.

Which version of libsoup are you using? Could you try with the latest version of libsoup or ideally with an upstream version?

Jussi Pakkanen (jpakkane) wrote :

It is the newest version that is in saucy, that is package version 2.42.2-6.

Jussi Pakkanen (jpakkane) wrote :
Download full text (4.6 KiB)

There may be a threading issue here. Feel free to try it yourself with the code below. Here are the steps:

- run in gdb, add a break to main
- step over the call to do_test so you are on the line that says "return 0"
- in a terminal determine the process id of the program
- ls /proc/process_id/fd

Sometimes you get this:

lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 0 -> /dev/pts/4
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 1 -> /dev/pts/4
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 10 -> socket:[61063]
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 11 -> anon_inode:[eventfd]
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 12 -> anon_inode:[eventfd]
l-wx------ 1 jpakkane jpakkane 64 syys 12 16:48 13 -> /tmp/grilo-plugin-cache-7ODM3W/1814837434
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 2 -> /dev/pts/4
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 3 -> socket:[66649]
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 4 -> socket:[66650]
lr-x------ 1 jpakkane jpakkane 64 syys 12 16:48 5 -> pipe:[66651]
l-wx------ 1 jpakkane jpakkane 64 syys 12 16:48 6 -> pipe:[66651]
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 7 -> anon_inode:[eventfd]

Note the fd open to grilo-plugin cache.

Other times you get this:

lrwx------ 1 jpakkane jpakkane 64 syys 12 16:47 0 -> /dev/pts/4
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 1 -> /dev/pts/4
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 10 -> socket:[66165]
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 11 -> anon_inode:[eventfd]
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 12 -> anon_inode:[eventfd]
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 2 -> /dev/pts/4
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 3 -> socket:[66649]
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 4 -> socket:[66650]
lr-x------ 1 jpakkane jpakkane 64 syys 12 16:48 5 -> pipe:[66651]
l-wx------ 1 jpakkane jpakkane 64 syys 12 16:48 6 -> pipe:[66651]
lrwx------ 1 jpakkane jpakkane 64 syys 12 16:48 7 -> anon_inode:[eventfd]

In this case there is no open fd to the cache file. The behaviour is undeteministic.

Here's the code:

#include<cstdio>
#include<grilo.h>
#include<string>

using namespace std;

GrlRegistry* setup() {
  GrlRegistry *registry = grl_registry_get_default();
  if (!registry) {
    printf("Could not get registry.\n");
    return 0;
  }
  GrlConfig *config = grl_config_new("grl-lastfm-albumart", "grl-lastfm-albumart");
  if (!config) {
    printf("Could not create config.\n");
    return 0;
  }
  if(!grl_registry_add_config(registry, config, 0)) {
    printf("Could not add config.\n");
    return 0;
  }
  if(!grl_registry_load_plugin_by_id(registry, "grl-lastfm-albumart", 0)) {
    printf("Could not load plugin.\n");
    return 0;
  }
  return registry;
}

void print_sources() {
  GrlRegistry *registry = setup();
  if(!registry)
    return;
  printf("List of plugins:\n");
  GList *orig_list = grl_registry_get_sources(registry, 0);
  for(GList *list = orig_list; list != 0; list = list->next) {
    GrlSource *src = (GrlSource*)list->data;
    printf("%s\n", grl_source_get_name(src));
  }
  g_list_free(orig_list);
}

void do_test() {
  string id("grl-lastfm-albumart");
...

Read more...

Hm..

In the example I only see one fd opened. Shouldn't be there *lot* of descriptors opened?

Jussi Pakkanen (jpakkane) wrote :

This test does only one lookup so there is only one file.

Paweł Stołowski (stolowski) wrote :

Attaching a valgrind trace obtained by running mediascanner with 'valgrind --track-fds=yes --trace-children=yes' and scanning 74 mp3 files; I interrupted the process with CTRL+C a few seconds after it finished scanning. The trace shows 47 open fds referencing cache files, along with stack traces showing where the origins of fds are.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in grilo-plugins (Ubuntu):
status: New → Confirmed

But are you using threads? Nor grilo or soup-cache are thread safe....

Jussi Pakkanen (jpakkane) wrote :

Mediascanner uses mutexes to guard agains that. But that is irrelevant since the exact same issue happens with the test application above that is single-threaded.

Jussi Pakkanen (jpakkane) wrote :

The issue turned out to be a missing g_object_unref in reply_cb. This is in Grilo and not Grilo-plugins. See attached patch. This part of the code has been renamed somewhere between the version in Ubuntu and current trunk so please check thoroughly before committing.

Great.

I'll check it and fix in upstream. Thanks!

The patch you attached is not entirely correct: the stream must be unreffed later, after closing it.

I'm attaching a new version of the patch. Could you verify if this patch fixes the problem? If so i'll push it upstream.

Michal Hruby (mhr3) wrote :

Both patches are correct, async operations take a ref on the object itself, so unref after starting an async operation is valid and arguably more readable as you don't need to follow all the callbacks to discover when are all the async operations finished (plus no need to worry about errors).

Fixed upstream.

commit a6d96cd56dca6232ffdd92678b4c73adc5466b94
Author: Juan A. Suarez Romero <email address hidden>
Date: Fri Sep 13 16:00:19 2013 +0000

    net: unref stream after using it

    https://bugs.launchpad.net/bugs/1224410

 libs/net/grl-net-wc.c | 1 +
 1 file changed, 1 insertion(+)

Paweł Stołowski (stolowski) wrote :

I've verified the patch against grilo that's currently in saucy (so I had to patch grl-net-soup-unstable.c instead of grl-net-wc.c) and it worked fine just like original patch.

Attached is the patch backported to grillo 0.2.6.

Michal Hruby (mhr3) on 2013-09-13
affects: grilo-plugins (Ubuntu) → grilo (Ubuntu)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package grilo - 0.2.6-2ubuntu1

---------------
grilo (0.2.6-2ubuntu1) saucy; urgency=low

  * debian/patches/fix-unref-grilo.diff:
    - Cherry-pick fix from upstream for (LP: #1224410)
 -- Lukasz 'sil2100' Zemczak <email address hidden> Fri, 13 Sep 2013 19:05:06 +0200

Changed in grilo (Ubuntu):
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers