Crash in queueRequest

Bug #1511553 reported by Michi Henning
This bug affects 2 people
Affects: thumbnailer (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Michi Henning

Bug Description

https://errors.ubuntu.com/problem/fa22f0b18ff6dc949c4cdf768f32121d64a83b8d

We hit this occasionally, but not very often. I suspect it is because of the way we shut down: we destroy the thumbnailer instance before the dbus interface so that we don't hit the race condition where one instance of the service still holds the database lock while a second instance of the service is activated by a new incoming request. But destroying the thumbnailer first means that, if a request arrives at just the right time, the dbus interface fires the request at the already-destroyed thumbnailer.

I recently hit a whole slew of issues in a test that destroyed the thumbnailer and dbus interface (in that order) while there were still requests sitting in the scheduler and thread pools, and fixed a bunch of them (see the request-cancellation branch). In effect, via the handler class that calls back into the thumbnailer and the closures that sit in the thread pools, we have created a circular dependency between the dbus interface and the thumbnailer, which makes it impossible to shut down either without causing problems.

I think the best solution would be to add a shutdown() method to the dbus interface that prevents new requests from firing. We also need some way to wait for all *received* (not just scheduled) requests to finish executing before physically shutting down.
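
To make the idea concrete, here is a minimal sketch of what such a gate could look like. The names (DBusInterface, queueRequest, requestFinished) follow the bug title, but the members are illustrative, not the actual thumbnailer code: a flag closes the gate, a counter tracks received requests, and shutdown() blocks until the counter drains.

    // Illustrative sketch only; not the real thumbnailer interface.
    #include <condition_variable>
    #include <mutex>

    class DBusInterface
    {
    public:
        bool queueRequest(/* request args */)
        {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (shutting_down_)
                    return false;  // caller turns this into a dbus error
                ++in_flight_;
            }
            // ... hand the request to the thumbnailer/scheduler ...
            return true;
        }

        void requestFinished()  // called when a request completes
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (--in_flight_ == 0)
                idle_.notify_all();
        }

        void shutdown()  // close the gate, wait for in-flight requests
        {
            std::unique_lock<std::mutex> lock(mutex_);
            shutting_down_ = true;
            idle_.wait(lock, [this] { return in_flight_ == 0; });
        }

    private:
        std::mutex mutex_;
        std::condition_variable idle_;
        bool shutting_down_ = false;
        int in_flight_ = 0;
    };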

Does DBus have a way to say "I'm in the process of shutting down. Send the request that just arrived and that I can't process anymore to a new instance of the service"? Ideally, I would like to "close the gate" on incoming requests such that they are transparently re-sent, and then wait for all currently executing requests to finish processing.


Revision history for this message
Michi Henning (michihenning) wrote :

There is an ancient thread about this here: https://bugs.freedesktop.org/show_bug.cgi?id=11454

Another related discussion: http://comments.gmane.org/gmane.comp.freedesktop.dbus/11549

Doesn't look like we can do what we need. It appears to be impossible to shut down without at least causing an error on the client side.

Changed in thumbnailer:
importance: Undecided → High
description: updated
Changed in thumbnailer (Ubuntu):
importance: Undecided → High
no longer affects: thumbnailer
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in thumbnailer (Ubuntu):
status: New → Confirmed
Revision history for this message
Pete Woods (pete-woods) wrote :

The way DBus handles this is very basic: it queues up requests to the name. So you'll have to release the name before you close the database. I'd also suggest making your service wait to acquire the database lock, up to a certain timeout, on startup. That way, the second instance of your process that gets spawned waits until the first instance has finished messing with the db.
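
A rough sketch of that shutdown order, assuming Qt's QDBusConnection (the bus name is made up for the example):

    #include <QDBusConnection>

    void shutdownService()
    {
        // 1. Release the bus name first. From here on, the bus queues
        //    requests for the name (or activates a new instance) instead
        //    of delivering them to us.
        QDBusConnection::sessionBus().unregisterService("com.example.Thumbnailer");

        // 2. Drain the requests that were already delivered to us.

        // 3. Only now close the database, so the successor instance
        //    finds the leveldb lock released.
    }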

Revision history for this message
Michi Henning (michihenning) wrote :

Thanks Pete! It's sort of a catch-22: if I shut down the dbus connection first, a new instance can be started while the old instance is still in the process of shutting down. And if I destroy the db instance first, requests can still be dispatched that now no longer have a valid target.

The problem with the db lock is that it's buried inside leveldb and, technically, we don't even know where that lock file is or how it works. It's a leveldb implementation detail. I'll have a look at adding a separate lock that we can maintain around the leveldb start-up/shut-down code.

The problem really is DBus here: there is no way to say to DBus "I'm in the process of shutting down, hold any requests for me until I tell you that I'm done shutting down and then send those requests to a new service instance."

Revision history for this message
Pete Woods (pete-woods) wrote :

Well, I'd also argue that leveldb is part of the problem if it can't say "try to acquire the lock up to some timeout". If it can only fail immediately with a "lock could not be acquired" type error, I guess it's not the end of the world to try to get the lock every n milliseconds up to the timeout instead.
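
For illustration, a poll-until-timeout wrapper around leveldb::DB::Open() could look like this (simplified; and, as the next comment notes, a lock failure is not distinguishable by status code, so this retries on any failure until the deadline):

    #include <leveldb/db.h>

    #include <chrono>
    #include <stdexcept>
    #include <string>
    #include <thread>

    leveldb::DB* open_with_timeout(std::string const& path,
                                   int timeout_ms, int interval_ms)
    {
        leveldb::Options options;
        options.create_if_missing = true;

        auto deadline = std::chrono::steady_clock::now()
                        + std::chrono::milliseconds(timeout_ms);
        for (;;)
        {
            leveldb::DB* db = nullptr;
            leveldb::Status s = leveldb::DB::Open(options, path, &db);
            if (s.ok())
                return db;
            if (std::chrono::steady_clock::now() >= deadline)
                throw std::runtime_error("DB::Open failed: " + s.ToString());
            std::this_thread::sleep_for(std::chrono::milliseconds(interval_ms));
        }
    }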

Revision history for this message
Michi Henning (michihenning) wrote :

Yes, fair point, although this functionality is really easy to layer on top, so it's arguably not leveldb's job. What is annoying is that, when the DB is locked, no unique error code is returned for the condition. Instead, the fact that the DB is locked gets lumped into the same status code that is used to indicate dozens of other problems. Sigh… When will people learn that calling code needs to be able to tell different error conditions apart without having to parse the text of the error string? :-(

I think I'll just use a lock file of our own around the initialisation/finalisation.
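
A minimal sketch of such a wrapper lock, using flock(2) in a poll loop (the path and timing are illustrative): acquire it before leveldb initialisation and release it after finalisation, so a newly activated instance simply waits instead of tripping over leveldb's internal LOCK file.

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    #include <chrono>
    #include <stdexcept>
    #include <thread>

    class InstanceLock
    {
    public:
        InstanceLock(char const* path, int timeout_ms)
            : fd_(open(path, O_CREAT | O_RDWR, 0600))
        {
            if (fd_ < 0)
                throw std::runtime_error("cannot open lock file");
            auto deadline = std::chrono::steady_clock::now()
                            + std::chrono::milliseconds(timeout_ms);
            // LOCK_NB makes flock() fail immediately, so poll until the
            // previous instance releases the lock or we hit the deadline.
            while (flock(fd_, LOCK_EX | LOCK_NB) != 0)
            {
                if (std::chrono::steady_clock::now() >= deadline)
                {
                    close(fd_);
                    throw std::runtime_error("timed out waiting for DB lock");
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(100));
            }
        }

        ~InstanceLock()
        {
            flock(fd_, LOCK_UN);  // also released automatically on close
            close(fd_);
        }

    private:
        int fd_;
    };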

Changed in thumbnailer (Ubuntu):
assignee: nobody → Michi Henning (michihenning)
status: Confirmed → In Progress
Changed in thumbnailer (Ubuntu):
status: In Progress → Fix Committed
Changed in thumbnailer (Ubuntu):
status: Fix Committed → Fix Released