Job reduction reduces against jobs being worked on.

Bug #1054378 reported by Eskil Heyn Olsen on 2012-09-21
Affects: gearmand · Assigned to: Brian Aker (brianaker)

Bug Description

Not necessarily a bug, but a semantically questionable detail.

Job reduction (using the unique id) checks a new submission's unique against existing jobs. However, if the matching job is actively being worked on, the new submission still gets reduced against it.

Depending on the worker semantics, that can be troublesome. Example:
   * We transfer DB primary-key as job parameters to a worker that reads the row and updates a cache.
   * We set unique='-'
   * If the worker has gotten the job, read the data, but not written to the cache yet...
   * ... and a table update happens, causing a new job to be issued with the same parameters.
   * The second job gets reduced because its unique matches.
   * However, the cache now contains stale data.

We could set a unique that's a hash of the data, so it'll be different, but to compute that hash would be very expensive and cause user-facing slowdowns. Which is exactly why we use gearmand to asynchronously do the heavy lifting.

Instead I propose this change, which optionally lets you run gearmand in a mode where job reduction is only done against jobs in the queue, and NOT against ones already handed to workers.
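To make the difference concrete, here is a minimal sketch of the two behaviours. This is a Python toy model with hypothetical names, not gearmand's actual C++ internals: `reduce_running=True` models the current behaviour, `reduce_running=False` the proposed mode.

```python
class JobServer:
    """Toy model of unique-based job reduction (coalescing)."""

    def __init__(self, reduce_running=True):
        self.queue = []        # uniques waiting for a worker
        self.running = set()   # uniques currently being worked on
        self.reduce_running = reduce_running

    def submit(self, unique):
        if unique in self.queue:
            return "reduced"   # coalesced into a still-queued job: safe
        if self.reduce_running and unique in self.running:
            return "reduced"   # coalesced into an in-flight job: stale-cache risk
        self.queue.append(unique)
        return "queued"

    def grab_job(self):
        # A worker takes the oldest queued job.
        unique = self.queue.pop(0)
        self.running.add(unique)
        return unique
```

With the default mode, resubmitting "row:42" while a worker holds it returns "reduced" and the cache refresh is lost; in the proposed mode the resubmission is queued again.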

Patch attached and at

Eskil Heyn Olsen (eskil) wrote :
Brian Aker (brianaker) wrote :

One quick note: are you aware that '-' has a special meaning as a unique?

Changed in gearmand:
assignee: nobody → Brian Aker (brianaker)
Brian Aker (brianaker) wrote :

Ok, I got a chance to walk through what you are doing, and you are aware and understand its meaning (I know that feature is poorly documented).

How do you handle the race condition where a slow worker updates the cache entry out of order?

Changed in gearmand:
importance: Undecided → Wishlist
Eskil Heyn Olsen (eskil) wrote :

Hi Brian,

Yes, we're aware of the '-' id. I had an earlier patch that added a '+' id to accomplish what I wanted: it acted like '-', but did not reduce if a job was already being worked on. I dropped that because we now have a rising need to put the time the job was submitted into the job arguments (long story short: we need to deal with DB replication across datacenters and want to pause workers until replication has caught up). So even "identical" jobs will always have different timestamps, but can still reduce on the remaining arguments.

We deal with race conditions in the worker code. Workers read the current generation of the cached data. When a worker writes, it does an atomic test-and-set to bump the generation by one. If the test-and-set fails, the entire job is redone.
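That generation scheme can be sketched as follows. This is a Python illustration with hypothetical names (the real cache is an external store, not an in-process object); it only shows the test-and-set shape being described.

```python
import threading

class GenerationCache:
    """Cache entry guarded by a generation counter."""

    def __init__(self):
        self._lock = threading.Lock()
        self.generation = 0
        self.value = None

    def read(self):
        with self._lock:
            return self.generation, self.value

    def test_and_set(self, expected_generation, value):
        # Atomically write and bump the generation, but only if no
        # other writer has bumped it since we read it.
        with self._lock:
            if self.generation != expected_generation:
                return False
            self.value = value
            self.generation += 1
            return True

def run_job(cache, compute):
    # If the test-and-set fails, another writer won; redo the whole job.
    while True:
        generation, _ = cache.read()
        result = compute()
        if cache.test_and_set(generation, result):
            return result
```

A slow worker holding a stale generation simply fails its write and recomputes, so out-of-order cache updates cannot clobber newer data.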

However, in the common case, job reduction on the unique id is not reliable in our SQL-based setup: a worker might be just about to COMMIT already-stale data to a denormalised row (our cache). Hence the need for this patch.

Changed in gearmand:
status: New → Fix Committed
status: Fix Committed → New