Comment 4 for bug 797749

Lars Butler (lars-butler) wrote :

I just had a discussion about this with Muharem (https://launchpad.net/~al-maisan). Here's what we talked about:

There are a few issues with my initial solution. For starters, the approach I proposed for reducing Redis bloat actually contributes to that bloat (if only temporarily). It works, technically, but there has to be a better way.

One suggestion was to make the Redis garbage collection an asynchronous process, either invoked once by the OQ engine when a job completes or run periodically as a daemon. This also involves refactoring our key generation scheme a bit: instead of generating GUIDs to guarantee uniqueness, we could have a 'next key' function (similar to a SQL autoincrement), using 'incr' (http://redis.io/commands/incr) to handle it. In our Redis store we would keep a single integer under a key such as "JOB_KEY", and each time we needed a new job ID we would increment it and take the new value. A rough sketch follows below.
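
Roughly, the 'next key' function could look something like the following sketch, assuming the redis-py client; the helper name is just a placeholder and "JOB_KEY" is the counter key mentioned above, not a settled convention:

    import redis

    def next_job_id(conn):
        """Atomically increment the job counter and return the new job ID."""
        # INCR creates the key with value 0 if it does not yet exist and then
        # increments it, so the first ID handed out is 1.
        return conn.incr("JOB_KEY")

    # Usage:
    # conn = redis.Redis()
    # job_id = next_job_id(conn)

Because INCR is atomic, two workers asking for an ID at the same time can never receive the same value, which is what the GUIDs were buying us.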

With this solution we would also need to store a list of 'completed job IDs', which the GC daemon can query so it knows what to clean up. If we guarantee the uniqueness of job ID tokens through a convention in our software (perhaps by surrounding the job ID integer with || ||), we can simply query Redis for all keys matching a wildcard (http://redis.io/commands/keys); a sketch of that cleanup pass is included below.
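
Here is a minimal sketch of that cleanup pass, assuming completed job IDs are pushed onto a Redis list (called "COMPLETED_JOBS" here purely for illustration) and that every key belonging to a job embeds the ||<job id>|| token:

    import redis

    def collect_garbage(conn):
        """Delete every key belonging to a job that has been marked complete."""
        while True:
            job_id = conn.lpop("COMPLETED_JOBS")
            if job_id is None:
                break
            if isinstance(job_id, bytes):
                job_id = job_id.decode()
            # KEYS walks the entire keyspace, which is acceptable for an
            # occasional cleanup pass but not something to run per request.
            stale = conn.keys("*||%s||*" % job_id)
            if stale:
                conn.delete(*stale)

The OQ engine (or whatever marks a job as done) would be responsible for pushing the job ID onto the completed-jobs list; the GC process only consumes it.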

I like the idea of running the garbage collection asynchronously. However, I would argue that this is a crucial piece of 'job completion' and therefore could conceivably be part of the OQ engine. (But it doesn't have to be.)

If this is implemented as a separate process/daemon, one thing I don't like is that it will require additional system configuration for an OpenQuake deployment (for example, we may have to set up a cron job or something equivalent).