Comment 1 for bug 990394

James Westby (james-w) wrote : Re: [Bug 990394] [NEW] importer repeatedly requeues recently published packages

On Sat, 28 Apr 2012 09:52:09 -0000, Max Bowsher <email address hidden> wrote:
> Public bug reported:
>
> The importer currently repeatedly requeues recently published packages
> many times.
>
> Specifically, every 5 minutes, it queues a job to import everything that
> was newly published from 10 minutes *BEFORE* the last package it
> previously knew to be published.
>
> Note the *BEFORE* meaning that subsequent runs of the queue adder script
> deliberately overlap with what the previous run processed.

This was a conscious decision when I wrote the code, because I wasn't
sure what guarantees LP would provide about the API responses and the
timestamps they contain.

I unfortunately have no data about whether this check has ever actually
caught something that would have been missed.

I think that we could reduce the impact of this overlap by not adding a
job if a previous run created a job for the id of the publishing record
(or for the (package, version) combo). That would keep a bit of overlap
as a safeguard in case the API isn't monotonic, without causing the
issue you describe.
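A minimal sketch of that idea, assuming a SQLite `jobs` table like the one in meta.db. Only `jobs` and its `active` column come from this thread; the `package`, `version` and `publication_id` columns and the `scan_start`/`add_jobs` helpers are hypothetical, not the importer's real code. The scan window still starts 10 minutes before the last known publication, but a publishing record that already has a job is skipped:

```python
import sqlite3
from datetime import datetime, timedelta

# Overlap the queue adder deliberately re-scans (10 minutes, per the report).
OVERLAP = timedelta(minutes=10)


def make_db():
    """In-memory stand-in for meta.db (columns other than `active` are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " package TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " publication_id INTEGER NOT NULL,"
        " active INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def scan_start(last_known: datetime) -> datetime:
    """Start of the next scan: deliberately overlaps the previous one."""
    return last_known - OVERLAP


def add_jobs(conn, publications):
    """Queue one job per publishing record that has never had a job.

    `publications` is an iterable of (publication_id, package, version)
    tuples, e.g. the records LP returned since scan_start(...).
    Returns the number of jobs actually added.
    """
    added = 0
    for pub_id, package, version in publications:
        seen = conn.execute(
            "SELECT 1 FROM jobs WHERE publication_id = ?", (pub_id,)
        ).fetchone()
        if seen is None:
            conn.execute(
                "INSERT INTO jobs (package, version, publication_id)"
                " VALUES (?, ?, ?)",
                (package, version, pub_id),
            )
            added += 1
    conn.commit()
    return added


conn = make_db()
print(add_jobs(conn, [(101, "hello", "2.8-1"), (102, "bash", "4.2-2")]))  # 2
# Record 102 falls inside the overlap window again, so only 103 is queued.
print(add_jobs(conn, [(102, "bash", "4.2-2"), (103, "coreutils", "8.13-3")]))  # 1
```

Keying on the publishing record id rather than the package name means a genuinely new upload of the same package still gets queued, because it comes with a new record id; keying on (package, version) would behave similarly.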

Alternatively, given that the importer loops over all packages anyway
when it is otherwise idle, it is eventually consistent, so the overlap
could be dropped with only a small chance that a branch stays out of
date for longer than it does now if the API returns something unexpected.

> Note: the way the web UI presents the queue, you can't see the buildup,
> because the number presented in the web UI is *unique* queue entries. A
> "sqlite3 meta.db 'SELECT COUNT(*) FROM jobs WHERE active'" shows the
> truth, as does inspecting the driver logs to see how many times some
> packages get re-processed in a single day.

Do you think that the unique constraint should be dropped from the queue
display on the status page?
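To illustrate the gap between the two numbers, here is a small sketch against an in-memory table. The `package` column is a hypothetical stand-in for whatever the status page treats as a "unique" queue entry; only `jobs` and `active` come from the query quoted above:

```python
import sqlite3


def queue_counts(conn):
    """Return (active jobs, distinct packages with an active job)."""
    total = conn.execute("SELECT COUNT(*) FROM jobs WHERE active").fetchone()[0]
    unique = conn.execute(
        "SELECT COUNT(DISTINCT package) FROM jobs WHERE active"
    ).fetchone()[0]
    return total, unique


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, package TEXT, active INTEGER)")
# "bash" requeued three times by overlapping runs, "hello" queued once.
conn.executemany(
    "INSERT INTO jobs (package, active) VALUES (?, 1)",
    [("bash",), ("bash",), ("bash",), ("hello",)],
)
print(queue_counts(conn))  # (4, 2)
```

Showing both numbers side by side would be one way to keep the per-package view while still making the buildup visible.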

Thanks,

James