Prune CodeImportResult

Bug #314621 reported by Stuart Bishop
Affects: Launchpad itself
Status: Fix Released
Importance: Low
Assigned to: Stuart Bishop

Bug Description

There are 76GB of code import logfiles in the Librarian. Do we really need to keep these forever?

Assuming the answer is no, the CodeImportResult.log_file should have an expiry date set on creation. The DBA should set this for the existing logs.

Do we need the historic CodeImportResult rows at all? A cron job that prunes everything except the last few days' worth of records would have the same effect.

Stuart Bishop (stub)
Changed in launchpad-bazaar:
status: New → Triaged
Jonathan Lange (jml) wrote :

All good questions! Michael, what do you think?

P.S. Not sure why this bug is "Triaged".

Changed in launchpad-bazaar:
assignee: nobody → mwhudson
status: Triaged → Incomplete
Michael Hudson-Doyle (mwhudson) wrote :

Yes, we should prune old codeimportresults for sure. But it would be better to prune all but the last $N for each codeimport -- if an import started failing a few weeks ago, I'd want to be able to see the last few results so I could tell why. I don't suppose this would be much harder to arrange.

Changed in launchpad-bazaar:
status: Incomplete → Triaged
Jonathan Lange (jml)
Changed in launchpad-bazaar:
importance: Undecided → Low
Jonathan Lange (jml) wrote :

Maybe it would be easiest for you to do it, stub?

Changed in launchpad-bazaar:
assignee: mwhudson → stub
Stuart Bishop (stub) wrote :

I have no idea where CodeImportResult rows get created and am not familiar with the tests, so I doubt it is easier for me to do.

I think all that needs doing is setting 'codeimport.log_file.expires = now + timedelta(days=60)' after creating the codeimport object.

Or if you really do want to ensure preservation of the 'last N', do the following after creating a codeimport:

store.execute("""
UPDATE LibraryFileAlias
SET expires = CURRENT_TIMESTAMP AT TIME ZONE 'UTC' + interval '60 days'
FROM CodeImport
WHERE
    LibraryFileAlias.id = CodeImport.log_file
    AND expires IS NULL
    AND CodeImport.branch = %s
    AND CodeImport.id NOT IN (
        SELECT id
        FROM CodeImport
        WHERE branch = %s
        ORDER BY id DESC
        LIMIT %s)
""" % (codeimport.branch, codeimport.branch, N))

Oh - and me doing the equivalent manually for all the historical records.

Changed in launchpad-bazaar:
assignee: stub → nobody
Stuart Bishop (stub) wrote :

The above query should be using CodeImportResult rather than CodeImport of course...

You can prune all-but-the-last-N CodeImportResult rows entirely in a similar fashion:

store.execute("""DELETE FROM CodeImportResult
WHERE
code_import = %s
AND date_created < CURRENT_TIMESTAMP AT TIME ZONE 'UTC' - interval '60 days'
AND id NOT IN (
    SELECT id FROM CodeImportResult
    WHERE code_import = %s
    ORDER BY id DESC
    LIMIT %s)""" % (codeimportresult.code_import, codeimportresult.code_import, N))
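For readers who want to try the all-but-the-last-N deletion in isolation, here is a minimal sketch, not the Launchpad implementation. It uses the standard-library sqlite3 module instead of the Storm store and PostgreSQL, passes values as bound parameters rather than interpolating them into the SQL string, and assumes a simplified, hypothetical CodeImportResult table with only id, code_import and date_created (stored as ISO 8601 UTC text). The function name prune_results and its arguments are made up for illustration.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_results(conn, code_import, keep_last, max_age_days=60, now=None):
    """Delete old results for one code import, always keeping the newest
    `keep_last` rows. Returns the number of rows deleted.

    Assumes date_created holds ISO 8601 UTC strings, which compare
    chronologically when compared as text.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    cur = conn.execute(
        """
        DELETE FROM CodeImportResult
        WHERE code_import = ?
          AND date_created < ?
          AND id NOT IN (
              SELECT id FROM CodeImportResult
              WHERE code_import = ?
              ORDER BY id DESC
              LIMIT ?)
        """,
        (code_import, cutoff, code_import, keep_last),
    )
    return cur.rowcount
```

Bound parameters leave quoting to the database driver, which matters once the values are not plain integers.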

Stuart Bishop (stub) wrote :

Or perhaps the cronjob is better for deletions as I originally suggested.

I appear to be arguing with myself.

Tim Penhey (thumper) wrote : Re: [Bug 314621] Re: Prune CodeImportResult

On Thu, 19 Feb 2009 21:46:04 Stuart Bishop wrote:
> Or perhaps the cronjob is better for deletions as I originally
> suggested.
>
> I appear to be arguing with myself.

Some people are watching though :)

Jonathan Lange (jml) wrote :

So, what's the next step?

Stuart Bishop (stub) wrote :

I'll implement good old Oscar the Grouch with a 2009 standard name to delete the old records, as I've got some other tables that need pruning too and we don't need to implement this multiple times.

Please confirm the deletion rules. I'm going with 'all CodeImportResult rows created more than 30 days ago, unless they are one of the four most recent CodeImportResults for a branch' unless someone says otherwise. And this is deletion, so don't come crying once the data is gone ;)
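To make the proposed rule concrete, the following is a small pure-Python sketch of the selection logic only; it is an illustration under assumptions, not the code that was eventually committed. The function name results_to_delete and the (result_id, branch, date_created) tuple layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def results_to_delete(results, now, max_age=timedelta(days=30), keep=4):
    """Return the ids of results that the rule would delete.

    `results` is an iterable of (result_id, branch, date_created) tuples.
    A result is deleted only if it is older than `max_age` AND it is not
    among the `keep` most recent results for its branch.
    """
    by_branch = defaultdict(list)
    for result_id, branch, date_created in results:
        by_branch[branch].append((date_created, result_id))

    cutoff = now - max_age
    doomed = []
    for rows in by_branch.values():
        rows.sort(reverse=True)  # newest first; ties broken by higher id
        for date_created, result_id in rows[keep:]:
            if date_created < cutoff:
                doomed.append(result_id)
    return sorted(doomed)
```

A branch with four or fewer results never loses any, however old they are, which preserves the "see the last few results" property requested earlier in the thread.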

Changed in launchpad-bazaar:
assignee: nobody → stub
milestone: none → 2.2.3
Tim Penhey (thumper) wrote :

On Thu, 26 Feb 2009 22:53:03 Stuart Bishop wrote:
> I'll implement good old Oscar the Grouch with a 2009 standard name to
> delete the old records, as I've got some other tables that need pruning
> too and we don't need to implement this multiple times.
>
> Please confirm the deletion rules. I'm going with 'all CodeImportResult
> rows created more than 30 days ago, unless they are one of the four most
> recent CodeImportResults for a branch' unless someone says otherwise.
> And this is deletion, so don't come crying once the data is gone ;)

Your rules sound fine to me.

Michael Hudson-Doyle (mwhudson) wrote :

Stuart, your rules sound perfect.

Francis J. Lacoste (flacoste) wrote :

While reviewing the tables to prune, you might also want to prune OAuthNonce. Deleting all nonces older than a day is pretty safe.
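As a rough illustration of the nonce suggestion, here is a sketch of an age-based delete using sqlite3. The column name request_timestamp, its storage as integer epoch seconds, and the function name prune_nonces are all assumptions made for this example, not details confirmed in this thread.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_nonces(conn, now=None, max_age=timedelta(days=1)):
    """Delete every OAuthNonce row older than `max_age`.

    Assumes request_timestamp holds integer seconds since the Unix epoch.
    Returns the number of rows deleted.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = int((now - max_age).timestamp())
    cur = conn.execute(
        "DELETE FROM OAuthNonce WHERE request_timestamp < ?",
        (cutoff,),
    )
    return cur.rowcount
```

Unlike the CodeImportResult rule, this is a purely age-based cutoff with no keep-the-last-N exception.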

Stuart Bishop (stub)
Changed in launchpad-bazaar:
status: Triaged → In Progress
Stuart Bishop (stub)
Changed in launchpad-bazaar:
status: In Progress → Fix Committed
Tim Penhey (thumper)
Changed in launchpad-bazaar:
status: Fix Committed → Fix Released