rosetta-pofile-stats.py needs optimizing

Bug #351011 reported by Stuart Bishop
This bug affects 3 people

Affects              Status    Importance  Assigned to  Milestone
Launchpad itself     Triaged   High        Unassigned
Ubuntu Translations  New       Undecided   Unassigned

Bug Description

In today's staging run, rosetta-pofile-stats.py took 6 hours to run. This makes it impractical to perform as part of the staging rollout and impractical to run daily on production.

Stuart Bishop (stub)
Changed in rosetta:
status: New → Triaged
Revision history for this message
Данило Шеган (danilo) wrote :

Can it not be done after staging roll-out (i.e. after staging has started up)? The extra load should not hurt on staging (it might even help emulate the load from production).

Not that we should not optimize it, of course.

Revision history for this message
Данило Шеган (danilo) wrote :

The easy thing we could do: run updateStatistics only on pofiles that have changed in the last week or so (should cover cases where we miss one or two daily runs).
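In Python, the idea might look something like the sketch below. The `POFile` class here is a minimal stand-in for illustration only, not the real Storm-backed Launchpad model; only the `updateStatistics` method name comes from the thread.

```python
from datetime import datetime, timedelta

class POFile:
    """Minimal stand-in for the real Launchpad POFile model object."""

    def __init__(self, name, date_changed):
        self.name = name
        self.date_changed = date_changed
        self.stats_updated = False

    def updateStatistics(self):
        # The real method recounts translated/untranslated messages.
        self.stats_updated = True

def update_recent_statistics(pofiles, days=7, now=None):
    """Refresh statistics only for POFiles changed in the last `days` days,
    instead of walking every POFile in the database."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=days)
    touched = [p for p in pofiles if p.date_changed >= cutoff]
    for pofile in touched:
        pofile.updateStatistics()
    return touched
```

A 7-day window is wide enough to cover a missed daily run or two, as the comment suggests.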

Revision history for this message
Stuart Bishop (stub) wrote :

@danilo - it is being run after staging has started up. The problem is we are approaching 24 hours to do the rollout and run all the batch jobs - we will either need to run the post-update batch jobs in parallel (possibly causing load or locking issues), or update less frequently than daily.

Revision history for this message
Robert Collins (lifeless) wrote :

@Danilo - is it still taking this long with recife, or have you fixed it in passing? I'm triaging this to High to reflect the priority it has; if it's already fixed, that's obviously even better ;)

Changed in launchpad:
importance: Undecided → High
Revision history for this message
Данило Шеган (danilo) wrote :

It's taking even longer today (not related to recife). We do have a pofile-stats-daily (which only runs over a subset of PO files), but that's disabled because of the DBLoopTuner stalls on replication slave rebuilds (bug 622670).

Revision history for this message
Данило Шеган (danilo) wrote :

Also bug 622668 about the actual pofile-stats-daily script being slow due to DBLoopTuner. It is also currently set to run over 7 days of PO files, but it should be enough to do it over 2 days of them if it runs daily (as was the plan).

I am pretty sure we should not aim to fix pofile-stats script performance which runs over all the PO files in our DB, because that simply can't scale.

The problem is that statistics get out of date in the first place even though we do try to keep them up-to-date after every update (we call pofile.updateStatistics after every update, and this might be fixed in recife branch because we are getting rid of multiple points where things are calculated: triggers, property validators and regular LP code). However, even fixing that won't be a full solution since we do need delayed statistics update for sharing PO files, which is where pofile-stats-daily is useful.

Revision history for this message
Stuart Bishop (stub) wrote : Re: [Bug 351011] Re: rosetta-pofile-stats.py needs optimizing

On Wed, Jan 12, 2011 at 1:52 AM, Данило Шеган <email address hidden> wrote:

> It's taking even longer today (not related to recife). We do have a
> pofile-stats-daily (which only runs over a subset of PO files), but
> that's disabled because of the DBLoopTuner stalls on replication slave
> rebuilds (bug 622670).

Given we normally only rebuild a slave once per month, disabling seems
unnecessary.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Revision history for this message
Robert Collins (lifeless) wrote :

This has been exacerbated by the fastdowntime process which interrupts all db connections when it is executed.

Revision history for this message
Curtis Hovey (sinzui) wrote :

I am making this critical because the dupe was critical.

Changed in launchpad:
importance: High → Critical
Revision history for this message
Curtis Hovey (sinzui) wrote :

The script is now run once a week. It takes between 2 and 3 days to complete. The log shows it is often blocked by "old xact process_death_row". While it sleeps for 10 minutes at a time, it is often blocked for about 100 minutes before it can proceed. In one example, it was blocked for 8 hours.
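The blocking behaviour described above can be sketched as a polling loop: sleep 10 minutes, re-check whether an old transaction still blocks us, repeat. This is an illustrative sketch of the pattern (not the actual DBLoopTuner code), with injectable clock/sleep so the timing is checkable; the 600-second sleep interval comes from the comment, everything else is assumed.

```python
import time

def wait_until_unblocked(is_blocked, sleep_seconds=600,
                         max_wait_seconds=8 * 3600,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll until `is_blocked()` returns False, sleeping 10 minutes
    between checks. Gives up after `max_wait_seconds`.
    Returns the total simulated time spent waiting."""
    start = clock()
    while is_blocked():
        if clock() - start >= max_wait_seconds:
            break  # give up; the next weekly run will try again
        sleep(sleep_seconds)
    return clock() - start
```

With this pattern, a block that clears after ten checks costs 100 minutes of wall-clock time, matching the figure observed in the log.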

Revision history for this message
Curtis Hovey (sinzui) wrote :

Maybe we can queue all the locked operations so that the loop can continue. The queued items could be retried after the non-locked ones were completed, or just ignored (assuming they will be processed in the next run).
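A rough sketch of that defer-and-retry loop, assuming the update raises some "row is locked" error we can catch (the `RowLockedError` name and `try_update` callable are hypothetical, standing in for the real DB-level failure):

```python
from collections import deque

class RowLockedError(Exception):
    """Stand-in for a database 'row is locked' condition."""

def update_all(pofiles, try_update):
    """Process every pofile, deferring any whose row is locked.

    `try_update(pofile)` raises RowLockedError when the row cannot be
    locked. Deferred items are retried once after the unlocked ones
    finish; anything still locked is left for the next run."""
    deferred = deque()
    for pofile in pofiles:
        try:
            try_update(pofile)
        except RowLockedError:
            deferred.append(pofile)
    skipped = []
    while deferred:
        pofile = deferred.popleft()
        try:
            try_update(pofile)
        except RowLockedError:
            skipped.append(pofile)  # leave for the next run
    return skipped
```

This keeps the loop moving instead of sleeping behind a single long-lived lock; the trade-off is that a persistently locked row may not get its statistics updated until the following run.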

Airkm (airkm)
information type: Public → Private
William Grant (wgrant)
information type: Private → Public
Changed in launchpad:
importance: Critical → High