Recife migration script unusably slow
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Launchpad itself | Fix Released | High | Jeroen T. Vermeulen | 10.12 |
Bug Description
The migrate-current-flag script (lib/lp/translations/scripts/migrate_current_flag.py) is unusably slow.
The problem seems to be that all our indexes are partial. The ideal index for this script would probably be a non-partial one on TranslationMessage.
As it is, we'll have to try and speed up the script without index changes.
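For illustration only, here is a rough sketch of the contrast the description draws between the existing partial indexes and a full index on TranslationMessage. The column names (potmsgset, language, is_current), the index names, and the connection string are assumptions rather than the real Launchpad schema; the DDL is issued through psycopg2 just to keep the sketch self-contained.

```python
# Hypothetical sketch, not the actual Launchpad schema or migration code.
import psycopg2

conn = psycopg2.connect("dbname=launchpad_dev")  # assumed connection string
cur = conn.cursor()

# A partial index like the ones described above: it only covers rows where
# the flag is set, so a query that has to visit every TranslationMessage for
# a POTMsgSet (flagged or not) cannot be answered from it.
cur.execute("""
    CREATE INDEX tm__potmsgset__current__idx
        ON TranslationMessage (potmsgset)
        WHERE is_current IS TRUE
""")

# A full (non-partial) index of the kind the description calls ideal; the
# exact columns are a guess because the original text is truncated.
cur.execute("""
    CREATE INDEX tm__potmsgset__language__idx
        ON TranslationMessage (potmsgset, language)
""")

conn.commit()
cur.close()
conn.close()
```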
Related branches
- Henning Eggers (community): Approve (code) on 2010-11-30
Diff: 14 lines (+2/-2), 1 file modified: lib/lp/translations/scripts/migrate_current_flag.py (+2/-2)
Jeroen T. Vermeulen (jtv) wrote : | #1 |
The real culprit may be Storm bug 682989.
Jeroen T. Vermeulen (jtv) wrote : | #2 |
Got a test run prepared for Tom to execute in a few minutes.
tags: | added: recife |
Changed in rosetta: | |
status: | New → In Progress |
importance: | Undecided → Critical |
assignee: | nobody → Jeroen T. Vermeulen (jtv) |
milestone: | none → 10.12 |
Changed in rosetta: | |
importance: | Critical → High |
tags: | added: upstream-translations-sharing removed: recife |
Jeroen T. Vermeulen (jtv) wrote : | #3 |
Working around the Storm bug did fix things, but the script is still not as fast as we'd like: It kicked off at a rate that suggested it would complete in a bit over 7 hours, but then fell asleep to accommodate (AIUI) replication lag.
I expected a lot of the time to go into finding the current translationmessages that need to be deactivated to "make room" for the newly activated ones, but that appears to account for only a fraction of the time spent. It's not clear to me where the time does go, unless it's the updating and replication itself, in which case there's not much more we can do.
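To make the "make room" step in the comment above concrete, here is a minimal sketch of deactivating the old current message before activating the new one. The table and column names (TranslationMessage, potmsgset, language, is_current) and the helper itself are assumptions for illustration; the real logic lives in lib/lp/translations/scripts/migrate_current_flag.py.

```python
# Illustrative only: assumed schema, not the real migration code.
def activate_message(cur, potmsgset_id, language_id, new_tm_id):
    """Make new_tm_id the current message, displacing the old one.

    Only one TranslationMessage per (potmsgset, language) may carry the
    current flag, so the previous holder is deactivated first.
    """
    # Deactivate whichever message currently holds the flag ("make room").
    cur.execute("""
        UPDATE TranslationMessage
           SET is_current = FALSE
         WHERE potmsgset = %s AND language = %s AND is_current IS TRUE
    """, (potmsgset_id, language_id))
    # Activate the newly migrated message.
    cur.execute(
        "UPDATE TranslationMessage SET is_current = TRUE WHERE id = %s",
        (new_tm_id,))
```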
Данило Шеган (danilo) wrote : | #4 |
It's OK if the first run of the script takes, e.g., the entire weekend. It will progressively have less data to process, and that's exactly what we need to aim for. Since it's DBLoopTuner-based, we do need to make sure that no slaves are being rebuilt at the time, because that would completely stall the script.
Do note that TranslationMessage constraints are slow to check, so that might be why updating is slow.
The first run can basically take whatever time it needs before the rollout. If we start the script on Friday, that gives it 4 full days, which should be enough. Then, along with the rollout, we can do another, much shorter run while LP is read-only (we should time it before the rollout to make sure it runs in, e.g., less than 5 minutes, which I expect it will).
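The comment above describes the script as DBLoopTuner-based. As a rough sketch of that pacing behaviour (this is not the real DBLoopTuner API; get_replication_lag, process_batch, and is_done are assumed callables, and the threshold and batch size are made up), the driver loop works on small batches and sleeps whenever replication lag is too high, which is why a slave being rebuilt stalls it:

```python
import time

MAX_LAG_SECONDS = 30.0   # assumed lag threshold
BATCH_SIZE = 1000        # assumed batch size

def run_throttled_migration(get_replication_lag, process_batch, is_done):
    """Process work in batches, backing off while the slaves lag behind."""
    while not is_done():
        lag = get_replication_lag()
        if lag > MAX_LAG_SECONDS:
            # Wait for the slaves to catch up.  If a slave is being rebuilt,
            # the lag never drops and the loop effectively stalls, as noted
            # in the comment above.
            time.sleep(lag)
            continue
        process_batch(BATCH_SIZE)
```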
Jeroen T. Vermeulen (jtv) wrote : | #5 |
Thanks for the explanation. I was mostly disappointed with the performance (after the fix) because of your references to "a few minutes." Now I understand that that would just be an incremental "patch-up" run after a prior migration of the bulk of the data.
Fixed in stable r12007 <http://
tags: | added: qa-needstesting |
Changed in rosetta: | |
status: | In Progress → Fix Committed |
Launchpad QA Bot (lpqabot) wrote : | #7 |
Fixed in db-stable r10023 <http://
tags: | added: qa-ok removed: qa-needstesting |
Changed in rosetta: | |
status: | Fix Committed → Fix Released |