POFile:+translate timeouts

Bug #711064 reported by Robert Collins
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Robert Collins

Bug Description

 35 SELECT TranslationMessage.comment, TranslationMessage.date_created, TranslationMessage.date_revie ... gstr3, -$INT), COALESCE(msgstr4, -$INT), COALESCE(msgstr5, -$INT), date_created DESC ) LIMIT $INT:
   GET: 33 POST: 2 Robots: 0 Local: 34
      7 https://translations.launchpad.net/leechcraft/trunk/+pots/bittorrent/pl/+translate (POFile:+translate)
       OOPS-1883C722, OOPS-1883D773, OOPS-1883E785, OOPS-1883G750, OOPS-1883O809
      4 https://translations.launchpad.net/tortoisehg/stable/+pots/tortoisehg/pt_BR/+translate (POFile:+translate)
       OOPS-1883D437, OOPS-1883E716, OOPS-1883K558, OOPS-1883L721
      3 https://translations.launchpad.net/chive/trunk/+pots/chive/fi/+translate (POFile:+translate)
       OOPS-1883D2125, OOPS-1883E2003, OOPS-1883L1714
   [15 other URLs]

OOPS-1883C722
Branch: launchpad-rev-12460
Revno: 12460
SQL time: 12696 ms
Non-sql time: 1575 ms
Total time: 14271 ms
Statement Count: 268

late evaluation for PotMsgSet, TranslationMessage, PoTranslation, PoTemplate, TranslationTemplateItem, Person, TranslationRelicensingAgreement, TranslationGroup, ProductSeries and Product

Major time sink is
9 6 6837 1139 5698 SQL-launchpad-main-master
SELECT TranslationMessage.COMMENT, TranslationMessage.date_created,
                                   TranslationMessage.date_reviewed,
                                   TranslationMessage.id,
                                   TranslationMessage.is_current_ubuntu,
                                   TranslationMessage.is_current_upstream,
                                   TranslationMessage.LANGUAGE, TranslationMessage.msgstr0,
                                                                TranslationMessage.msgstr1,
                                                                TranslationMessage.msgstr2,
                                                                TranslationMessage.msgstr3,
                                                                TranslationMessage.msgstr4,
                                                                TranslationMessage.msgstr5,
                                                                TranslationMessage.origin,
                                                                TranslationMessage.potemplate,
                                                                TranslationMessage.potmsgset,
                                                                TranslationMessage.reviewer,
                                                                TranslationMessage.submitter,
                                                                TranslationMessage.validation_status,
                                                                TranslationMessage.was_obsolete_in_last_import
FROM TranslationMessage
WHERE TranslationMessage.id IN
    (SELECT DISTINCT ON (COALESCE(msgstr0, -$INT), COALESCE(msgstr1, -$INT), COALESCE(msgstr2, -$INT), COALESCE(msgstr3, -$INT), COALESCE(msgstr4, -$INT), COALESCE(msgstr5, -$INT)) TranslationMessage.id
     FROM TranslationMessage
     WHERE (is_current_ubuntu IS TRUE
            OR is_current_upstream IS TRUE)
       AND TranslationMessage.LANGUAGE = $INT
       AND TranslationMessage.potmsgset <> $INT
       AND potmsgset IN
         (SELECT POTMsgSet.id
          FROM POTMsgSet
          JOIN TranslationTemplateItem ON TranslationTemplateItem.potmsgset = POTMsgSet.id
          JOIN SuggestivePOTemplate ON TranslationTemplateItem.potemplate = SuggestivePOTemplate.potemplate
          WHERE msgid_singular = $INT)
     ORDER BY COALESCE(msgstr0, -$INT), COALESCE(msgstr1, -$INT), COALESCE(msgstr2, -$INT), COALESCE(msgstr3, -$INT), COALESCE(msgstr4, -$INT), COALESCE(msgstr5, -$INT), date_created DESC)LIMIT $INT

plus a second slow repeated query.
10 6 3741 623 3118
6 instances of this - two look like:

SELECT TranslationMessage.COMMENT, TranslationMessage.date_created,
                                   TranslationMessage.date_reviewed,
                                   TranslationMessage.id,
                                   TranslationMessage.is_current_ubuntu,
                                   TranslationMessage.is_current_upstream,
                                   TranslationMessage.LANGUAGE, TranslationMessage.msgstr0, TranslationMessage.msgstr1, TranslationMessage.msgstr2, TranslationMessage.msgstr3, TranslationMessage.msgstr4, TranslationMessage.msgstr5, TranslationMessage.origin, TranslationMessage.potemplate, TranslationMessage.potmsgset, TranslationMessage.reviewer, TranslationMessage.submitter, TranslationMessage.validation_status, TranslationMessage.was_obsolete_in_last_import
FROM TranslationMessage
WHERE TranslationMessage.id IN
    (SELECT DISTINCT ON (COALESCE(msgstr0, -1), COALESCE(msgstr1, -1), COALESCE(msgstr2, -1), COALESCE(msgstr3, -1), COALESCE(msgstr4, -1), COALESCE(msgstr5, -1)) TranslationMessage.id
     FROM TranslationMessage
     WHERE (is_current_ubuntu IS TRUE
            OR is_current_upstream IS TRUE)
       AND TranslationMessage.LANGUAGE = 333
       AND TranslationMessage.potmsgset <> 9699798
       AND potmsgset IN
         (SELECT POTMsgSet.id
          FROM POTMsgSet
          JOIN TranslationTemplateItem ON TranslationTemplateItem.potmsgset = POTMsgSet.id
          JOIN SuggestivePOTemplate ON TranslationTemplateItem.potemplate = SuggestivePOTemplate.potemplate
          WHERE msgid_singular = 277516)
     ORDER BY COALESCE(msgstr0, -1), COALESCE(msgstr1, -1), COALESCE(msgstr2, -1), COALESCE(msgstr3, -1), COALESCE(msgstr4, -1), COALESCE(msgstr5, -1), date_created DESC)LIMIT 2001

SELECT TranslationMessage.COMMENT, TranslationMessage.date_created,
                                   TranslationMessage.date_reviewed,
                                   TranslationMessage.id,
                                   TranslationMessage.is_current_ubuntu,
                                   TranslationMessage.is_current_upstream,
                                   TranslationMessage.LANGUAGE, TranslationMessage.msgstr0, TranslationMessage.msgstr1, TranslationMessage.msgstr2, TranslationMessage.msgstr3, TranslationMessage.msgstr4, TranslationMessage.msgstr5, TranslationMessage.origin, TranslationMessage.potemplate, TranslationMessage.potmsgset, TranslationMessage.reviewer, TranslationMessage.submitter, TranslationMessage.validation_status, TranslationMessage.was_obsolete_in_last_import
FROM TranslationMessage
WHERE TranslationMessage.id IN
    (SELECT DISTINCT ON (COALESCE(msgstr0, -1), COALESCE(msgstr1, -1), COALESCE(msgstr2, -1), COALESCE(msgstr3, -1), COALESCE(msgstr4, -1), COALESCE(msgstr5, -1)) TranslationMessage.id
     FROM TranslationMessage
     WHERE (NOT (is_current_ubuntu IS TRUE
                 OR is_current_upstream IS TRUE))
       AND TranslationMessage.LANGUAGE = 333
       AND TranslationMessage.potmsgset <> 9699798
       AND potmsgset IN
         (SELECT POTMsgSet.id
          FROM POTMsgSet
          JOIN TranslationTemplateItem ON TranslationTemplateItem.potmsgset = POTMsgSet.id
          JOIN SuggestivePOTemplate ON TranslationTemplateItem.potemplate = SuggestivePOTemplate.potemplate
          WHERE msgid_singular = 277516)
     ORDER BY COALESCE(msgstr0, -1), COALESCE(msgstr1, -1), COALESCE(msgstr2, -1), COALESCE(msgstr3, -1), COALESCE(msgstr4, -1), COALESCE(msgstr5, -1), date_created DESC)LIMIT 2001

Tags: qa-ok timeout

Related branches

Revision history for this message
Robert Collins (lifeless) wrote :

I think we're seeing some cross-python-thread contention as well, but the sql time is undeniably high.

description: updated
description: updated
description: updated
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

The two detailed cases are equivalent but for one clause - true / not true. Better to grab everything and group in python. It then repeats that pattern with different

potmsgset and msgid_singular.

Revision history for this message
Robert Collins (lifeless) wrote :

Looks like _getExternalTranslationMessages is generating the query.

Revision history for this message
Robert Collins (lifeless) wrote :
Download full text (3.3 KiB)

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=220441.10..222645.54 rows=200 width=76)
   -> Nested Loop (cost=220441.10..222645.54 rows=200 width=76)
         -> HashAggregate (cost=220441.10..220443.10 rows=200 width=4)
               -> Unique (cost=195224.76..218760.00 rows=134488 width=36)
                     -> Sort (cost=195224.76..198586.94 rows=1344871 width=36)
                           Sort Key: (COALESCE(public.translationmessage.msgstr0, (-1))), (COALESCE(public.translationmessage.msgstr1, (-1))), (COALESCE(public.translationmessage.msgstr2, (-1))), (COALESCE(public.translationmessage.msgstr3, (-1))), (COALESCE(public.translationmessage.msgstr4, (-1))), (COALESCE(public.translationmessage.msgstr5, (-1))), public.translationmessage.date_created
                           -> Nested Loop (cost=16161.93..26802.37 rows=1344871 width=36)
                                 -> HashAggregate (cost=16161.93..16170.79 rows=886 width=8)
                                       -> Hash Join (cost=535.77..16154.03 rows=3160 width=8)
                                             Hash Cond: (translationtemplateitem.potemplate = suggestivepotemplate.potemplate)
                                             -> Nested Loop (cost=16.44..15559.65 rows=3160 width=12)
                                                   -> Bitmap Heap Scan on potmsgset (cost=16.44..2420.12 rows=886 width=4)
                                                         Recheck Cond: (msgid_singular = 466)
                                                         -> Bitmap Index Scan on potmsgset_primemsgid_idx (cost=0.00..16.22 rows=886 width=0)
                                                               Index Cond: (msgid_singular = 466)
                                                   -> Index Scan using translationtemplateitem__potmsgset__idx on translationtemplateitem (cost=0.00..14.78 rows=4 width=8)
                                                         Index Cond: (translationtemplateitem.potmsgset = potmsgset.id)
                                             -> Hash (cost=278.59..278.59 rows=19259 width=4)
                                                   -> Seq Scan on suggestivepotemplate (cost=0.00..278.59 rows=19259 width=4)
                                 -> Index Scan using translationmessage__potmsgset__language__idx on translationmessage (cost=0.00..11.99 rows=1 width=40)
                                       Index Cond: ((public.translationmessage.potmsgset = translationtemplateitem.potmsgset) AND (public.translationmessage.language = 333))
                                       Filter: (((public.translationmessage.is_current_ubuntu IS TRUE) OR (public.translationmessage.is_current_upstream IS TRUE)) AND (public.translationmessage.potmsgset <> 9699799))
         -> Index S...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

note the sort in the middle - its expecting up to 1.3M rows, and materializing the full result set to do the sort.

Revision history for this message
Robert Collins (lifeless) wrote :

It actually finds (actual time=13597.388..13597.444 rows=212 loops=1)
rows.

It really does look like bringing it all in and massaging in python will be faster.

Revision history for this message
Robert Collins (lifeless) wrote :

Twin properties:
getExternallyUsedTranslationMessages
getExternallySuggestedTranslationMessages

are the entry points for the pairs of calls.

Revision history for this message
Robert Collins (lifeless) wrote :
Download full text (5.0 KiB)

modified combined query:
SELECT TranslationMessage.COMMENT, TranslationMessage.date_created,
                                   TranslationMessage.date_reviewed,
                                   TranslationMessage.id,
                                   TranslationMessage.is_current_ubuntu,
                                   TranslationMessage.is_current_upstream,
                                   TranslationMessage.LANGUAGE, TranslationMessage.msgstr0, TranslationMessage.msgstr1, TranslationMessage.msgstr2, TranslationMessage.msgstr3, TranslationMessage.msgstr4, TranslationMessage.msgstr5, TranslationMessage.origin, TranslationMessage.potemplate, TranslationMessage.potmsgset, TranslationMessage.reviewer, TranslationMessage.submitter, TranslationMessage.validation_status, TranslationMessage.was_obsolete_in_last_import
FROM TranslationMessage
WHERE TranslationMessage.id IN
    (SELECT DISTINCT ON (COALESCE(msgstr0, -1), COALESCE(msgstr1, -1), COALESCE(msgstr2, -1), COALESCE(msgstr3, -1), COALESCE(msgstr4, -1), COALESCE(msgstr5, -1)) TranslationMessage.id
     FROM TranslationMessage
     WHERE TranslationMessage.LANGUAGE = 333
       AND TranslationMessage.potmsgset <> 9699798
       AND potmsgset IN
         (SELECT POTMsgSet.id
          FROM POTMsgSet
          JOIN TranslationTemplateItem ON TranslationTemplateItem.potmsgset = POTMsgSet.id
          JOIN SuggestivePOTemplate ON TranslationTemplateItem.potemplate = SuggestivePOTemplate.potemplate
          WHERE msgid_singular = 277516)
     ORDER BY COALESCE(msgstr0, -1), COALESCE(msgstr1, -1), COALESCE(msgstr2, -1), COALESCE(msgstr3, -1), COALESCE(msgstr4, -1), COALESCE(msgstr5, -1), date_created DESC)LIMIT 2001

explain analyze on qastaging:

 Limit (cost=204880.26..207081.00 rows=200 width=76) (actual time=9.192..9.235 rows=3 loops=1)
   -> Nested Loop (cost=204880.26..207081.00 rows=200 width=76) (actual time=9.190..9.226 rows=3 loops=1)
         -> HashAggregate (cost=204880.26..204882.26 rows=200 width=4) (actual time=9.168..9.171 rows=3 loops=1)
               -> Unique (cost=179070.25..203159.58 rows=137654 width=36) (actual time=9.082..9.156 rows=3 loops=1)
                     -> Sort (cost=179070.25..182511.59 rows=1376533 width=36) (actual time=9.080..9.104 rows=67 loops=1)
                           Sort Key: (COALESCE(public.translationmessage.msgstr0, (-1))), (COALESCE(public.translationmessage.msgstr1, (-1))), (COALESCE(public.translationmessage.msgstr2, (-1))), (COALESCE(public.translationmessage.msgstr3, (-1))), (COALESCE(public.translationmessage.msgstr4, (-1))), (COALESCE(public.translationmessage.msgstr5, (-1))), public.translationmessage.date_created
                           Sort Method: quicksort Memory: 30kB
                           -> Nested Loop (cost=3982.62..6449.77 rows=1376533 width=36) (actual time=6.524..8.773 rows=67 loops=1)
                                 -> HashAggregate (cost=3982.62..3984.67 rows=205 width=8) (actual time=6.464..6.533 rows=138 loops=1)
                                       -> Nested Loop (cost=0.00..3980.79 rows=731 width=8) (actual time=0.117..6.173 rows=269 loops=1)
                                          ...

Read more...

Revision history for this message
Данило Шеган (danilo) wrote :

Yes, you've correctly found the slowest query on POFile:+translate page :) When it starts behaving very badly, we can turn it off using a config option "global_suggestions_enabled: True" (yes, sometimes it gets so slow that we had to have an option of turning it off for a few years now; for instance, we've disabled them with postgres 8.4 upgrade so we could optimize the query to work well with it). That should be turned into a feature flag.

However, as you also noted, the problem is that these queries are then repeated for every single message (different potmsgset). This is due to the way POFile:+translate view is constructed as a set of CurrentTranslationMessage:+translate views, which made pre-loading much harder. We've did some work on the side to refactor the +translate view, but never got far enough to actually allow it to do a smaller set of queries.

The ultimate goal should be for POFile:+translate page to do a single query on TranslationMessage table, which should make it's worst-case performance (with cold-caches) on the order of 5s (which you've seen). If we can't speed external suggestion lookup further, then we'd have to load external suggestions async.

(Note that the next stop after this would be to optimize all the "filtering" and "search" queries: they are the actual next biggest time sink after the external suggestions)

Revision history for this message
Данило Шеган (danilo) wrote :

Also, if it'll help anyone debug this, external suggestion fetching is mostly very slow for those English strings which appear often in different packages (like "File", "New" etc). In your example, you've got POMsgID 277516 which is "Date:" appearing in 234 POTMsgSets (on staging, at least). "File" is POMsgID 717 and it appears in 725 POTMsgSets, so if you use that msgid, I expect the query to be even slower. And it all works today simply because the chances of two very common messages appearing on a single 10-message batch are low.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 711064] Re: POFile:+translate timeouts

Thanks for the feedback Danilo, thats very helpful. I had another
question, which is that the browser code re-sorts these suggestions,
but the lower level code has a complex sort order (which impacts the
query efficiency). Do you know why both layers are sorting?

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Robert Collins (lifeless)
milestone: none → 11.03
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
Данило Шеган (danilo) wrote :

I don't know from the top of my head, but glancing over it, it seems
likely that it's doing that to ensure better match gets picked up first
when we've got multiple almost-identical TMs in the DB (can happen
because our model is denormalized and is common with historical data,
especially due to historical bugs — I am pretending that we've got no
new bugs ;). "Almost-identical" would mean that they've got same plural
forms up to a number (i.e. msgstr0, msgstr1 are identical, one has a
msgstr2 and other has a NULL msgstr2).

Revision history for this message
Robert Collins (lifeless) wrote :

On Thu, Mar 3, 2011 at 6:02 AM, Данило Шеган <email address hidden> wrote:
> I don't know from the top of my head, but glancing over it, it seems
> likely that it's doing that to ensure better match gets picked up first
> when we've got multiple almost-identical TMs in the DB (can happen
> because our model is denormalized and is common with historical data,
> especially due to historical bugs — I am pretending that we've got no
> new bugs ;).  "Almost-identical" would mean that they've got same plural
> forms up to a number (i.e. msgstr0, msgstr1 are identical, one has a
> msgstr2 and other has a NULL msgstr2).

Ok; I still don't follow the double-differing-sort though. Or do we
have more than 2000 TM's and we want to get the 'best' of those 2000 ?

William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
Revision history for this message
Данило Шеган (danilo) wrote :

У сре, 02. 03 2011. у 18:19 +0000, Robert Collins пише:
>
> Ok; I still don't follow the double-differing-sort though. Or do we
> have more than 2000 TM's and we want to get the 'best' of those 2000 ?

We used to have, but not with message sharing anymore (or at least, very
rarely). It'd be fine to move the ordering to a single place only,
though we'd have to make sure we don't break stuff in the process. The
ordering in the query might be there for the sake of tests as well
(though that's a very wild guess).

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.