Search timeouts

Bug #371609 reported by Jeroen T. Vermeulen on 2009-05-04
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Launchpad itself
High
Данило Шеган

Bug Description

We're seeing some timeouts on POFile:+translate pages, coming from translations search. Message sharing introduced an extra join into the search query.

The query is huge, and the plan atrocious. Surprisingly, the unacceptable costs all seem to be in the POTMsgSet search—not in the translations search. There are several seq scans on potmsgset, all inside nested loops. The subplan that identifies the POTMsgSets for the incumbent template is not materialized and reused.

Low-hanging fruit:
 * Materialize the scan for (non-obsolete) POTMsgSets in the template, instead of duplicating it. (Saves one POTMsgSet seq scan and some bitmap scans on TranslationTemplateItem, and removes the rest out of the loop).
 * When checking "msgid_{singular,plural} IN (...)," don't check for "msgid_{singular,plural} IS NOT NULL" as well. (One seq scan on POTMsgSet happens there).
 * The POMsgID seq scan for matches is slow. For plurals, the plan first limits the search to ones belonging to relevant POTMsgSets which makes it much faster. See if we can get the same in the plan for singulars, or if we can't, either materialize into a temp table or replace the union of the msgid_singular and msgid_plural subqueries with a single query (using either OR or a double LEFT JOIN).

OOPS-1360G1367 OOPS-1360C2163 OOPS-1360F1125

У пон, 04. 05 2009. у 11:25 +0000, Jeroen T. Vermeulen пише:
>
> * When checking "msgid_{singular,plural} IN (...)," don't check for
> "msgid_{singular,plural} IS NOT NULL" as well. (One seq scan on
> POTMsgSet happens there).

We can probably avoid seqscans just with clause reordering. That's what
usually helped.

> * The POMsgID seq scan for matches is slow. For plurals, the plan
> first limits the search to ones belonging to relevant POTMsgSets which
> makes it much faster. See if we can get the same in the plan for
> singulars, or if we can't, either materialize into a temp table or
> replace the union of the msgid_singular and msgid_plural subqueries
> with a single query (using either OR or a double LEFT JOIN).

Actually, it used to be a LEFT JOIN, but UNION performed much better on
staging during initial testing, which is why I rewrote the query as a
UNION.

Considering we are seeing only a few of these, I wouldn't consider this
very important unless we get >10 timeouts a day.

tags: added: timeout

У пон, 04. 05 2009. у 11:25 +0000, Jeroen T. Vermeulen пише:
>
> * When checking "msgid_{singular,plural} IN (...)," don't check for
> "msgid_{singular,plural} IS NOT NULL" as well. (One seq scan on
> POTMsgSet happens there).

We can probably avoid seqscans just with clause reordering. That's what
usually helped.

> * The POMsgID seq scan for matches is slow. For plurals, the plan
> first limits the search to ones belonging to relevant POTMsgSets which
> makes it much faster. See if we can get the same in the plan for
> singulars, or if we can't, either materialize into a temp table or
> replace the union of the msgid_singular and msgid_plural subqueries
> with a single query (using either OR or a double LEFT JOIN).

Actually, it used to be a LEFT JOIN, but UNION performed much better on
staging during initial testing, which is why I rewrote the query as a
UNION.

Considering we are seeing only a few of these, I wouldn't consider this
very important unless we get >10 timeouts a day.

Changed in rosetta:
importance: Undecided → Medium
status: New → Triaged

I've seen exactly what Jeroen has reported as well: most time is spent in matching on msgid_singular. This bit of the query needs optimization:

SELECT POTMsgSet.id FROM POTMsgSet JOIN TranslationTemplateItem ON TranslationTemplateItem.potmsgset=POTMsgSet.id AND TranslationTemplateItem.potemplate=702 WHERE (POTMsgSet.msgid_singular IS NOT NULL AND POTMsgSet.msgid_singular IN ( SELECT POMsgID.id FROM POMsgID WHERE id IN ( SELECT DISTINCT(msgid_singular) FROM POTMsgSet JOIN TranslationTemplateItem ON TranslationTemplateItem.potmsgset = POTMsgSet.id WHERE TranslationTemplateItem.potemplate=702 AND TranslationTemplateItem.sequence > 0 ) AND msgid ILIKE '%' || 'Media Management and Playback application' || '%'));

Changed in rosetta:
milestone: none → 3.0
importance: Medium → High
description: updated
Changed in rosetta:
assignee: nobody → Данило Шеган (danilo)
Download full text (4.0 KiB)

Trying to execute the above query results in different query plans on staging and production. Surprisingly, though, it runs in 2s on staging, but 8s on production.

Staging run is suboptimal because it does a full seqscan on pomsgid using ILIKE (subselects above are designed exactly to avoid such behaviour), and the production run does a full seqscan on potmsgset table.

explain analyze
SELECT POTMsgSet.id
  FROM POTMsgSet
  JOIN TranslationTemplateItem
    ON TranslationTemplateItem.potmsgset=POTMsgSet.id
       AND TranslationTemplateItem.potemplate=702
  WHERE (
    POTMsgSet.msgid_singular IS NOT NULL
    AND POTMsgSet.msgid_singular IN (
      SELECT POMsgID.id FROM POMsgID
        WHERE
        id IN (
          SELECT (msgid_singular)
            FROM POTMsgSet
            JOIN TranslationTemplateItem
              ON TranslationTemplateItem.potmsgset = POTMsgSet.id
            WHERE TranslationTemplateItem.potemplate=702
                  AND TranslationTemplateItem.sequence > 0 )
        AND msgid ILIKE '%' || 'Media Management and Playback application' || '%'));
                                                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=253887.25..254114.66 rows=1 width=4) (actual time=9017.276..9017.465 rows=5 loops=1)
   -> Nested Loop (cost=253887.25..254102.16 rows=16 width=4) (actual time=9017.219..9017.320 rows=5 loops=1)
         -> HashAggregate (cost=253887.25..253887.30 rows=5 width=8) (actual time=9017.088..9017.093 rows=5 loops=1)
               -> Hash IN Join (cost=222355.53..253887.24 rows=5 width=8) (actual time=8773.518..9017.060 rows=5 loops=1)
                     Hash Cond: (pomsgid.id = public.potmsgset.msgid_singular)
                     -> Seq Scan on pomsgid (cost=0.00..31346.40 rows=127 width=4) (actual time=1352.051..1595.569 rows=6 loops=1)
                           Filter: (msgid ~~* '%Media Management and Playback application%'::text)
                     -> Hash (cost=221744.94..221744.94 rows=48847 width=4) (actual time=7421.417..7421.417 rows=51630 loops=1)
                           -> Hash Join (cost=182046.18..221744.94 rows=48847 width=4) (actual time=6678.880..7377.252 rows=51630 loops=1)
                                 Hash Cond: (public.translationtemplateitem.potmsgset = public.potmsgset.id)
                                 -> Bitmap Heap Scan on translationtemplateitem (cost=1116.28..23893.98 rows=48847 width=4) (actual time=7.308..41.628 rows=51630 loops=1)
                                       Recheck Cond: ((potemplate = 702) AND (sequence > 0))
                                       -> Bitmap Index Scan on translationtemplateitem__potemplate__sequence__key (cost=0.00..1104.07 rows=48847 width=0) (actual time=7.255..7.255 rows=51630 loops=1)
                                             Index Cond: ((potemplate = 702) AND (seq...

Read more...

ok, the rendering above is not working very well; see http://paste.ubuntu.com/275968/

On Tue, Sep 22, 2009 at 11:58 PM, Данило Шеган <email address hidden> wrote:
> ok, the rendering above is not working very well; see
> http://paste.ubuntu.com/275968/

I think the following is a much simpler and faster variant of the above query:

SELECT PotMsgSet.id
FROM
    PotMsgSet,
    TranslationTemplateItem,
    PoMsgId AS SingPoMsgId,
    POTMsgSet AS SingPotMsgSet,
    TranslationTemplateItem AS SingTTI
WHERE
    PotMsgSet.id = TranslationTemplateItem.potmsgset
    AND TranslationTemplateItem.potemplate=702
    AND SingPoMsgId.id = PotMsgSet.msgid_singular
    AND SingPoMsgId.id = SingPotMsgSet.msgid_singular
    AND SingTTI.potmsgset = SingPotMsgSet.id
    AND SingTTI.potemplate=702
    AND SingTTI.sequence > 0
    AND lower(SingPoMsgId.msgid) LIKE
        '%' || lower('Media Management and Playback application') || '%'
ORDER BY PotMsgSet.id;

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Stuart Bishop (stub) wrote :

(Without the order by of course)

У сре, 23. 09 2009. у 09:31 +0000, Stuart Bishop пише:
>
> AND lower(SingPoMsgId.msgid) LIKE
> '%' || lower('Media Management and Playback application') ||
> '%'

Strangely enough, this is exactly the bit that makes it use a better
query plan on staging. If I change that back to SingPOMsgId.msgid ILIKE
'%' ..., I get ~8s runtime. With the lower() variant, it runs in ~2s
which is acceptable. Also, initial run on staging for matching 'Media'
took 60s, so it seems it's more prone to caching than...

With a materialized view, I was able to go down to 1.3s on staging for
even bigger result sets — matching on 'da' (all with explain analyze,
which does introduce significant overhead).

Changed in rosetta:
status: Triaged → In Progress
Changed in rosetta:
milestone: 3.0 → 3.1.10

After configuration changes, this is working sufficiently well. However, if it reappears, we can open a new bug and just go ahead with the materialized views which will bring performance improvements anyway.

Changed in rosetta:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers