Search timeouts
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| Launchpad itself |
High
|
Данило Шеган |
Bug Description
We're seeing some timeouts on POFile:+translate pages, coming from translations search. Message sharing introduced an extra join into the search query.
The query is huge, and the plan atrocious. Surprisingly, the unacceptable costs all seem to be in the POTMsgSet search—not in the translations search. There are several seq scans on potmsgset, all inside nested loops. The subplan that identifies the POTMsgSets for the incumbent template is not materialized and reused.
Low-hanging fruit:
* Materialize the scan for (non-obsolete) POTMsgSets in the template, instead of duplicating it. (Saves one POTMsgSet seq scan and some bitmap scans on TranslationTemp
* When checking "msgid_
* The POMsgID seq scan for matches is slow. For plurals, the plan first limits the search to ones belonging to relevant POTMsgSets which makes it much faster. See if we can get the same in the plan for singulars, or if we can't, either materialize into a temp table or replace the union of the msgid_singular and msgid_plural subqueries with a single query (using either OR or a double LEFT JOIN).
OOPS-1360G1367 OOPS-1360C2163 OOPS-1360F1125
tags: | added: timeout |
Данило Шеган (danilo) wrote : | #2 |
У пон, 04. 05 2009. у 11:25 +0000, Jeroen T. Vermeulen пише:
>
> * When checking "msgid_
> "msgid_
> POTMsgSet happens there).
We can probably avoid seqscans just with clause reordering. That's what
usually helped.
> * The POMsgID seq scan for matches is slow. For plurals, the plan
> first limits the search to ones belonging to relevant POTMsgSets which
> makes it much faster. See if we can get the same in the plan for
> singulars, or if we can't, either materialize into a temp table or
> replace the union of the msgid_singular and msgid_plural subqueries
> with a single query (using either OR or a double LEFT JOIN).
Actually, it used to be a LEFT JOIN, but UNION performed much better on
staging during initial testing, which is why I rewrote the query as a
UNION.
Considering we are seeing only a few of these, I wouldn't consider this
very important unless we get >10 timeouts a day.
Changed in rosetta: | |
importance: | Undecided → Medium |
status: | New → Triaged |
Данило Шеган (danilo) wrote : | #3 |
I've seen exactly what Jeroen has reported as well: most time is spent in matching on msgid_singular. This bit of the query needs optimization:
SELECT POTMsgSet.id FROM POTMsgSet JOIN TranslationTemp
Changed in rosetta: | |
milestone: | none → 3.0 |
importance: | Medium → High |
description: | updated |
Changed in rosetta: | |
assignee: | nobody → Данило Шеган (danilo) |
Данило Шеган (danilo) wrote : | #4 |
Trying to execute the above query results in different query plans on staging and production. Surprisingly, though, it runs in 2s on staging, but 8s on production.
Staging run is suboptimal because it does a full seqscan on pomsgid using ILIKE (subselects above are designed exactly to avoid such behaviour), and the production run does a full seqscan on potmsgset table.
explain analyze
SELECT POTMsgSet.id
FROM POTMsgSet
JOIN TranslationTemp
ON TranslationTemp
AND TranslationTemp
WHERE (
POTMsgSet.
AND POTMsgSet.
SELECT POMsgID.id FROM POMsgID
WHERE
id IN (
SELECT (msgid_singular)
FROM POTMsgSet
JOIN TranslationTemp
ON TranslationTemp
WHERE TranslationTemp
AND msgid ILIKE '%' || 'Media Management and Playback application' || '%'));
-------
Nested Loop (cost=253887.
-> Nested Loop (cost=253887.
-> HashAggregate (cost=253887.
-> Hash IN Join (cost=222355.
Данило Шеган (danilo) wrote : | #5 |
ok, the rendering above is not working very well; see http://
On Tue, Sep 22, 2009 at 11:58 PM, Данило Шеган <email address hidden> wrote:
> ok, the rendering above is not working very well; see
> http://
I think the following is a much simpler and faster variant of the above query:
SELECT PotMsgSet.id
FROM
PotMsgSet,
Translation
PoMsgId AS SingPoMsgId,
POTMsgSet AS SingPotMsgSet,
Translation
WHERE
PotMsgSet.id = TranslationTemp
AND TranslationTemp
AND SingPoMsgId.id = PotMsgSet.
AND SingPoMsgId.id = SingPotMsgSet.
AND SingTTI.potmsgset = SingPotMsgSet.id
AND SingTTI.
AND SingTTI.sequence > 0
AND lower(SingPoMsg
'%' || lower('Media Management and Playback application') || '%'
ORDER BY PotMsgSet.id;
--
Stuart Bishop <email address hidden>
http://
Stuart Bishop (stub) wrote : | #7 |
(Without the order by of course)
Данило Шеган (danilo) wrote : | #8 |
У сре, 23. 09 2009. у 09:31 +0000, Stuart Bishop пише:
>
> AND lower(SingPoMsg
> '%' || lower('Media Management and Playback application') ||
> '%'
Strangely enough, this is exactly the bit that makes it use a better
query plan on staging. If I change that back to SingPOMsgId.msgid ILIKE
'%' ..., I get ~8s runtime. With the lower() variant, it runs in ~2s
which is acceptable. Also, initial run on staging for matching 'Media'
took 60s, so it seems it's more prone to caching than...
With a materialized view, I was able to go down to 1.3s on staging for
even bigger result sets — matching on 'da' (all with explain analyze,
which does introduce significant overhead).
Changed in rosetta: | |
status: | Triaged → In Progress |
Changed in rosetta: | |
milestone: | 3.0 → 3.1.10 |
Данило Шеган (danilo) wrote : | #9 |
After configuration changes, this is working sufficiently well. However, if it reappears, we can open a new bug and just go ahead with the materialized views which will bring performance improvements anyway.
Changed in rosetta: | |
status: | In Progress → Fix Released |
У пон, 04. 05 2009. у 11:25 +0000, Jeroen T. Vermeulen пише: {singular, plural} IN (...)," don't check for {singular, plural} IS NOT NULL" as well. (One seq scan on
>
> * When checking "msgid_
> "msgid_
> POTMsgSet happens there).
We can probably avoid seqscans just with clause reordering. That's what
usually helped.
> * The POMsgID seq scan for matches is slow. For plurals, the plan
> first limits the search to ones belonging to relevant POTMsgSets which
> makes it much faster. See if we can get the same in the plan for
> singulars, or if we can't, either materialize into a temp table or
> replace the union of the msgid_singular and msgid_plural subqueries
> with a single query (using either OR or a double LEFT JOIN).
Actually, it used to be a LEFT JOIN, but UNION performed much better on
staging during initial testing, which is why I rewrote the query as a
UNION.
Considering we are seeing only a few of these, I wouldn't consider this
very important unless we get >10 timeouts a day.