poimport exhausts disk space on db server

Bug #408718 reported by Stuart Bishop
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Данило Шеган

Bug Description

In under 2 hours, poimport managed to cause the PostgreSQL database to chew up over 150GB of disk space. The process was killed before it could exhaust one of the partitions entirely.

I have the start of the query that was currently being executed:

SELECT
    POMsgId.msgid AS msgid,
    POMsgID_Plural.msgid AS msgid_plural,
    context,
    date_reviewed,
    is_current,
    is_imported,
    pt0.translation AS translation0,pt1.translation AS translation1,pt2.translation AS translation2,pt3.translation AS translation3,pt4.translation AS translation4,pt5.translation AS translation5
  FROM POTMsgSet
    JOIN TranslationTemplateItem ON
      TranslationTemplateItem.potmsgset=POTMsgSet.id
    JOIN POTemplate ON
      POTemplate.id=TranslationTemplateItem.potemplate
    JOIN POFile ON
      POFile.potemplate=POTemplate.id AND
      POFile.id=1118874
    JOIN TranslationMessage ON
      POTMsgSet.id=TranslationMessage.potmsgset AND
      (TranslationMessage.potemplate=POTemplate.id OR
       TranslationMessage.potemplate IS NULL) AND
      POFile.language=TranslationMessage.language AND

I don't have the rest of the query, so I can't investigate further. The disk space will have been chewed up by temporary tables, either explicitly created by poimport or implicitly as part of this query so hold intermediary results. I don't know if the query is broken, if there is something about this particular import that caused too many results to be generated, or if a really bad query plan was chosen by chance.

Related branches

Revision history for this message
Stuart Bishop (stub) wrote :

First step is to try to engineer the actual query being executed.

Changed in rosetta:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

The query comes from lp.translations.utilities.translation_import, the method being ExistingPOFileInDatabase._fetchDBRows.

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

Here's what the query would look like:

SELECT
    POMsgId.msgid AS msgid,
    POMsgID_Plural.msgid AS msgid_plural,
    context,
    date_reviewed,
    is_current,
    is_imported,
    pt0.translation AS translation0,
    pt1.translation AS translation1,
    pt2.translation AS translation2,
    pt3.translation AS translation3,
    pt4.translation AS translation4,
    pt5.translation AS translation5
FROM POTMsgSet
JOIN TranslationTemplateItem ON
    TranslationTemplateItem.potmsgset=POTMsgSet.id
JOIN POTemplate ON
    POTemplate.id=TranslationTemplateItem.potemplate
JOIN POFile ON
    POFile.potemplate=POTemplate.id AND
    POFile.id=%(pofile)s
JOIN TranslationMessage ON
    POTMsgSet.id=TranslationMessage.potmsgset AND
    (TranslationMessage.potemplate=POTemplate.id OR
     TranslationMessage.potemplate IS NULL) AND
    POFile.language=TranslationMessage.language AND
    POFile.variant IS NOT DISTINCT FROM TranslationMessage.variant
LEFT OUTER JOIN POTranslation pt0 ON pt0.id = TranslationMessage.msgstr0
LEFT OUTER JOIN POTranslation pt1 ON pt1.id = TranslationMessage.msgstr1
LEFT OUTER JOIN POTranslation pt2 ON pt2.id = TranslationMessage.msgstr2
LEFT OUTER JOIN POTranslation pt3 ON pt3.id = TranslationMessage.msgstr3
LEFT OUTER JOIN POTranslation pt4 ON pt4.id = TranslationMessage.msgstr4
LEFT OUTER JOIN POTranslation pt5 ON pt5.id = TranslationMessage.msgstr5
JOIN POMsgID ON
    POMsgID.id=POTMsgSet.msgid_singular
LEFT OUTER JOIN POMsgID AS POMsgID_Plural ON
    POMsgID_Plural.id=POTMsgSet.msgid_plural
WHERE (is_current IS TRUE OR is_imported IS TRUE)
ORDER BY
    TranslationTemplateItem.sequence,
    TranslationMessage.potemplate NULLS LAST

Revision history for this message
Jeroen T. Vermeulen (jtv) wrote :

The POTemplate and POFile are fixed and known when the query is constructed, so we can strip those out of the query:

SELECT
    POMsgId.msgid AS msgid,
    POMsgID_Plural.msgid AS msgid_plural,
    context,
    date_reviewed,
    is_current,
    is_imported,
    pt0.translation AS translation0,
    pt1.translation AS translation1,
    pt2.translation AS translation2,
    pt3.translation AS translation3,
    pt4.translation AS translation4,
    pt5.translation AS translation5
FROM POTMsgSet
JOIN TranslationTemplateItem ON
    TranslationTemplateItem.potmsgset = POTMsgSet.id AND
    TranslationTemplateItem.potemplate = %(potemplate)s
JOIN TranslationMessage ON
    POTMsgSet.id=TranslationMessage.potmsgset AND
    (TranslationMessage.potemplate = %(potemplate)s OR
     TranslationMessage.potemplate IS NULL) AND
    TranslationMessage.language = %(language)s AND
    TranslationMessage.variant IS NOT DISTINCT FROM %(variant)s
LEFT OUTER JOIN POTranslation pt0 ON pt0.id = TranslationMessage.msgstr0
LEFT OUTER JOIN POTranslation pt1 ON pt1.id = TranslationMessage.msgstr1
LEFT OUTER JOIN POTranslation pt2 ON pt2.id = TranslationMessage.msgstr2
LEFT OUTER JOIN POTranslation pt3 ON pt3.id = TranslationMessage.msgstr3
LEFT OUTER JOIN POTranslation pt4 ON pt4.id = TranslationMessage.msgstr4
LEFT OUTER JOIN POTranslation pt5 ON pt5.id = TranslationMessage.msgstr5
JOIN POMsgID ON
    POMsgID.id=POTMsgSet.msgid_singular
LEFT OUTER JOIN POMsgID AS POMsgID_Plural ON
    POMsgID_Plural.id=POTMsgSet.msgid_plural
WHERE (is_current IS TRUE OR is_imported IS TRUE)
ORDER BY
    TranslationTemplateItem.sequence,
    TranslationMessage.potemplate NULLS LAST

Revision history for this message
Stuart Bishop (stub) wrote :

We have failed to replicate a slow or wasteful query.

We are retrying recent failed imports to see if the problem can be reproduced. It might have been a glitch caused by bad planner statistics - if we cannot reproduce, we should bump up the default statistics sample size.

Revision history for this message
Stuart Bishop (stub) wrote :

I'm going to bump up the statistics, at which point I'll close this bug.

Changed in rosetta:
assignee: nobody → Stuart Bishop (stub)
milestone: none → 2.2.8
Stuart Bishop (stub)
Changed in rosetta:
status: Triaged → Fix Released
Revision history for this message
Данило Шеган (danilo) wrote :

Ok, so we've seen this again with exactly the same query. This is what I want to do to fix this:

 1. introduce a timeout on this query of about 5 minutes
 2. after that, abort the transaction, restart it, and go on with life without super-fast-imports

To be able to test this in any meaningful way possible, I'll probably make timeout value a config option.

Changed in rosetta:
assignee: Stuart Bishop (stub) → Данило Шеган (danilo)
milestone: 2.2.8 → 3.0
status: Fix Released → In Progress
Revision history for this message
Данило Шеган (danilo) wrote :

The query we've seen it again on was:

SELECT
    POMsgId.msgid AS msgid,
    POMsgID_Plural.msgid AS msgid_plural,
    context,
    date_reviewed,
    is_current,
    is_imported,
    pt0.translation AS translation0,
    pt1.translation AS translation1,
    pt2.translation AS translation2,
    pt3.translation AS translation3,
    pt4.translation AS translation4,
    pt5.translation AS translation5
FROM POTMsgSet
JOIN TranslationTemplateItem ON
    TranslationTemplateItem.potmsgset=POTMsgSet.id
JOIN POTemplate ON
    POTemplate.id=TranslationTemplateItem.potemplate
JOIN POFile ON
    POFile.potemplate=POTemplate.id AND
    POFile.id=1045953
JOIN TranslationMessage ON
    POTMsgSet.id=TranslationMessage.potmsgset AND
    (TranslationMessage.potemplate=POTemplate.id OR
     TranslationMessage.potemplate IS NULL) AND
    POFile.language=TranslationMessage.language AND
    POFile.variant IS NOT DISTINCT FROM TranslationMessage.variant
LEFT OUTER JOIN POTranslation pt0 ON pt0.id = TranslationMessage.msgstr0
LEFT OUTER JOIN POTranslation pt1 ON pt1.id = TranslationMessage.msgstr1
LEFT OUTER JOIN POTranslation pt2 ON pt2.id = TranslationMessage.msgstr2
LEFT OUTER JOIN POTranslation pt3 ON pt3.id = TranslationMessage.msgstr3
LEFT OUTER JOIN POTranslation pt4 ON pt4.id = TranslationMessage.msgstr4
LEFT OUTER JOIN POTranslation pt5 ON pt5.id = TranslationMessage.msgstr5
JOIN POMsgID ON
    POMsgID.id=POTMsgSet.msgid_singular
LEFT OUTER JOIN POMsgID AS POMsgID_Plural ON
    POMsgID_Plural.id=POTMsgSet.msgid_plural
WHERE (is_current IS TRUE OR is_imported IS TRUE)
ORDER BY
    TranslationTemplateItem.sequence,
    TranslationMessage.potemplate NULLS LAST;

Revision history for this message
Данило Шеган (danilo) wrote :

poimport is stopped for the time being until we fix this.

Changed in rosetta:
importance: High → Critical
Revision history for this message
Данило Шеган (danilo) wrote :

Fix landed in 9429 (that kills the query if it takes more than 5 minutes).

Changed in rosetta:
status: In Progress → Fix Committed
security vulnerability: yes → no
visibility: private → public
Changed in rosetta:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.