Too many queries issued in xx-pofile-translate-performance.txt

Bug #241394 reported by Jeroen T. Vermeulen
Affects: Launchpad itself
Status: Triaged
Importance: Low
Assigned to: Unassigned

Bug Description

The pagetest xx-pofile-translate-performance.txt started breaking today, in line 64: it seems the page issues more queries than expected.

Some timeout-related OOPSes: OOPS-980E2229, OOPS-980EA21, OOPS-982B2121, OOPS-983A1046, OOPS-983E1925

Diogo Matsubara (matsubara) wrote :

Failing test has been disabled for now.

Changed in rosetta:
importance: Undecided → High
status: New → Confirmed
Jeroen T. Vermeulen (jtv) wrote :

If there is ever an actual problem here, we're more likely to spot it from the timeout reports.

Changed in rosetta:
importance: High → Medium
Ursula Junque (ursinha)
description: updated
Ursula Junque (ursinha) wrote :

Changing the obsolete Confirmed status to Triaged and raising importance to High, since these timeout errors have been happening often lately.

Changed in rosetta:
importance: Medium → High
status: Confirmed → Triaged
Данило Шеган (danilo) wrote :

Hi Ursula, thanks for cleaning up the status here.

However, I'm lowering the priority on this one, and I'll explain why: this bug is not about the actual performance issues we have in LP Translations (for historical perspective, you may want to look at bug 30602). It's about a single test that was written at a time when it was essential to keep the number of queries under a certain fixed threshold (we have since done a lot of DB redesign to improve performance, and later we got flashy new hardware as well). Basically, the test was made to fail if we inadvertently messed up. It had to be disabled because the Storm migration brought some changes in the query count, and we didn't feel it was important enough to fix right away.
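
For readers unfamiliar with this kind of test, here is a minimal sketch of a query-count threshold check. It is not the actual pagetest: it assumes an in-memory SQLite database, and `count_queries`, `render_page`, the `message` table, and `MAX_QUERIES` are all hypothetical names made up for illustration.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn):
    """Collect every SQL statement run on `conn` inside the with-block."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        yield statements
    finally:
        # Stop tracing so later statements are not recorded.
        conn.set_trace_callback(None)


def render_page(conn, batch_size):
    """Hypothetical stand-in for a page view: one extra query per message."""
    rows = conn.execute(
        "SELECT id FROM message ORDER BY id LIMIT ?", (batch_size,)
    ).fetchall()
    return [
        conn.execute("SELECT text FROM message WHERE id = ?", (row[0],)).fetchone()[0]
        for row in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO message (text) VALUES (?)",
    [("msg %d" % i,) for i in range(10)],
)
conn.commit()

MAX_QUERIES = 20  # assumed fixed threshold, chosen only for this sketch

with count_queries(conn) as statements:
    page = render_page(conn, batch_size=10)

query_count = len(statements)
# The check fails as soon as a change pushes the page over the threshold.
assert query_count <= MAX_QUERIES, "page issued %d queries" % query_count
```

A fixed threshold like this is brittle in exactly the way described here: a change in the ORM layer can shift the count without any real regression.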

I've looked through a few of the latest Edge and Lpnet OOPS reports, and it seems we are only triggering SoftTimeouts. Around a year ago we regularly got a few thousand hard timeouts on +translate pages every day, so getting a few of them these days is a relatively positive outcome: we are not yet ready to consider soft timeouts a real problem. Note, however, that we are doing more work to improve our performance further, but a "bug" about that might be slightly ill-placed (this is something that's really hard to quantify, especially with the amount of data and the Translations team we have). For details, see https://blueprints.edge.launchpad.net/rosetta/+spec/message-sharing

Other things are happening too (e.g. we are introducing DB replication, which will help with this problem as well).

I hope this clarifies some bits here.

Changed in rosetta:
importance: High → Medium
Changed in rosetta:
importance: Medium → Low
Ursula Junque (ursinha) wrote :

Thanks for the clarification, Danilo. But since the hard timeouts are happening very often, there is a separate bug for them, bug 271268, to keep track of these occurrences. There I've placed the recent OOPSes, including the ones I was able to reproduce myself.

Robert Collins (lifeless) wrote :

This bug seems to be about a disabled test? So it's not actually tracking timeouts; removing the timeout tag. I suggest closing the bug and deleting the test, based on Danilo's comments.

tags: removed: timeout