checkwatches doing an unreasonable amount of db work

Bug #435952 reported by Stuart Bishop on 2009-09-24
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself

Bug Description

checkwatches is noticeably one of the higher consumers of database resources, often chewing up 30-70% of a core all on its own. This seems unreasonable given its job, and may indicate a silly bug or badly thought out piece of code.

I suspect something like:
 - missing join condition
 - unnecessary repetition of queries
 - suboptimal queries
 - retrieving lots of rows one row at a time
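To illustrate the last suspicion in the list above, here is a minimal sketch contrasting one query per row with a single batched query. It uses an in-memory SQLite database, and the `bugwatch` table and its columns are hypothetical stand-ins rather than Launchpad's real schema; nothing here is taken from the checkwatches code itself.

```python
import sqlite3

# Hypothetical schema for illustration only; not Launchpad's real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bugwatch (id INTEGER PRIMARY KEY, remote_status TEXT)")
conn.executemany(
    "INSERT INTO bugwatch (id, remote_status) VALUES (?, ?)",
    [(i, "open" if i % 2 else "closed") for i in range(1, 101)],
)

watch_ids = list(range(1, 101))


def fetch_one_at_a_time(conn, ids):
    """Issue one SELECT per id: 100 ids means 100 separate queries."""
    rows = []
    for watch_id in ids:
        row = conn.execute(
            "SELECT id, remote_status FROM bugwatch WHERE id = ?", (watch_id,)
        ).fetchone()
        rows.append(row)
    return rows


def fetch_batched(conn, ids):
    """Fetch the same rows with a single SELECT ... WHERE id IN (...)."""
    placeholders = ", ".join("?" for _ in ids)
    query = (
        "SELECT id, remote_status FROM bugwatch "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(query, ids).fetchall()


slow_rows = fetch_one_at_a_time(conn, watch_ids)
fast_rows = fetch_batched(conn, watch_ids)
```

Both functions return the same rows; the difference is only in how many statements the database has to parse, plan and execute.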

We should instrument it and work out wtf it is spending its time on. LP_DEBUG_SQL or LP_DEBUG_SQL_EXTRA might provide enough information if we summarize it.
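A rough sketch of the kind of summary that might help: group logged statements by shape and count how often each shape occurs, so repeated queries stand out. The exact output format of LP_DEBUG_SQL is not shown in this report, so this assumes the statements have already been extracted as one string each; `normalize` and `summarize` are hypothetical helper names, and the sample statements are made up.

```python
import re
from collections import Counter


def normalize(statement):
    """Replace literals with '?' so queries differing only in values group together."""
    statement = re.sub(r"'(?:[^']|'')*'", "?", statement)  # quoted string literals
    statement = re.sub(r"\b\d+\b", "?", statement)  # standalone numeric literals
    return re.sub(r"\s+", " ", statement).strip()  # collapse whitespace


def summarize(statements, top=10):
    """Return the most frequent query shapes as (shape, count) pairs."""
    counts = Counter(normalize(s) for s in statements if s.strip())
    return counts.most_common(top)


# Made-up sample input; real input would come from the debug log.
sample = [
    "SELECT * FROM bugwatch WHERE id = 1",
    "SELECT * FROM bugwatch WHERE id = 2",
    "SELECT * FROM bugwatch  WHERE id = 3",
    "UPDATE bugwatch SET remote_status = 'open' WHERE id = 3",
]
summary = summarize(sample)
for shape, count in summary:
    print(count, shape)
```

A shape with a very high count relative to the number of watches processed would point at unnecessary repetition or row-at-a-time retrieval.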

If it does need to do this much work, we should probably try and offload some or most of it to a slave database.

Stuart Bishop (stub) wrote :

Assigning to gmb because he said he would take a look.

This isn't a major production concern... yet ;)

Changed in malone:
assignee: nobody → Graham Binns (gmb)
importance: Undecided → Medium
status: New → Triaged
Graham Binns (gmb) on 2009-12-11
Changed in malone:
assignee: Graham Binns (gmb) → nobody
Stuart Bishop (stub) wrote :

This is now bringing the database to its knees, as checkwatches now has 6 connections open each hammering the database.

Increasing this to critical until we can reduce the number of database connections or fix the underlying performance problem.

Changed in malone:
importance: Medium → Critical
Tom Haddon (mthaddon) wrote :

Per a suggestion from gmb, we've applied a workaround to the crontab that runs checkwatches until this is fixed.

Stuart Bishop (stub) on 2010-02-02
Changed in malone:
importance: Critical → Medium
Gavin Panella (allenap) on 2010-04-20
tags: added: story-reliable-bug-syncing
Changed in malone:
importance: Medium → High
Tom Haddon (mthaddon) on 2010-05-28
tags: added: canonical-losa-lp
Gavin Panella (allenap) wrote :

The fix for bug 572211 should reduce db load a lot, I hope.

Robert Collins (lifeless) wrote :

Tom, could you please undo that workaround, so we can see how much allenap's change helped?

summary: - checkwatches doing an unreasonable amout of db work
+ checkwatches doing an unreasonable amount of db work
Liam Young (gnuoy) wrote :

Reverted checkwatches to doing ten at a time (--jobs=10)

David Ames (thedac) wrote :

Reverted the revert. Now --jobs=1

William Grant (wgrant) wrote :

(It was getting pgkillactive'd again.)
