Distribution:+bugs search has worrying overhead due to package name matching

Reported by Robert Collins on 2010-07-20
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Critical
Robert Collins

Bug Description

The big problem is the ILIKE '%foo%' substring search on BugTask.targetnamecache. On production, it consistently takes a little over a second. At the moment, we don't have infrastructure to do substring searches against this much data quickly enough so if we need a quick fix we need to drop this clause, relying on the full text query.

If we can't drop this clause, we will need a new index type like Wildspeed (http://www.sai.msu.su/~megera/wiki/wildspeed -- written by the same people who wrote tsearch2), trigrams (in postgresql-contrib), or NIH it and roll our own using a pair of tables and standard indexes for prefix searches and more complex queries.

This was originally filed as a fti bug, but was largely a dupe/operational issue; the discussion is worthwhile though in the context of ILIKE, so I've repurposed the bug to keep that discussion handy.

https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1662EA433

SQL time: 13899 ms
Non-sql time: 646 ms
Total time: 14545 ms
Statement Count: 29

https://bugs.launchpad.net/ubuntu/+bugs?field.searchtext=libbluray&orderby=-importance&search=Search&field.status%3Alist=NEW&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.assignee=&field.bug_reporter=&field.omit_dupes=on&field.has_patch=&field.has_no_package=

Distribution:+filebug-show-similar is also showing up as a slow performer in the PPR and this may have the same root cause (see bug 618380)

Related branches

lp:~lifeless/launchpad/bug-607960
Merged into lp:launchpad at revision 12374
Robert Collins (community): Approve on 2011-02-14
Robert Collins (lifeless) wrote :
Download full text (5.8 KiB)

slow query:
 -- SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released, BugTask.date_incomplete, BugTask.date_inprogress, BugTask.date_left_closed, BugTask.date_left_new, BugTask.date_triaged, BugTask.datecreated, BugTask.distribution, BugTask.distroseries, BugTask.id, BugTask.importance, BugTask.milestone, BugTask.owner, BugTask.product, BugTask.productseries, BugTask.sourcepackagename, BugTask.status, BugTask.statusexplanation, BugTask.targetnamecache, "_prejoin1".owner, "_prejoin1".active, "_prejoin1".autoupdate, "_prejoin1".bug_reported_acknowledgement, "_prejoin1".bug_reporting_guidelines, "_prejoin1".bug_supervisor, "_prejoin1".bugtracker, "_prejoin1".date_next_suggest_packaging, "_prejoin1".datecreated, "_prejoin1".description, "_prejoin1".development_focus, "_prejoin1".displayname, "_prejoin1".downloadurl, "_prejoin1".driver, "_prejoin1".enable_bug_expiration, "_prejoin1".freshmeatproject, "_prejoin1".homepage_content, "_prejoin1".homepageurl, "_prejoin1".icon, "_prejoin1".id, "_prejoin1".lastdoap, "_prejoin1".license_approved, "_prejoin1".license_info, "_prejoin1".reviewed, "_prejoin1".logo, "_prejoin1".max_bug_heat, "_prejoin1".mugshot, "_prejoin1".name, "_prejoin1".official_answers, "_prejoin1".official_blueprints, "_prejoin1".official_malone, "_prejoin1".official_rosetta, "_prejoin1".private_bugs, "_prejoin1".programminglang, "_prejoin1".project, "_prejoin1".registrant, "_prejoin1".remote_product, "_prejoin1".reviewer_whiteboard, "_prejoin1".screenshotsurl, "_prejoin1".security_contact, "_prejoin1".sourceforgeproject, "_prejoin1".summary, "_prejoin1".title, "_prejoin1".translation_focus, "_prejoin1".translationgroup, "_prejoin1".translationpermission, "_prejoin1".wikiurl, "_prejoin2".id, "_prejoin2".name, Bug.date_last_message, Bug.date_last_updated, Bug.date_made_private, Bug.datecreated, Bug.description, Bug.duplicateof, Bug.heat, Bug.heat_last_updated, Bug.id, Bug.latest_patch_uploaded, Bug.message_count, Bug.name, Bug.number_of_duplicates, Bug.owner, Bug.private, Bug.security_related, Bug.title, Bug.users_affected_count, Bug.users_unaffected_count, Bug.who_made_private FROM Bug, BugTask LEFT JOIN Product AS "_prejoin1" ON BugTask.product = "_prejoin1".id LEFT JOIN SourcePackageName AS "_prejoin2" ON BugTask.sourcepackagename = "_prejoin2".id WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND ((BugTask.status = 10) OR (BugTask.status = 15) AND (Bug.date_last_message IS NOT NULL AND BugTask.date_incomplete <= Bug.date_last_message) OR (BugTask.status = 15) AND (Bug.date_last_message IS NULL OR BugTask.date_incomplete > Bug.date_last_message) OR (BugTask.status = 20) OR (BugTask.status = 21) OR (BugTask.status = 22) OR (BugTask.status = 25)) AND Bug.duplicateof is NULL AND (Bug.fti @@ ftq('blu-ray') OR BugTask.fti @@ ftq('blu-ray') OR BugTask.targetnamecache ILIKE '%' || 'blu-ray' || '%') AND (Bug.private = FALSE OR EXISTS ( SELECT BugSubscription.bug FROM BugSubscription, TeamParticipation WHERE TeamParticipation.person = 15177 AND BugSubscription.person = TeamParticipation.team AND BugSubscription....

Read more...

tags: removed: oops
Stuart Bishop (stub) wrote :

I'm seeing a lot of variation in times to just to a query against Bug.fti - up to five seconds. I think this is the root cause, not a side effect of the join. I don't see this variation querying bugtask. Perhaps it is because the Bug.fti values are generally much bigger, indexing the bug title and description, or perhaps the table or indexes are bloated. I'll check for bloat now.

Stuart Bishop (stub) wrote :

Yes, both bug and bug.fti are badly bloated on the master.

Stuart Bishop (stub) wrote :

I've rebuilt the indexes on Bug.fti on all databases, and the query is now usually < 500ms with occasional spike to < 1s. The Bug table is still bloated - I can't repack that without causing downtime.

Deryck Hodge (deryck) wrote :

I duped bug 611741 against this bug. I believe it's the same issue but for the top-level bugs domain search, rather than being distribution search only.

Robert Collins (lifeless) wrote :

Stuart, can we do anything incremental to fix this? If not, then I think we need to plan 2 things:
 - how to make sure we can fix this in future
 - down time to fix & convert to whatever thing future proofs us.

Robert Collins (lifeless) wrote :

@deryck, yes, that other bug has the same slow query, and worryingly it was filed two days after the indices were rebuilt.

Stuart Bishop (stub) wrote :

So this is currently performing better and more consistently on edge and production. The bloated indexes where causing some problems, and this has been fixed. I need to write and schedule reports to pick up these issues earlier and preemptively fix them.

The big problem is the ILIKE '%foo%' substring search on BugTask.targetnamecache. On production, it consistently takes a little over a second. At the moment, we don't have infrastructure to do substring searches against this much data quickly enough so if we need a quick fix we need to drop this clause, relying on the full text query.

If we can't drop this clause, we will need a new index type like Wildspeed (http://www.sai.msu.su/~megera/wiki/wildspeed -- written by the same people who wrote tsearch2), trigrams (in postgresql-contrib), or NIH it and roll our own using a pair of tables and standard indexes for prefix searches and more complex queries.

Stuart Bishop (stub) wrote :

I now realize Robert was testing on staging, not production. Thankfully this pointed me to bloat on production, but I'm not sure why the staging timings are different to my production timings. RAM will be significant with production having gobs more available.

One other thing we can do is tear out the hyphenation handling in our ftq() helper function. A search for 'blu-ray' is converted to '(blu & ray) | bluray'. I'm suggesting this because I many examples of slow searches I get shown contain hyphens, as what people assume is a simple query isn't really.

Robert Collins (lifeless) wrote :

Stuart - yes, I was on staging -the %foo% times are consistent with production there.

I think the *intent* of the wildcard search is 'find all bugs from packages with similar package names' ? The fulltext search won't (intrinsically) find libbluray when someone searches for bluray.

So, to compare options
One - just use the fti across all packages. Upside: fast. Downside: Won't return bugs in libbluray for searches in bluray.

Two - bring in a new index for %foo% handling. Upside: its faster. Downside: Needs a custom extension to pg built.

Three - drop targetnamecache and instead use fti on the bugtarget fti - sourcepackagemumble in this case - which has the package descriptions and thus should find libbluray for a search on bluray.

Four - stop including all bugs on packages with similar strings by default. Some related bugs to this:
 https://bugs.edge.launchpad.net/malone/+bug/76322 - wants a separate 'packagename' field
 https://bugs.edge.launchpad.net/malone/+bug/174394 - suggest narrower searches rather than all packages
 I was sure there was a bug about lp bug search being unsatisfying when a package name matches the query, but I can't find it trivially,

Anyhow, If I were hacking on this, I think I'd add a packagename field to the distro search (and a project name to the project group search) and drop the %foo% OR clause completely - I'd use the packagename/projectname field as an AND clause instead, when present, and when its not present search only on what the user put in.

I'm going to turn this into a bug about the ILIKE submarine issue, and file a separate bug about index bloating.

summary: - timeouts on Distribution:+bugs doing searches
+ Distribution:+bugs search timeout due to package name matching
Robert Collins (lifeless) wrote :
tags: removed: fti timeout
summary: - Distribution:+bugs search timeout due to package name matching
+ Distribution:+bugs search has worrying overhead due to package name
+ matching
description: updated
description: updated
description: updated
Robert Collins (lifeless) wrote :

so targetnamecache is a component of the slowness here:
 - there are 26K products
 - and 37K sourcepackagenames
 - but 700K bugs, so we're ilike searchings 15 times as much data.

lpmain_staging=> select count(*) from sourcepackagename where sourcepackagename.name ilike '%bluray%';
 count
-------
     1
(1 row)

Time: 37.813 ms

lpmain_staging=> select count(*) from bugtask where bugtask.targetnamecache ilike '%bluray%';
 count
-------
     0
(1 row)

Time: 1113.370 ms

lpmain_staging=> select count(*) from bugtask,sourcepackagename where bugtask.sourcepackagename=sourcepackagename.id and sourcepackagename.name ilike '%bluray%';
 count
-------
     0
(1 row)

Time: 74.883 ms

Robert Collins (lifeless) wrote :

Rearranging the status component gives us:

  (BugTask.status in (10, 20,21,22,25)) OR

 ((BugTask.status = 15) AND (
    (Bug.date_last_message IS NOT NULL AND
     BugTask.date_incomplete <= Bug.date_last_message))
 OR
 (
    (Bug.date_last_message IS NULL OR BugTask.date_incomplete > Bug.date_last_message)))

which is a little easier to read though no different to the planner.

15 is incomplete, so the middle bit is:
incomplete messages with a last message(date_last_message is not null) where the message is newer than the incompleteness
OR
there is no last message
OR
but the bug was made incomplete after the last message.

This all cancels out: we should just not restrict it.

Robert Collins (lifeless) wrote :

select count(*) from bug where date_last_message is null;
 count
-------
     0
(1 row)

Time: 28.980 ms

If we require that all bugs have a message it would be simpler too.

Robert Collins (lifeless) wrote :

Doing that takes 1.5 seconds off, without changing the %foo% clause, and returns the same number of rows on staging. (1745ms)
Changing to use the sourcepackagename and product names in like seems to be less of a win, so deferring that for now. (2400ms)

Changed in launchpad:
importance: High → Critical
tags: added: timeout
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Robert Collins (lifeless)
milestone: none → 11.03
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
tags: added: bad-commit-12374
description: updated
tags: added: qa-ok
removed: qa-needstesting
Robert Collins (lifeless) wrote :

My branch lets us turn off the target name component of the search per bug 268508 though it won't remove all the overheads we're seeing.

William Grant (wgrant) on 2011-02-16
Changed in launchpad:
status: Fix Committed → Fix Released
Robert Collins (lifeless) wrote :

We've disabled this search semi permanently via feature rules. In future we can reenable or reimplement or remove. Not all of the recommendations in this bug have been actioned, but I think we should look with fresh eyes at further timeouts.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers