Improve performance of community tag portlet

Bug #597838 reported by Paul Everitt
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
KARL3
Fix Released
Medium
Chris Rossi

Bug Description

"""
I found your performance bottle neck. Not sure what to do about it yet, but it's the community tags portlet in the right sidebar of the community layout. well the call to tags.getFrequency() is taking 40-50 seconds on my local box. burns through a lot of CPU. possibly CPU bound rather than IO bound, but not positive. but would explain why ssd doesn't help.

looks like a possible O(n^2). the tagging stuff doesn't really contain an index for community. so if we do something like what we're doing in the community tags portlet, where we go:

  tags.getTags(community=community)

It marshalls every tag in the db and checks the community. It's probably only fixable with some kind of index on community.

Tags: tagging
Changed in karl3:
assignee: nobody → Chris Rossi (chris-archimedeanco)
Revision history for this message
Chris Rossi (chris-archimedeanco) wrote :

FWIW, the slow bit isn't O(n^2), it's just O(n) where n is fairly large.

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Handing some things over to Tres.

Changed in karl3:
assignee: Chris Rossi (chris-archimedeanco) → Tres Seaver (tseaver)
Tres Seaver (tseaver)
Changed in karl3:
status: Confirmed → In Progress
Revision history for this message
Tres Seaver (tseaver) wrote :

After getting the 'karl.tagging' test coverage to 100%, I have checked in
an optimization of getFrequency, avoiding intermediate calls to
'getTags' and 'getTagBojects' and the unnecessary creation of unused
intermediary data structures.

It would be useful if somebody with better knowledge of the current
deployment enviroment could test any speedup using production or
near-production data.

Changed in karl3:
status: In Progress → Fix Committed
Revision history for this message
Chris Rossi (chris-archimedeanco) wrote :

I'm still seeing about 40+ seconds for this with OSI data. Although avoiding unnecessary marshalling of the tag objects is obviously a good thing, I think our fundamental problem is this code is marshalling every single tag object in the site:

return set([x for x in result
                            if self._tagid_to_obj[x].community == community])

I'm not sure I see any way of fixing this without adding a _community_to_tagids structure analogous to the other *_to_tagids structures already present for item, user, and name.

Revision history for this message
Paul Everitt (paul-agendaless) wrote :

I'll move it back to "in progress" and we'll decide whether to continue investigating it or not. Perhaps we'll throw the profiler at it.

Changed in karl3:
status: Fix Committed → In Progress
Revision history for this message
Tres Seaver (tseaver) wrote :

The "slow load" issue is a cache locality problem. The following
computation:

 >>> [y for x, y in root.tags._tagid_to_obj.items()
 ... if y.community == 'nonesuch']
 []

takes many seconds (likely the 30-40 originally reported) on a cold ZODB
cache, but is nearly instantaneous on a hot cache.

Fixing this correctly likely involves generating sample data for profiling..

Here is some data, drawn from production:

 >>> len(root.tags._tagid_to_obj)
 37666
 >>> len(root.tags._item_to_tagids)
 7435
 >>> len(root.tags._name_to_tagids)
 4353
 >>> len(root.tags._user_to_tagids)
 500
 >>> len(set([y.community for x, y in root.tags._tagid_to_obj.items()]))
 384

Changed in karl3:
milestone: m44 → m999
assignee: Tres Seaver (tseaver) → nobody
Revision history for this message
Paul Everitt (paul-agendaless) wrote :

Chris did his idea on this and emailed the results of his branch to osi-dev. Tres, can you do a code review?

Changed in karl3:
assignee: nobody → Tres Seaver (tseaver)
milestone: m999 → m45
Revision history for this message
Tres Seaver (tseaver) wrote :

I have examined the diff::

 $ svn diff -c 5861 svn+ssh://<email address hidden>/home/osi/bfgsvn/karl/branches/community_tags_optimization

The main change looks good -- do we have enough test coverage for the cases
with / without communities to be confident that adding / using the index doesn't break
anything?

There seem to be some stray changes in the diff (comments in an older evolve script,
an apparently unrelated test), but they look harmless.

+1 for merge.

Changed in karl3:
assignee: Tres Seaver (tseaver) → Chris Rossi (chris-archimedeanco)
Revision history for this message
Chris Rossi (chris-archimedeanco) wrote :

Merged. Test coverage seems pretty good.

Changed in karl3:
status: In Progress → Fix Committed
Revision history for this message
JimPGlenn (jpglenn09) wrote :

fixed

Changed in karl3:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.