KARL3

Live Search Not Producing Expected Result

Bug #430037 reported by Anthony on 2009-09-15

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
KARL3	Fix Released	Medium	Chris Rossi	(none)

Bug Description

Bug Reported by Jonathan Hooper:

When I search for "fgrep", I don't see any results, but a blog comment clearly contains this term:

https://karl.soros.org/communities/hoops-snips/blog/recursive-grep

Do we know why this is happening?

Anthony (agalietti) on 2009-09-15

Changed in karl3:
assignee:	nobody → Paul Everitt (paul-agendaless)

Paul Everitt (paul-agendaless) wrote on 2009-09-15:

Don't know if this is better for you or Shane. If you want, re-assign it.

Changed in karl3:
assignee:	Paul Everitt (paul-agendaless) → Chris McDonough (chrism-plope)
importance:	Undecided → Medium
milestone:	none → m32

Paul Everitt (paul-agendaless) on 2009-09-20

Changed in karl3:
milestone:	m32 → m33

Chris McDonough (chrism-plope) wrote on 2009-09-22:

This object converts its HTML text to plain text when it's indexed, and also includes its title in the text index:

>>> ob = root['communities']['hoops-snips']['blog']['recursive-grep']['comments']['002']
>>> ob.text
u'<div xmlns="http://www.w3.org/1999/xhtml">\r\n <p>recursive grep, then delete<br><br>fgrep -lir baltimore * | xargs rm</p>\r\n </div>\n'
>>> from karl.utilities.converters.stripogram import html2text
>>> html2text(ob.text)
u'\n\nrecursive grep, then delete\n\n\n\nfgrep -lir baltimore * | xargs rm'
>>> ob.title
u'Recursive GREP'
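As a rough illustration of the comment above, here is a minimal sketch of how such an object's searchable text could be assembled, assuming it is simply the title followed by the body with its markup stripped. The names `strip_tags` and `searchable_text` and the regex-based tag stripping are hypothetical; this is not KARL's actual adapter code.

```python
import html
import re

# Hypothetical helper: a regex that matches any tag. A real HTML-to-text
# converter should use a parser; this is only a sketch.
_TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(markup):
    """Replace every tag with a space, then decode HTML entities."""
    return html.unescape(_TAG_RE.sub(" ", markup))


def searchable_text(title, body_html):
    """Assumed layout of the indexed text: title first, then the plain-text body."""
    return "%s\n%s" % (title, strip_tags(body_html))


body = ('<div xmlns="http://www.w3.org/1999/xhtml">\r\n <p>recursive grep, then delete'
        '<br><br>fgrep -lir baltimore * | xargs rm</p>\r\n </div>\n')
print(searchable_text("Recursive GREP", body))
```

With input like this, "fgrep" ends up in the text handed to the indexer, which is what the search in the bug report expects.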

For this object we've indexed the following words:

>>> index = root.catalog['texts'].index
>>> docid = root['communities']['hoops-snips']['blog']['recursive-grep']['comments']['002'].docid
>>> wids = index.get_words(docid)
>>> lexicon = index._lexicon
>>> map(lexicon.get_word, wids)
[u'recursive', u'grep', u'recursive', u'grep', u'delete']

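The title words are there, but nothing after the first <br> in the body ("fgrep", "baltimore", "xargs", ...) was indexed. I can't make much sense out of that yet.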
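To make the gap concrete, one can tokenize the text the indexer should have received (the title plus the stripogram output above) and compare it with the words actually stored. The sketch assumes a simple lowercase `\w+` splitter and a tiny hypothetical stopword list; the real text index pipeline may split and filter differently, so treat the result as approximate.

```python
import re

# Assumed stand-in for the index pipeline: split on word characters,
# lowercase, drop a few stopwords. Not the real KARL/zope pipeline.
STOPWORDS = {"then", "the", "a", "an", "and"}


def tokenize(text):
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]


title = "Recursive GREP"
expected_body = "\n\nrecursive grep, then delete\n\n\n\nfgrep -lir baltimore * | xargs rm"
# Copied from the session output above.
indexed = ["recursive", "grep", "recursive", "grep", "delete"]

expected = tokenize(title) + tokenize(expected_body)
missing = sorted(set(expected) - set(indexed))
print(missing)  # ['baltimore', 'fgrep', 'lir', 'rm', 'xargs']
```

Under these assumptions the stored words match the expected words exactly up to "delete", which is consistent with the body text being cut off at the first `<br>`.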

Chris McDonough (chrism-plope) wrote on 2009-09-22:

Found it. The stripogram stuff is a red herring. The indexing code actually uses karl.content.models.adapters._html_cleaner, which has a bug:

>>> from karl.content.models.adapters import _html_cleaner
>>> _html_cleaner(u'<div xmlns="http://www.w3.org/1999/xhtml">\r\n <p>recursive grep, then delete<br><br>fgrep -lir baltimore * | xargs rm</p>\r\n </div>\n')

Returns:

'\r\n recursive grep, then delete'

It should return something more like the previous stripogram example (u'\n\nrecursive grep, then delete\n\n\n\nfgrep -lir baltimore * | xargs rm').
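For comparison, here is a minimal sketch of a cleaner that keeps the text after `<br>` tags, built on the standard library's `html.parser`. The names `html_to_text` and `_TextExtractor` are hypothetical, and this is not the fix that was actually committed to KARL, whose details aren't given in this thread.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data and emits a newline for line-breaking tags."""

    BREAK_TAGS = {"br", "p", "div"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for XHTML-style <br/>: the default
        # handle_startendtag() calls handle_starttag().
        if tag in self.BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


sample = ('<div xmlns="http://www.w3.org/1999/xhtml">\r\n <p>recursive grep, then delete'
          '<br><br>fgrep -lir baltimore * | xargs rm</p>\r\n </div>\n')
print(html_to_text(sample))
```

Because the parser treats an unclosed `<br>` as a plain start tag rather than a container, the text that follows it is kept.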

Chris McDonough (chrism-plope) wrote on 2009-09-22:

The bug that caused improper input to the text indexer has been fixed on trunk.

We'll need to run "bin/reindex_catalog" on the production system after the next release to fix existing content.
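Fixing the cleaner only affects documents indexed after the fix is deployed; the words already stored for old content stay wrong until each document is indexed again, which is why a full reindex is needed. A toy in-memory inverted index illustrates this. `ToyTextIndex`, `buggy_cleaner`, and `fixed_cleaner` are hypothetical and do not reflect the real catalog API or what `bin/reindex_catalog` does internally.

```python
import re
from collections import defaultdict


def buggy_cleaner(markup):
    # Mimics the reported symptom: everything from the first <br> on is lost.
    return re.sub(r"<[^>]+>", " ", markup.split("<br>", 1)[0])


def fixed_cleaner(markup):
    return re.sub(r"<[^>]+>", " ", markup)


class ToyTextIndex:
    def __init__(self):
        self.docs = {}                    # docid -> words stored at index time
        self.postings = defaultdict(set)  # word -> docids containing it

    def index_doc(self, docid, text):
        self.unindex_doc(docid)  # drop stale words before re-adding
        words = set(re.findall(r"\w+", text.lower()))
        self.docs[docid] = words
        for word in words:
            self.postings[word].add(docid)

    def unindex_doc(self, docid):
        for word in self.docs.pop(docid, ()):
            self.postings[word].discard(docid)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), ()))


content = {2: "<p>recursive grep, then delete<br><br>fgrep -lir baltimore * | xargs rm</p>"}

index = ToyTextIndex()
for docid, markup in content.items():
    index.index_doc(docid, buggy_cleaner(markup))
print(index.search("fgrep"))  # [] : stored words came from the buggy cleaner

# Deploying fixed_cleaner changes nothing until every document is reindexed.
for docid, markup in content.items():
    index.index_doc(docid, fixed_cleaner(markup))
print(index.search("fgrep"))  # [2]
```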

Changed in karl3:
status:	New → Fix Committed

Paul Everitt (paul-agendaless) wrote on 2009-12-01:

Hi Chris. Looks like this didn't get done completely in production. Some old content with fgrep isn't searchable.

Changed in karl3:
assignee:	Chris McDonough (chrism-plope) → Chris Rossi (chris-archimedeanco)
milestone:	m33 → none
status:	Fix Committed → Incomplete

Chris Rossi (chris-archimedeanco) wrote on 2009-12-01:

Hi Paul. Chris M's fix is from 9/22, and the last release of KARL was 9/8, so we're still waiting for this to go into production.

Changed in karl3:
status:	Incomplete → Fix Committed

Paul Everitt (paul-agendaless) on 2010-01-06

Changed in karl3:
status:	Fix Committed → Fix Released
