Remove the upper limit on extracted text

Bug #1365493 reported by Paul Everitt
This bug affects 1 person
Affects: KARL4
Status: New
Importance: Wishlist
Assigned to: Chris Rossi

Bug Description

As Chris noted in lp:1364383 the contextual summary expects to have the extracted text around. We just removed the extracted text in some cases in lp:1340295.

I'm marking this as low priority and scheduling it for October, as it isn't clear that many search results will hit the small number of objects that no longer have extracted text. This conclusion presumes that LiveSearch does not call contextual summaries. If I'm wrong about that, or you disagree that this won't be triggered much, tell me.

Once we do decide to do it, perhaps we can consider some mild zlib compression on the extracted text, as a defense against database bloat.

Chris Rossi (chris-archimedeanco) wrote : Re: [Bug 1365493] [NEW] Remove the upper limit on extracted text

FWIW, we're already doing the zlib compression. I thought the contextual
summaries were in the LiveSearch but just looked and it looks like they
aren't, so that's good.
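
For readers unfamiliar with the idea, here is a minimal sketch of what "mild" zlib compression of extracted text could look like. It is not KARL's actual code: the function names `pack_text` and `unpack_text` and the choice of compression level 1 are assumptions made for illustration only.

```python
import zlib


def pack_text(text, level=1):
    """Compress extracted text before storing it.

    Level 1 is zlib's lightest setting (hypothetical choice for "mild").
    """
    return zlib.compress(text.encode("utf-8"), level)


def unpack_text(blob):
    """Restore the original text from a compressed blob."""
    return zlib.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    sample = "contextual summary of an extracted document. " * 200
    blob = pack_text(sample)
    # The round trip must be lossless.
    assert unpack_text(blob) == sample
    print(len(sample.encode("utf-8")), "bytes ->", len(blob), "bytes")
```

Repetitive prose like the sample compresses very well even at level 1; real documents will vary.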

Generally speaking, this will just mean that in the somewhat rare cases
that a document without the extracted text cache shows up in a search
results listing, the text will be re-extracted--so it will contribute to
slowness but not breakage, except in pathological cases like lp:1364383
where the extractor hangs on a particular document.
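
The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not KARL's implementation: `Doc`, `cached_text`, `extract_text`, and `text_for_summary` are all hypothetical names.

```python
class Doc:
    """Hypothetical document with raw bytes and an optional text cache."""

    def __init__(self, raw, cached_text=None):
        self.raw = raw
        self.cached_text = cached_text


def extract_text(doc):
    # Stand-in for the real extractor, which may be slow
    # (or hang on a pathological document).
    return doc.raw.decode("utf-8", errors="replace")


def text_for_summary(doc, extractor=extract_text):
    """Return cached text if present, otherwise re-extract on demand."""
    if doc.cached_text is not None:
        return doc.cached_text  # fast path: cache hit
    return extractor(doc)  # slow path: cache was dropped, extract again


if __name__ == "__main__":
    print(text_for_summary(Doc(b"raw bytes", cached_text="cached")))
    print(text_for_summary(Doc(b"raw bytes")))
```

The slow path only runs for documents whose cache was removed, which is why the cost shows up as occasional slowness in search results rather than as errors.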

On Thu, Sep 4, 2014 at 9:02 AM, Paul Everitt <email address hidden> wrote:

> Public bug reported:
>
> ** Affects: karl3
> Importance: Low
> Assignee: Chris Rossi (chris-archimedeanco)
> Status: New

Paul Everitt (paul-agendaless) wrote :

Based on Chris's input, I will make this an even lower priority, farther out. :)

Changed in karl3:
importance: Low → Wishlist
milestone: m141 → m142
Chris Rossi (chris-archimedeanco) wrote : Re: [Bug 1365493] Re: Remove the upper limit on extracted text

With the caveat that if we make the change proposed in lp:1364383 then
we'll no longer try to re-extract the text and those documents will then
just not have a contextual summary, which is probably also not the end of
the world.
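
To make the caveat concrete, here is a rough sketch of a summary function that skips documents with no cached text instead of re-extracting. The name `contextual_summary`, its parameters, and the windowing logic are assumptions for illustration, not the change actually proposed in lp:1364383.

```python
def contextual_summary(cached_text, term, width=40):
    """Return a snippet around the first match of `term`, or None.

    When there is no cached text, no re-extraction is attempted and the
    search result simply has no contextual summary.
    """
    if cached_text is None:
        return None
    pos = cached_text.lower().find(term.lower())
    if pos < 0:
        # Term not found: fall back to the start of the text.
        return cached_text[: 2 * width]
    start = max(0, pos - width)
    end = min(len(cached_text), pos + len(term) + width)
    return cached_text[start:end]


if __name__ == "__main__":
    text = "We store extracted text with zlib compression to limit bloat."
    print(contextual_summary(text, "zlib", width=15))
    print(contextual_summary(None, "zlib"))
```

A caller rendering search results would need to handle the `None` case, for example by omitting the snippet.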

On Thu, Sep 4, 2014 at 9:28 AM, Paul Everitt <email address hidden> wrote:

> Based on Chris's input, I will make this an even lower priority, farther
> out. :)

Paul Everitt (paul-agendaless) wrote : Re: [Bug 1365493] Remove the upper limit on extracted text

We'll consider that ticket to not be part of the OSF KARL project. gocept will need to contract with you if they want to make that happen.

--Paul

On Sep 4, 2014, at 9:50 AM, Chris Rossi <email address hidden> wrote:

> With the caveat that if we make the change proposed in lp:1364383 then
> we'll no longer try to re-extract the text and those documents will then
> just not have a contextual summary, which is probably also not the end of
> the world.

affects: karl3 → karl4
Changed in karl4:
milestone: m142 → none
Changed in karl4:
milestone: none → 003
Changed in karl4:
milestone: 003 → 999