Comment 13 for bug 1081104

Paul Everitt (paul-agendaless) wrote : Re: [Bug 1081104] Investigate switching from doctotext to a supported extractor

I think we're not going to consider anything that requires a server process. I had the impression when watching it run that it started pretty fast.

--Paul

On Mar 27, 2013, at 3:34 PM, Chris Rossi <email address hidden> wrote:

> Theune says what he showed you was LibreOffice, using the CLI. I'll
> play with it and see how it does. It seems to be a very heavyweight tool:
> a big process needs to load before it can do anything. So we might be
> looking at long extract times that would require us to use a queue and a
> separate thread or process to do the text extraction offline, so we
> don't slow down user HTTP requests.
>
> --
> You received this bug notification because you are subscribed to KARL3.
> https://bugs.launchpad.net/bugs/1081104
>
> Title:
> Investigate switching from doctotext to a supported extractor
>
> Status in KARL3:
> Confirmed
>
> Bug description:
> Hi!
>
> We see a lot of errors of the following form in our KARL error
> monitor:
>
> Tue Nov 20 05:32:47 2012 ERROR mailin Error converting file
> /tmp/tmp46IR1W
>
> Error converting file /tmp/tmp46IR1W
>
> Traceback (most recent call last):
>   File "/srv/multikarl/production/12/eggs/karl-3.99-py2.6.egg/karl/content/models/adapters.py", line 116, in _extract_file_data
>     mimetype=context.mimetype)
>   File "/srv/multikarl/production/12/eggs/karl-3.99-py2.6.egg/karl/utilities/converters/doc.py", line 39, in convert
>     return self.execute('doctotext "%s"' % filename), 'utf-8'
>   File "/srv/multikarl/production/12/eggs/karl-3.99-py2.6.egg/karl/utilities/converters/baseconverter.py", line 54, in execute
>     close_fds=True)
>   File "/usr/lib/python2.6/subprocess.py", line 623, in __init__
>     errread, errwrite)
>   File "/usr/lib/python2.6/subprocess.py", line 1141, in _execute_child
>     raise child_exception
> OSError: [Errno 2] No such file or directory
>
>
> We see this in all sub-projects (Ariadne, Oxfam, Privacy International) and it always affects files written to /tmp/somefile.
>
>
> Best regards,
> Alex
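
For reference, here is a rough sketch of the "queue plus a separate thread" idea from the quoted message, combined with a guard for the case where the extractor binary cannot be started. This is not KARL's code: extract_text, start_worker, and the command list are made-up names for illustration, and the sketch assumes Python 3.5+ (for subprocess.run), not the Python 2.6 shown in the traceback. It also does not claim to identify the cause of the OSError above; a missing executable is only one way subprocess can report Errno 2.

```python
import queue
import shutil
import subprocess
import threading


def extract_text(command, path, timeout=60):
    """Run an external extractor on `path` and return its stdout as text.

    `command` is an argv list (hypothetical example: ["doctotext"]).
    The file path is passed as its own argv entry, so no shell quoting
    is involved. Returns None if the tool is missing, fails, or times out.
    """
    if shutil.which(command[0]) is None:
        # The executable is not on PATH; starting it would raise OSError.
        return None
    try:
        result = subprocess.run(
            command + [path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Covers launch failures, non-zero exit codes, and timeouts.
        return None
    return result.stdout.decode("utf-8", errors="replace")


def start_worker(jobs, results, command):
    """Start a background thread that extracts text for queued paths.

    Paths are read from `jobs`; extracted text (or None) is stored in
    the `results` dict. Putting None on the queue stops the worker.
    """
    def run():
        while True:
            path = jobs.get()
            try:
                if path is None:
                    break
                results[path] = extract_text(command, path)
            finally:
                jobs.task_done()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

A request handler would then only do jobs.put(path) and return, leaving the slow extraction to the worker. Whether a thread, a separate process, or a persistent queue fits better depends on how long a real extractor takes to start, which is what still needs measuring.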