Use of libreoffice at the command line is fairly straightforward. I used a small .docx file sent to me by my daughter's preschool to test, as well as an NDA sent to me by a client. LibreOffice is noticeably slower than doctotext. Here are some sample times, first for doctotext:

    crossi@spirit:~/download$ time doctotext TLG\ Garden\ Rules.docx > test.txt
    Using ODF/OOXML parser.
    real 0m0.012s
    user 0m0.004s
    sys 0m0.004s

    crossi@spirit:~/download$ time doctotext TLG\ Garden\ Rules.docx > test.txt
    Using ODF/OOXML parser.
    real 0m0.014s
    user 0m0.012s
    sys 0m0.000s

    crossi@spirit:~/download$ time doctotext TLG\ Garden\ Rules.docx > test.txt
    Using ODF/OOXML parser.
    real 0m0.012s
    user 0m0.004s
    sys 0m0.004s

    crossi@spirit:~/download$ time doctotext TLG\ Garden\ Rules.docx > test.txt
    Using ODF/OOXML parser.
    real 0m0.012s
    user 0m0.000s
    sys 0m0.008s

And now for LibreOffice:

    crossi@spirit:~/download$ time libreoffice --convert-to txt:Text --headless TLG\ Garden\ Rules.docx
    convert /home/crossi/download/TLG Garden Rules.docx -> /home/crossi/download/TLG Garden Rules.txt using Text
    Overwriting: /home/crossi/download/TLG Garden Rules.txt
    real 0m0.479s
    user 0m0.376s
    sys 0m0.072s

    crossi@spirit:~/download$ time libreoffice --convert-to txt:Text --headless TLG\ Garden\ Rules.docx
    convert /home/crossi/download/TLG Garden Rules.docx -> /home/crossi/download/TLG Garden Rules.txt using Text
    Overwriting: /home/crossi/download/TLG Garden Rules.txt
    real 0m0.464s
    user 0m0.376s
    sys 0m0.064s

    crossi@spirit:~/download$ time libreoffice --convert-to txt:Text --headless TLG\ Garden\ Rules.docx
    convert /home/crossi/download/TLG Garden Rules.docx -> /home/crossi/download/TLG Garden Rules.txt using Text
    Overwriting: /home/crossi/download/TLG Garden Rules.txt
    real 0m0.452s
    user 0m0.388s
    sys 0m0.040s

    crossi@spirit:~/download$ time libreoffice --convert-to txt:Text --headless TLG\ Garden\ Rules.docx
    convert /home/crossi/download/TLG Garden Rules.docx -> /home/crossi/download/TLG Garden Rules.txt using Text
    Overwriting: /home/crossi/download/TLG Garden Rules.txt
    real 0m0.476s
    user 0m0.360s
    sys 0m0.080s

As you can see, doctotext performance is very good, usually no more than about 15 milliseconds of real time. LibreOffice performance, by comparison, is marginal, adding about half a second per document for any request that needs it. Given the performance that we already put up with, it's not horribly out of bounds, but it doesn't give me a warm fuzzy feeling either. I have a fairly zippy laptop with an SSD, so performance on one of Gocept's VMs would probably be worse. We should probably have Gocept install LibreOffice on a staging or dev server so we can look at what the real-world performance might be.

I also used diff to look at the quality of the output. LibreOffice seems to do better than doctotext, but not in any way that impacts search. LibreOffice does a better job of preserving whitespace formatting, and for some reason doctotext discards the numbers from enumerated lists. So, for example, if I have an enumerated list in my document, LibreOffice gives:

    1. One thing
    2. Another thing
    3. And a third thing

Where doctotext yields:

    One thing
    Another thing
    And a third thing

Given that our goal is to extract text to feed to full text search, this is not a relevant difference. I did not notice any differences in output that seemed like they might be relevant for search.
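
To make the "not relevant for search" comparison a bit more mechanical than eyeballing a diff, here is a minimal sketch that compares the two extractions as bags of lowercased word tokens, so whitespace and punctuation differences drop out. The function names `search_tokens` and `token_diff` are hypothetical, and the tokenization is only an assumption about what a full text indexer would see, not a description of what our search actually does.

```python
import re
from collections import Counter


def search_tokens(text):
    """Count lowercased word tokens; whitespace and punctuation are ignored."""
    return Counter(re.findall(r"\w+", text.lower()))


def token_diff(a, b):
    """Return (tokens over-represented in a, tokens over-represented in b)."""
    tokens_a, tokens_b = search_tokens(a), search_tokens(b)
    # Counter subtraction keeps only positive counts.
    return tokens_a - tokens_b, tokens_b - tokens_a


# The enumerated list example from above, as each tool extracts it.
libreoffice_text = "1. One thing\n2. Another thing\n3. And a third thing\n"
doctotext_text = "One thing\nAnother thing\nAnd a third thing\n"

only_libreoffice, only_doctotext = token_diff(libreoffice_text, doctotext_text)
print(sorted(only_libreoffice))  # ['1', '2', '3']
print(sorted(only_doctotext))    # []
```

For the list example, the only tokens that differ are the list numbers themselves, which matches what I saw in the diff. Running the same comparison over the real .txt outputs would flag any word that one tool extracts and the other drops.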