Comment 10 for bug 264122

Andrew Sayers (andrew-bugs-launchpad-net) wrote :

I work in corpus linguistics, so I'm used to dealing with gigabytes of text, but I'm also used to documents much longer than it would be useful to search by MD5 hash :)

If you don't mind my asking, doesn't a full-text index leave you with a gigantic index file and constant random reads from the disk? If so, do you have any data on whether the new solid-state disks improve performance there?

I've had a quick look through the list of Rosetta blueprints, which makes me more curious about the possibilities for cross-pollination. Is there a standard list of use cases that I could look at? It would be useful to know what the "best" solution is in your domain, as distinct from my preconceptions.