Comment 13 for bug 745243

I've got some code lying around: a hacked version of cjk-tokenizer that uses Xapian's Unicode routines. It wasn't hard to make. I'll put a copy of it up on GitHub in a moment. It still needs to be hooked into an indexer and a query parser, though.
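
For anyone unfamiliar with the approach, here is a rough sketch of the n-gram idea, assuming the tokenizer emits overlapping bigrams for runs of CJK characters and whole words for everything else. It is plain Python, not the actual code, and it does not use Xapian's Unicode routines; `is_cjk`, `char_kind` and `tokenize` are hypothetical names, and the code-point ranges are only an illustrative subset.

```python
from itertools import groupby


def is_cjk(ch):
    """Rough CJK check over a few common blocks (illustrative subset, an assumption)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def char_kind(ch):
    """Classify a character as CJK, ordinary word character, or separator."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "other"


def tokenize(text):
    """Return lowercased words plus overlapping bigrams for each CJK run."""
    tokens = []
    for kind, group in groupby(text, key=char_kind):
        run = "".join(group)
        if kind == "word":
            tokens.append(run.lower())
        elif kind == "cjk":
            if len(run) == 1:
                # A lone CJK character is kept as a single-character token.
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        # Separators (whitespace, punctuation) produce no tokens.
    return tokens
```

The same function would have to run at both index time and query time so that the terms line up, which is the hooking-in step mentioned above.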