Comment 1 for bug 1657171

Mike Rylander (mrylander) wrote :

This[1] is licensed under the Apache License 2.0, which is compatible with the GPL, so maybe we should steal some of the normalizations from it and add them to our search_normalize() function?

[1] https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark/blob/master/scripts/normalize-punctuation.perl
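
To make the idea concrete, here is a rough sketch of the kind of punctuation normalization being suggested. It is written in Python purely for illustration. The mapping table and the name normalize_punctuation() are hypothetical: they are neither the actual rule set of the script at [1] nor the current behavior of search_normalize(), so treat this as a starting point for discussion only.

```python
import re

# Hypothetical mapping of "fancy" punctuation to plain ASCII equivalents.
# This is an illustrative assumption, not the rule set of the linked script.
PUNCT_MAP = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u201e": '"',    # double low-9 quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(PUNCT_MAP)


def normalize_punctuation(text: str) -> str:
    """Replace typographic punctuation with ASCII and collapse whitespace."""
    text = text.translate(_TABLE)
    # Collapse runs of whitespace so variant spacing compares equal.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalize_punctuation("\u201cHello\u201d \u2013  it\u2019s\u00a0here\u2026"))
```

The point is that a search for it's would match a record typed with a curly apostrophe once both sides go through the same mapping. Whether each individual replacement is appropriate for search would need to be checked case by case.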