This[1] is licensed under the Apache License 2.0, which is compatible with the GPL (version 3, at least), so maybe we should borrow some of its normalizations and add them to our search_normalize() function?
[1] https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark/blob/master/scripts/normalize-punctuation.perl
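To make the idea concrete, here is a rough sketch of the kind of punctuation normalization I have in mind. It is written in Python purely for illustration. The function name normalize_punctuation and the character mapping are my own assumptions: they are not copied from the linked script and they are not how search_normalize() currently works.

```python
import re

# Hypothetical mapping, chosen for illustration only; it is not taken
# from the linked Perl script.
_CHAR_MAP = str.maketrans({
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u201e": '"',    # low double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / typographic apostrophe
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u00a0": " ",    # non-breaking space
    "\u2026": "...",  # horizontal ellipsis
})


def normalize_punctuation(text: str) -> str:
    """Map typographic punctuation to ASCII and collapse whitespace."""
    text = text.translate(_CHAR_MAP)
    # Collapse any run of whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

If we go this way, the same normalization would have to be applied both to the indexed text and to the query, otherwise the two sides will not match.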