Package: grep
Version: 2.5.1.ds1-5
Followup-For: Bug #181378
Hello,
I tried the gofast patch, and did not find a real improvement.
However, Fedora is now using a different patch, which dramatically improves
grep performance in a UTF-8 environment.
Please find attached the following patches:
* I put the original Fedora patches in the orig directory. The other
patches are updated for the Debian package.
* 64-egf-speedup.patch
It does most of the work. Here is the explanation, according to:
http://savannah.gnu.org/patch/?func=detailitem&item_id=3803
> The full story behind this patch is that grep-2.5.1a does not handle
> UTF-8 gracefully at all. The basic plan with handling UTF-8 in 2.5.1a
> is:
> * whenever a buffer is parsed, go through the entire buffer deciding
> how many bytes make up each character
> * use this information when necessary
>
> This patch changes that to:
> * when information about how many bytes make up a character is needed,
> work it out on demand
>
> On the face of it, this is a small obvious improvement. In fact it is
> much better than that, because the original scheme would calculate
> character lengths several times for each buffer: in fact, one full
> pass for every single potential match!
* 65-dfa-optional.patch
I'm not sure this one is really needed.
I've read that the DFA algorithm is slow for UTF-8, and this patch disables
it in that case (it can be forced on by setting an environment
variable).
* grep-2.5.1-tests.patch
Fedora also added a test for UTF-8.
* 66-match_icase.patch
* 67-w.patch
After running the new UTF-8 tests, these two also seem to be needed.
(They are not really related to grep's speed, but these patches may
be interesting.)
I tried a grep package with all these patches, and for the following
command:
grep '^' /var/lib/dpkg/available > /dev/null
grep is more than 1500 times faster in a UTF-8 environment.
(On my machine, it takes less than 3/4 s instead of more than 10 minutes!)
Also, I did not notice any regression, and grep is not dramatically
slower in the C locale.
These patches may be important for Etch, since the transition to UTF-8 is
mentioned on the (unofficial) Etch TODO list:
http://wiki.debian.net/?EtchTODOList
(And the French team is considering using UTF-8 for the default French
locale)
Thanks in advance,
--
Nekral