Comment 13 for bug 7906

Debian Bug Importer (debzilla) wrote:

Message-ID: <20030904190238.GB6837@crystal>
Date: Thu, 4 Sep 2003 15:02:38 -0400
From: "H. S. Teoh" <email address hidden>
To: <email address hidden>, <email address hidden>, <email address hidden>
Subject: Identical bugs

merge 181378 206470
thanks

These bugs appear to be the same (see the latest messages in #181378).

As for the bugs themselves, could the slowdown be caused by grep decoding
every input character according to the locale, rather than converting the
regex to the locale's encoding once and then matching the resulting bytes?
I haven't looked at the code to be sure, but this is what came to mind when
I read about the speed difference with LC_CTYPE=C.
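A minimal Python sketch of the two strategies contrasted above, assuming a
fixed-string search (no regex metacharacters) over UTF-8 input. The function
names are hypothetical and this is not how grep is implemented; it only shows
where the decoding cost falls in each approach.

```python
# Hypothetical sketch: two ways a grep-like tool could search for a fixed
# string in UTF-8 input. Not grep's actual implementation.


def grep_decode_all(data: bytes, needle: str, encoding: str = "utf-8") -> list[str]:
    """Decode every input byte into characters, then search the text."""
    text = data.decode(encoding)  # decoding cost is proportional to the whole input
    return [line for line in text.split("\n") if needle in line]


def grep_encode_pattern(data: bytes, needle: str, encoding: str = "utf-8") -> list[str]:
    """Encode the needle once, search raw bytes, decode only matching lines."""
    raw_needle = needle.encode(encoding)  # done once, not per input character
    return [
        line.decode(encoding)  # decoding cost is paid only for matches
        for line in data.split(b"\n")
        if raw_needle in line
    ]


if __name__ == "__main__":
    sample = "café au lait\nplain tea\nnaïve café\n".encode("utf-8")
    print(grep_decode_all(sample, "café"))
    print(grep_encode_pattern(sample, "café"))
```

Both functions return the same lines here. Matching raw bytes is safe for
UTF-8 because lead bytes and continuation bytes occupy distinct ranges, so one
character's encoding can never appear inside another's. In some other
multibyte encodings, such as Shift_JIS, a trail byte can fall in the ASCII
range, so a byte-level match could start in the middle of a character.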

Decoding every input character would indeed slow things down considerably.
A better alternative would be to convert the regex to the locale's encoding
once, match byte by byte, and decode only the lines that match, for output.
However, this may break down in pathological cases where the same character
has more than one representation (e.g. a base letter followed by a Unicode
combining diacritic vs. the precomposed character). I'm not sure what the
solution would be in that case.
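The representation problem can be reproduced with Python's standard
`unicodedata` module. This sketch assumes UTF-8 and uses NFC normalization
purely as an illustrative technique; the message does not propose it, and the
helper names are hypothetical.

```python
import unicodedata

precomposed = "caf\u00e9"   # 'é' as a single code point, U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def byte_match(haystack: str, needle: str) -> bool:
    """Encode both sides to UTF-8 and search the raw bytes."""
    return needle.encode("utf-8") in haystack.encode("utf-8")


def nfc_byte_match(haystack: str, needle: str) -> bool:
    """Normalize both sides to NFC before the byte search."""
    def nfc_bytes(s: str) -> bytes:
        return unicodedata.normalize("NFC", s).encode("utf-8")
    return nfc_bytes(needle) in nfc_bytes(haystack)


if __name__ == "__main__":
    print(precomposed == decomposed)                # False: different code points
    print(byte_match(decomposed, precomposed))      # False: bytes differ
    print(nfc_byte_match(decomposed, precomposed))  # True after normalization
```

Normalizing the input, though, means examining every input character again,
which is exactly the cost the byte-level approach tries to avoid, so this only
illustrates the trade-off rather than resolving it.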

T

--
If you look at a thing nine hundred and ninety-nine times, you are perfectly
safe; if you look at it the thousandth time, you are in frightful danger of
seeing it for the first time. -- G. K. Chesterton