grep performs *badly*

Bug #24902 reported by Marius Karthaus
This bug report is a duplicate of: Bug #7906: grep is extremely slow with UTF-8.
Affects: grep (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

I have a dataset of 640K lines, and a grep takes forever to complete. Look:

root@ubuntu:/tmp # wc -l bla
641460 bla
root@ubuntu:/tmp # time grep FOO bla |wc -l
121043

real 2m47.790s
user 2m45.674s
sys 0m0.118s

The process takes a LONG time.

On a Debian box, with the same file and operation:

me@debian:/tmp$ wc -l bla
641460 bla
me@debian:/tmp$ time grep FOO bla|wc -l
121043

real 0m0.091s
user 0m0.050s
sys 0m0.030s

That is roughly 1,840 times faster (167.8 s vs. 0.091 s wall-clock)!

So something is going really wrong here.

I've tested on three boxes: a server (Ubuntu Hoary), a desktop (Ubuntu Breezy),
and a laptop (Kubuntu Breezy).

All boxes, including the Debian one, use grep (GNU grep) 2.5.1.

All boxes are current models: 2+ GHz CPUs with 512+ MB of RAM.

Marius Karthaus

Marius Karthaus (bugs-karthaus) wrote :

For reproducibility, I did the following experiment:

Below are two sessions, one on Ubuntu and the other on Debian. I used PHP 5
on both to generate a randomly filled file of 250K lines.

Creating the file takes 57 seconds on the Ubuntu box and 51 seconds on the
Debian box. This small difference is because the Debian box is a bit faster.
Nothing strange so far.

Also, counting the lines in the files takes very little time on both boxes, so wc
is not the problem here, and neither is reading the file.

But then I get to the part where I grep for FOO. On Ubuntu it finds 1438 lines
in the random file in 4.3 seconds; on Debian it finds 1412 lines in 0.049
seconds (the counts differ because each box generated its own random file).
That is a gigantic difference! I'm having the same issue with log files etc.
As files get larger and more complex, I've seen the Debian box outperform the
Ubuntu box by a factor of thousands.

And as I stated in the original bug report, this is not something I'm seeing on
one box: I've tested on multiple Ubuntu/Kubuntu installs with both Hoary and
Breezy. I even tested on a Kubuntu install that I did not set up myself.

Marius Karthaus
BudgetDedicated (http://www.BudgetDedicated.com)

### SLOW ON UBUNTU

root@mub:~/BV.NL/slowgrep# cat randomdata.php

#!/usr/bin/php
<?php

# create 250K lines, each made of 100 random uppercase letters (A-Z)

for ($l = 0; $l < 250000; $l++) {
    $line = '';
    for ($n = 0; $n < 100; $n++) {
        # append one random character between 'A' (65) and 'Z' (90)
        $line .= chr(rand(65, 90));
    }
    $line .= "\n";
    echo $line;
}
?>

root@mub:~/BV.NL/slowgrep# time ./randomdata.php >test

real 0m57.386s
user 0m55.218s
sys 0m1.257s

root@mub:~/BV.NL/slowgrep# time wc -l test
250000 test

real 0m0.074s
user 0m0.030s
sys 0m0.034s

root@mub:~/BV.NL/slowgrep# time grep FOO test |wc -l
1438

real 0m4.317s
user 0m4.207s
sys 0m0.043s

### SAME STUFF FAST ON DEBIAN

root@web1:~# cat randomdata.php
#!/usr/bin/php
<?php

# create 250K lines, each made of 100 random uppercase letters (A-Z)

for ($l = 0; $l < 250000; $l++) {
    $line = '';
    for ($n = 0; $n < 100; $n++) {
        # append one random character between 'A' (65) and 'Z' (90)
        $line .= chr(rand(65, 90));
    }
    $line .= "\n";
    echo $line;
}

?>

root@web1:~# time ./randomdata.php >test

real 0m51.270s
user 0m50.380s
sys 0m0.850s
root@web1:~# time wc -l test
250001 test

real 0m0.028s
user 0m0.010s
sys 0m0.010s
root@web1:~# time grep FOO test |wc -l
1412

real 0m0.049s
user 0m0.020s
sys 0m0.020s

Dennis Kaarsemaker (dennis) wrote :

Is DMA enabled on your drive?

Marius Karthaus (bugs-karthaus) wrote :

Average throughput for large files on this box seems to be about 38 MB/s (untarring
a large file).
I do not see how it could be DMA: if the problem were disk throughput, then
'wc -l TESTFILE' would be equally slow. However, DMA is enabled:

root@mub:~# hdparm -d /dev/hda

/dev/hda:
 using_dma = 1 (on)
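The timings in the original report already point away from the disk: on the Ubuntu box grep used 2m45.674s of user time against 0.118s of system time, out of 2m47.790s real time, so the process spent nearly all of its time computing rather than waiting on I/O. A minimal sketch that makes this comparison explicit, assuming bash (for the `time` keyword and `TIMEFORMAT`) plus awk and GNU grep; the file name `iotest.txt` and the generated data are hypothetical:

```shell
#!/bin/bash
# Sketch: separate the cost of reading a file from grep's matching cost.
# Assumptions: bash, awk, GNU grep; "iotest.txt" is a hypothetical file name.
set -eu

file=iotest.txt
lines=200000

# Deterministic test data: every 100th line contains FOO.
awk -v n="$lines" 'BEGIN {
  for (i = 0; i < n; i++)
    print (i % 100 == 0 ? "ABCFOOXYZ" : "ABCDEFXYZ")
}' > "$file"

# real = wall-clock, user = CPU time in the program, sys = kernel time (incl. I/O).
TIMEFORMAT='real %Rs  user %Us  sys %Ss'

echo "Reading only (cat):"
time cat "$file" > /dev/null

echo "Reading and matching (grep):"
time grep -c FOO "$file" > /dev/null

matches=$(grep -c FOO "$file")
echo "lines matching FOO: $matches"
```

If grep's user time is far larger than cat's while both read the same bytes, the cost lies in matching rather than in the disk, which is the pattern the Ubuntu timings show.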

Adam Conrad (adconrad) wrote :

I assume you're using a UTF-8 locale on Ubuntu (as that's our default), and a
non-UTF-8 locale on Debian. Can you confirm this? If so, this is a duplicate
of #1148.
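One way to test this hypothesis on a single machine is to run the same search twice: once in the current locale and once with the locale forced to `C` for that one command. A minimal sketch, assuming bash and GNU grep; the file name `locale-test.txt` and its contents are hypothetical, and the data are pure ASCII so both runs should report the same count. On an affected grep, a UTF-8 run that is dramatically slower than the `C` run would point to the locale as the cause:

```shell
#!/bin/bash
# Sketch: compare grep in the current locale with grep in the C locale.
# Assumptions: bash, awk, GNU grep; "locale-test.txt" is a hypothetical file name.
set -eu

file=locale-test.txt

# ASCII-only sample data: 20,000 lines, one FOO line in every 50.
awk 'BEGIN { for (i = 0; i < 20000; i++) print (i % 50 == 0 ? "xxFOOxx" : "xxBARxx") }' > "$file"

# Show which character-type locale grep sees by default (e.g. a UTF-8 one).
echo "current LC_CTYPE: $(locale 2>/dev/null | sed -n 's/^LC_CTYPE=//p')"

TIMEFORMAT='  real %Rs'

echo "grep in the current locale:"
time grep -c FOO "$file" > /dev/null || true

echo "grep with LC_ALL=C (affects this command only):"
time LC_ALL=C grep -c FOO "$file" > /dev/null || true

# grep -c prints the number of matching lines (like grep | wc -l).
default_count=$(grep -c FOO "$file" || true)
c_count=$(LC_ALL=C grep -c FOO "$file" || true)
echo "matches: current locale=$default_count, C locale=$c_count"
```

Setting `LC_ALL=C` per command is a common stopgap for ASCII-only searches, but it changes how grep treats non-ASCII bytes, so it is not a general replacement for a fixed grep.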

Adam Conrad (adconrad) wrote :

Oh, if my suspicions are correct, can you try testing the sid version of grep
(http://packages.debian.org/unstable/utils/grep) on your system and see if it
behaves better for you? If so, we can merge those changes for Dapper ASAP and
close this long-standing bug.

Marius Karthaus (bugs-karthaus) wrote :

Hi Adam,

I took a look at the locales on the machines, and your assumption was right.
I downloaded 'grep_2.5.1.ds2-2_i386.deb' from a mirror and installed it with
dpkg -i without problems. The speed issue was solved.

Thank you.

Matt Zimmerman (mdz) wrote :

This bug has been marked as a duplicate of bug 7906.
