grep performs *BAD*
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
grep (Ubuntu) | Invalid | Medium | Unassigned |
Bug Description
I have a dataset of 640K lines, and a grep takes forever to complete. Look:
root@ubuntu:/tmp # wc -l bla
641460 bla
root@ubuntu:/tmp # time grep FOO bla |wc -l
121043
real 2m47.790s
user 2m45.674s
sys 0m0.118s
The process takes a LONG time.
On a Debian box, with the same file and operation:
me@debian:/tmp$ wc -l bla
641460 bla
me@debian:/tmp$ time grep FOO bla|wc -l
121043
real 0m0.091s
user 0m0.050s
sys 0m0.030s
That is roughly 1,840 times faster!
So something is going really wrong here.
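For reference, the factor follows from the wall-clock (real) times of the two runs; the user CPU times give an even larger ratio, though the Debian figure of 0.050 s is close to the timer's resolution, so that second ratio is only rough:

```latex
\frac{t_{\text{real}}^{\text{Ubuntu}}}{t_{\text{real}}^{\text{Debian}}}
  = \frac{2\,\text{m}\,47.790\,\text{s}}{0.091\,\text{s}}
  = \frac{167.790\,\text{s}}{0.091\,\text{s}} \approx 1844,
\qquad
\frac{t_{\text{user}}^{\text{Ubuntu}}}{t_{\text{user}}^{\text{Debian}}}
  = \frac{165.674\,\text{s}}{0.050\,\text{s}} \approx 3313
```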
I've tested on three boxes: a server (Ubuntu Hoary), a desktop (Ubuntu Breezy), and a laptop (Kubuntu Breezy).
All boxes, including the Debian one, use grep (GNU grep) 2.5.1.
All boxes are current models with 2+ GHz CPUs and 512+ MB of RAM.
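Since the grep version is identical on every box, it may help to record other environment details on each one. The sketch below prints the grep version and the active locale settings. Checking the locale is our own assumption, not something the report mentions; it is motivated by GNU grep 2.5.x being known to run much slower in multibyte locales such as UTF-8 than in the plain C locale.

```shell
#!/bin/sh
# Sketch: print details that may differ between boxes even when grep is the same version.
grep --version | head -n 1   # first line names the grep version, e.g. "grep (GNU grep) 2.5.1"
locale                       # LANG and LC_* settings in effect for this shell
locale charmap               # character encoding in use, e.g. UTF-8 or ANSI_X3.4-1968
```

Running it on the Ubuntu and Debian boxes side by side would show whether anything besides the grep version differs.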
Marius Karthaus
For reproducibility, I did the following experiment:
Below are two sessions, one done on Ubuntu and the other on Debian. I used PHP 5 on both to generate a randomly filled file of 250K lines.
Creating the file takes 57 seconds on the Ubuntu box and 51 seconds on the Debian box. This small difference is because the Debian box is a bit faster; nothing strange so far.
Counting the lines in the files also takes very little time on both boxes, so wc is not the problem here, and neither is reading the file.
But then I get to the part where I grep for FOO. On Ubuntu, grep finds 1438 lines in the random file in 4.3 seconds; on Debian, it finds 1412 lines in 0.049 seconds. That is a gigantic difference! I'm having the same issue with log files etc. As files get larger and more complex, I've seen the Debian box outperform the Ubuntu box by a factor of thousands.
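The two boxes report slightly different counts (1438 vs. 1412) because each generated its own random file. A rough estimate, assuming every character is independent and uniform over A-Z, suggests both counts are about what random data should produce. A 3-letter string can start at 98 positions in a 100-character line, and the first-order approximation below ignores the small chance of two occurrences on one line, so it slightly overstates the figure:

```latex
p = \frac{1}{26^{3}} = \frac{1}{17576}, \qquad
P(\text{line contains FOO}) \approx 98\,p \approx 0.005576, \qquad
\mathbb{E}[\text{matching lines}] \approx 250000 \times 0.005576 \approx 1394, \qquad
\sigma \approx \sqrt{1394} \approx 37
```

Both 1438 and 1412 lie within about 1.2 standard deviations of this estimate, so the counts look consistent with each other; only the timing differs.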
And as I stated in the original bug report, this is not something I'm seeing on just one box: I've tested on multiple Ubuntu/Kubuntu installs with both Hoary and Breezy. I even tested on a Kubuntu install that I did not set up myself.
Marius Karthaus, www.BudgetDedicated.com
### SLOW ON UBUNTU
root@mub:~/BV.NL/slowgrep# cat randomdata.php
#!/usr/bin/php
<?php
# create 250K lines of random A-Z
$c=30; # not used below
for ($l=0; $l<250000; $l++) {
    $line='';
    for ($n=0; $n<100; $n++) {
        # append one random capital letter (ASCII 65-90)
        $line.=chr(rand(65,90));
    }
    $line.="\n";
    echo $line;
}
?>
root@mub:~/BV.NL/slowgrep# time ./randomdata.php >test
real 0m57.386s
user 0m55.218s
sys 0m1.257s
root@mub:~/BV.NL/slowgrep# time wc -l test
250000 test
real 0m0.074s
user 0m0.030s
sys 0m0.034s
root@mub:~/BV.NL/slowgrep# time grep FOO test |wc -l
1438
real 0m4.317s
user 0m4.207s
sys 0m0.043s
### SAME STUFF FAST ON DEBIAN
root@web1:~# cat randomdata.php
#!/usr/bin/php
<?php
# create 250K lines of random A-Z
$c=30; # not used below
for ($l=0; $l<250000; $l++) {
    $line='';
    for ($n=0; $n<100; $n++) {
        # append one random capital letter (ASCII 65-90)
        $line.=chr(rand(65,90));
    }
    $line.="\n";
    echo $line;
}
?>
root@web1:~# time ./randomdata.php >test
real 0m51.270s
user 0m50.380s
sys 0m0.850s
root@web1:~# time wc -l test
250001 test
real 0m0.028s
user 0m0.010s
sys 0m0.010s
root@web1:~# time grep FOO test |wc -l
1412
real 0m0.049s
user 0m0.020s
sys 0m0.020s
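To narrow the difference down on a single box, one option is to run the same search twice on the Ubuntu machine: once in the current locale and once with LC_ALL=C. The sketch below does that. The locale hypothesis is our assumption and has not been checked against the machines in this report; the script assumes GNU date (for sub-second `%N`) and defaults to the file name `test` and the pattern FOO used in the sessions above.

```shell
#!/bin/sh
# Sketch: compare grep's speed under the current locale and under LC_ALL=C.
# Assumptions: GNU date (for %N); FILE defaults to "test", PATTERN to "FOO".
file=${1:-test}
pattern=${2:-FOO}

# time_grep LABEL [VAR=value ...]: run "grep -c" with the given environment
# overrides and print the match count plus elapsed wall-clock seconds.
time_grep() {
    label=$1
    shift
    start=$(date +%s.%N)
    # grep -c exits 1 when nothing matches; ignore that so the timing is still printed
    count=$(env "$@" grep -c -- "$pattern" "$file") || true
    end=$(date +%s.%N)
    elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
    printf '%-20s %s matching lines in %s s\n' "$label" "$count" "$elapsed"
}

if [ -r "$file" ]; then
    time_grep "current ($(locale charmap))"
    time_grep "C locale" LC_ALL=C
else
    echo "usage: $0 [FILE] [PATTERN]  (FILE defaults to ./test)" >&2
fi
```

If the current-locale run is far slower than the C-locale run while both report the same count, that would point at locale handling rather than at the data or the disk.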