Ubuntu

mawk text-match count inconsistency

Reported by solidus126 on 2009-11-19
This bug affects 1 person
Affects: mawk (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: mawk

Hello all. I ran into a problem while using mawk from a vanilla Ubuntu 9.10 install to process an Apache log file. I wanted the total number of 404 and 500 errors in field 9 of the log. The log file is named "access_log", and here is the code I ran:

mawk '$9 == 404 { count++ } END { print count }' access_log
mawk '$9 == 500 { count++ } END { print count }' access_log

Here is the information requested by the bug report template:

1) I am using Ubuntu 9.10 Netbook Remix on one machine and regular Ubuntu 9.10 (GNOME) on another.
2) The mawk package version reported by Synaptic Package Manager is 1.3.3-15-ubuntu.
3) I expected 137 hits for 500 and 34167 hits for 404 in field 9. That is what a friend of mine got (and what I later got myself) when running the same code under gawk 1:3.1.6.dfsg-0ubuntu2.
4) With mawk, the results were 58 hits for 500 and 19093 hits for 404.
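One way to check which of the two counts is right, without trusting either awk, is to count field 9 with an independent tool. The sketch below is hypothetical: it assumes the usual Apache log layout, where the status code is the ninth whitespace-separated field, and the names `status_counts` and `count_file` are illustrative. It reads raw bytes, so no text decoding is involved (relevant given the UTF-8 question raised below), and `bytes.split()` splits on runs of ASCII whitespace, which approximates awk's default field splitting. It matches the field as the string `404`, which for ordinary three-digit status codes gives the same answer as awk's `$9 == 404`.

```python
from collections import Counter
from typing import Iterable


def status_counts(lines: Iterable[bytes]) -> Counter:
    """Count the values of the ninth field ($9 in awk) across log lines.

    bytes.split() with no argument splits on runs of ASCII whitespace,
    which is close to awk's default field separator.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 9:
            counts[fields[8]] += 1  # awk's $9 is index 8 in Python
    return counts


def count_file(path: str) -> Counter:
    """Open the log in binary mode and tally field 9 for every line."""
    with open(path, "rb") as log:
        return status_counts(log)
```

Usage would be something like `counts = count_file("access_log")`, then comparing `counts[b"404"]` and `counts[b"500"]` against the mawk and gawk results.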

I am fairly new to bug reporting, at least on Ubuntu, but I will do what I can to help.

Thomas Dickey (dickey-his) wrote :

Unless access_log is encoded in UTF-8 (a possibility), mawk and gawk should
give the same result for that pattern.

cobo88 (cobo88) wrote :

Hi (I'm French, so sorry for my English :-( )

I think I have a similar problem under Ubuntu 9.10 with mawk 1.3.3.
I wrote a program to check a log file of POP/IMAP connections.
The input file was sorted beforehand, session by session (field 4, with FS = ":").

I found some cases where my program does not detect every new session when I use mawk,
so I tested it with gawk (3.1.6), and there it works correctly.

Below are my program (only the essential lines), a 15-line test file, and the commands and results for mawk and gawk.

I think my program is correct, so I hope this helps.
-------------------------
#!/usr/bin/awk -f
BEGIN {
FS=":";
savtoken="";
}
#
{
token=$4;
if (savtoken!=token)
  printf ("new session %s / %s\n", savtoken, token);
else
  printf ("same %s / %s\n", savtoken, token);
savtoken=token;
}
-----------------------

Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LA: Connect from 195.101.12.154:4843 to port 110
Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LA: Login from 195.101.12.154:
Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LA: Logout from 195.101.12.154:
Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LE: Connect from 81.80.79.239:16071 to port 110
Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LE: Login from 81.80.79.239:
Feb 3 00:08:41 rppimap01 omapd[25320]:8e7205LE: Logout from 81.80.79.239:
Feb 3 00:08:40 rppimap01 omapd[25320]:8e720721: Connect from 217.128.200.136:6159 to port 110
Feb 3 00:08:40 rppimap01 omapd[25320]:8e720721: Login from 217.128.200.136:
Feb 3 00:08:43 rppimap01 omapd[25320]:8e720721: Logout from 217.128.200.136:
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720725: Connect from 195.101.4.227:4746 to port 110
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720725: Login from 195.101.4.227:
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720725: Logout from 195.101.4.227:
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720728: Connect from 195.101.4.227:13246 to port 110
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720728: Login from 195.101.4.227:
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720728: Logout from 195.101.4.227:
--------------------------------
# only first 3 sessions found with mawk (not correct)
cat <myfile> | mawk -f <myawk_file>

new session / 8e7205LA
same 8e7205LA / 8e7205LA
same 8e7205LA / 8e7205LA
new session 8e7205LA / 8e7205LE
same 8e7205LE / 8e7205LE
same 8e7205LE / 8e7205LE
new session 8e7205LE / 8e720721
same 8e720721 / 8e720721
same 8e720721 / 8e720721
same 8e720721 / 8E720725
same 8E720725 / 8E720725
same 8E720725 / 8E720725
same 8E720725 / 8E720728
same 8E720728 / 8E720728
same 8E720728 / 8E720728
--------------------------------
# 5 sessions found with gawk (correct)
cat <myfile> | gawk -f <myawk_file>

new session / 8e7205LA
same 8e7205LA / 8e7205LA
same 8e7205LA / 8e7205LA
new session 8e7205LA / 8e7205LE
same 8e7205LE / 8e7205LE
same 8e7205LE / 8e7205LE
new session 8e7205LE / 8e720721
same 8e720721 / 8e720721
same 8e720721 / 8e720721
new session 8e720721 / 8E720725
same 8E720725 / 8E720725
same 8E720725 / 8E720725
new session 8E720725 / 8E720728
same 8E720728 / 8E720728
same 8E720728 / 8E720728

Thomas Dickey (dickey-his) wrote :

This test case does show something (the tokens are being parsed properly, but the
comparison is failing).
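That observation fits one plausible explanation, which this report does not confirm. POSIX awk compares two input fields numerically when both look like numbers. Tokens such as `8e720721` and `8E720725` are valid floating-point literals whose exponents overflow a double, so both would become infinity and compare equal. Tokens like `8e7205LA` are not numeric and are compared as strings. The Python sketch below is a hypothetical model of that rule, not mawk code: the regular expression only approximates the POSIX numeric-string syntax, and `awk_equal`, `sessions` and `TOKENS` are illustrative names. It replays one `$4` token per session from the test file.

```python
import re

# Hypothetical approximation of the POSIX awk "numeric string" syntax:
# optional blanks, optional sign, a decimal number, an optional exponent.
NUMERIC = re.compile(r"^[ \t]*[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?[ \t]*$")


def awk_equal(a: str, b: str) -> bool:
    """Model awk's == between two input fields (illustrative, not mawk code)."""
    if NUMERIC.match(a) and NUMERIC.match(b):
        # Both look numeric, so compare as doubles. An exponent such as
        # e720721 overflows, and float() returns inf for both operands.
        return float(a) == float(b)
    # At least one operand is not numeric: compare as strings.
    return a == b


def sessions(tokens, equal):
    """Replay the report's loop, labelling each token against the previous one."""
    saved = ""
    out = []
    for token in tokens:
        label = "same" if equal(saved, token) else "new session"
        out.append(f"{label} {saved} / {token}")
        saved = token
    return out


# One $4 token per session from the 15-line test file.
TOKENS = ["8e7205LA", "8e7205LE", "8e720721", "8E720725", "8E720728"]

for line in sessions(TOKENS, awk_equal):
    print(line)
```

Under this model, the script prints three "new session" lines followed by two "same" lines, matching mawk's misses on the last two sessions. Passing a plain string comparison (`lambda a, b: a == b`) flags every token change, as gawk did. If this is the cause, a common awk idiom to force a string comparison is to concatenate an empty string, e.g. `token = $4 ""`. The sketch says nothing about what the later mawk fix actually changed.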

Thomas Dickey (dickey-his) wrote :

Reviewing this, I see the problem and have a fix for it.

Thomas Dickey (dickey-his) wrote :

Fix was released in mawk-1.3.4-20120627

Changed in mawk (Ubuntu):
status: New → Fix Released