mawk text-match count inconsistency

Bug #485574 reported by solidus126
This bug affects 1 person
Affects        Status        Importance  Assigned to  Milestone
mawk (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

Binary package hint: mawk

Hello all. I ran into a problem while using a vanilla install of Ubuntu 9.10's mawk to process an Apache log file, counting the 404 and 500 errors in field 9. The log file is named "access_log", and here is the code I ran:

mawk '$9 == 404 { count++ } END { print count }' access_log
mawk '$9 == 500 { count++ } END { print count }' access_log

Here is the information requested by the bug-report guidelines:

1) I am using Ubuntu 9.10 Netbook Remix and, on another machine, the regular 9.10 GNOME desktop.
2) The package version of mawk as reported by synaptic package manager is: 1.3.3-15-ubuntu.
3) I expected 137 hits for 500 and 34167 hits for 404 in field 9. This is what a friend of mine got (and what I subsequently got) after running the same code under gawk 1:3.1.6.dfsg-0ubuntu2.
4) With mawk, the results were 58 hits for 500 and 19093 hits for 404.
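
For anyone trying to reproduce the discrepancy, one way to compare the two interpreters is to tally field 9 as an array subscript. Array subscripts in awk are strings, so this takes the numeric comparison in `$9 == 404` out of the picture. This is only a sketch: the function name count_status is made up for illustration, and it assumes field 9 holds the status code, as it does in the report above.

```shell
# Tally field 9 as an array key, so no numeric comparison is involved.
# count_status is a hypothetical helper name, not part of mawk or gawk.
count_status() {
    # $1: awk implementation to run (e.g. mawk or gawk); $2: log file
    "$1" '{ hits[$9]++ }
          END { printf "404=%d 500=%d\n", hits["404"], hits["500"] }' "$2"
}
```

Running `count_status mawk access_log` and `count_status gawk access_log` side by side may help narrow things down: if the two agree here but the `$9 == 404` form still disagrees, the comparison itself is the more likely suspect.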

I am fairly new to bug reporting, at least to Ubuntu, but I will try to do what I can to help.

Tags: awk mawk
Thomas Dickey (dickey-his) wrote :

Unless access_log is encoded in UTF-8 (a possibility), mawk and gawk should
give the same result for that pattern.

cobo88 (cobo88) wrote :

Hi (I'm French, so sorry for my English :-( )

I think I have a similar problem under Ubuntu 9.10 with mawk 1.3.3.
I wrote a program to check a log file (POP / IMAP connections).
The input file was previously sorted session by session (field 4, FS = ":").

I found some cases where my program does not detect every new session when I use mawk.
So I decided to test my program with gawk (3.1.6), and there it works.

Below you will find my program (only the essential lines), a test file (15 lines), and the commands and results for mawk and gawk.

I think my program is correct, so I hope this helps.
-------------------------
#!awk
BEGIN {
    FS = ":"        # session id is the 4th colon-separated field
    savtoken = ""
}
#
{
    token = $4
    if (savtoken != token)
        printf("new session %s / %s\n", savtoken, token)
    else
        printf("same %s / %s\n", savtoken, token)
    savtoken = token
}
-----------------------

Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LA: Connect from 195.101.12.154:4843 to port 110
Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LA: Login from 195.101.12.154:
Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LA: Logout from 195.101.12.154:
Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LE: Connect from 81.80.79.239:16071 to port 110
Feb 3 00:08:40 rppimap01 omapd[25320]:8e7205LE: Login from 81.80.79.239:
Feb 3 00:08:41 rppimap01 omapd[25320]:8e7205LE: Logout from 81.80.79.239:
Feb 3 00:08:40 rppimap01 omapd[25320]:8e720721: Connect from 217.128.200.136:6159 to port 110
Feb 3 00:08:40 rppimap01 omapd[25320]:8e720721: Login from 217.128.200.136:
Feb 3 00:08:43 rppimap01 omapd[25320]:8e720721: Logout from 217.128.200.136:
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720725: Connect from 195.101.4.227:4746 to port 110
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720725: Login from 195.101.4.227:
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720725: Logout from 195.101.4.227:
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720728: Connect from 195.101.4.227:13246 to port 110
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720728: Login from 195.101.4.227:
Feb 3 00:08:14 rppimap01 omapd[25320]:8E720728: Logout from 195.101.4.227:
--------------------------------
# only first 3 sessions found with mawk (not correct)
cat <myfile> | mawk -f <myawk_file>

new session / 8e7205LA
same 8e7205LA / 8e7205LA
same 8e7205LA / 8e7205LA
new session 8e7205LA / 8e7205LE
same 8e7205LE / 8e7205LE
same 8e7205LE / 8e7205LE
new session 8e7205LE / 8e720721
same 8e720721 / 8e720721
same 8e720721 / 8e720721
same 8e720721 / 8E720725
same 8E720725 / 8E720725
same 8E720725 / 8E720725
same 8E720725 / 8E720728
same 8E720728 / 8E720728
same 8E720728 / 8E720728
--------------------------------
# 5 sessions found with gawk (correct)
cat <myfile> | gawk -f <myawk_file>

new session / 8e7205LA
same 8e7205LA / 8e7205LA
same 8e7205LA / 8e7205LA
new session 8e7205LA / 8e7205LE
same 8e7205LE / 8e7205LE
same 8e7205LE / 8e7205LE
new session 8e7205LE / 8e720721
same 8e720721 / 8e720721
same 8e720721 / 8e720721
new session 8e720721 / 8E720725
same 8E720725 / 8E720725
same 8E720725 / 8E720725
new session 8E720725 / 8E720728
same 8E720728 / 8E720728
same 8E720728 / 8E720728

Thomas Dickey (dickey-his) wrote :

This test case does show something (the tokens are being parsed properly, but the
comparison is failing).
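
The tokens that mawk fails to tell apart (8e720721, 8E720725, 8E720728) all have the form of a floating-point literal, while the ones it handles correctly (8e7205LA, 8e7205LE) do not. If the comparison is being done numerically for such fields (an assumption; the thread does not state the cause), a possible workaround is to force a string comparison by concatenating an empty string to each operand. The sketch below shows that idiom; find_sessions and prev are illustrative names, not taken from the original program.

```shell
# Report a new session whenever field 4 changes, comparing as strings.
# find_sessions is a hypothetical helper name used only for this sketch.
find_sessions() {
    # $1: awk implementation (e.g. mawk or gawk); $2: input file
    "$1" -F: '
    {
        token = $4
        # Appending "" makes both operands strings, so tokens that merely
        # look like numbers (8e720721, 8E720725) are not compared as numbers.
        if ((prev "") != (token ""))
            printf "new session %s / %s\n", prev, token
        else
            printf "same %s / %s\n", prev, token
        prev = token
    }' "$2"
}
```

Called as `find_sessions mawk <myfile>`, this should make the result independent of whether a token happens to look numeric, though it has not been checked against the affected mawk 1.3.3 build.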

Thomas Dickey (dickey-his) wrote :

Reviewing this, I see the problem and have a fix for it.

Thomas Dickey (dickey-his) wrote :

Fix was released in mawk-1.3.4-20120627

Changed in mawk (Ubuntu):
status: New → Fix Released