String indexes are inconsistent with other awks

Bug #26603 reported by Andrew Snare on 2005-12-05
This bug affects 3 people
Affects: mawk (Ubuntu)
Importance: Low
Assigned to: Unassigned

Bug Description

The substr() function from within mawk does not function correctly. In particular, the
substr(s,i,n) form returns n-1 characters, instead of n as required. To demonstrate:

% echo "1234" | mawk '{print substr($0,0,3)}'
12

It should display "123"; this can be confirmed by using gawk instead, or trying awk on
the *BSD platforms.

It appears the situation is more complex than I thought; string indexing is apparently 1-based,
not 0-based as I had assumed.

The matter is summarised at: <http://lists.gnu.org/archive/html/bug-gnu-utils/2004-09/msg00083.html>

Indeed the following works as expected:
% echo 1234 | mawk '{print substr($0,1,3)}'

It may be undesirable behaviour, but it's not a bug per se.

 - Andrew

Matt Zimmerman (mdz) wrote :

I think this is a bug, but it's not a very severe one. If string indexes are defined to start at 1, then it's not entirely unreasonable for substr(s,0,n) to behave somewhat inconsistently in different implementations.

Changed in mawk:
status: Unconfirmed → Confirmed
Thomas Dickey (dickey-his) wrote :

no - as Aharon Robbins pointed out (and X/Open):

substr(s, m[, n ])
Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, or if n specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string s.

That's undefined behavior. It would be nice to match behavior in various aspects which aren't documented, and fall outside the standard. But that's a wishlist item rather than a bug.
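
For scripts that have to run under more than one awk, one way to sidestep the undefined case is to never pass substr() a start position below 1. A minimal sketch, assuming a hypothetical shell helper named clamp_substr (it is not part of mawk, gawk, or the standard) that treats any such position as 1, which is the reading that produces the "123" the reporter expected:

```shell
# clamp_substr START COUNT: read lines on stdin and print COUNT characters
# of each line, beginning at position START (1-based).
# Hypothetical helper for illustration only.
clamp_substr() {
    awk -v start="$1" -v count="$2" '
        # Treat a start position below 1 as 1, so that substr() is only
        # ever called with arguments the standard defines.
        function csubstr(s, m, n) {
            if (m < 1) m = 1
            return substr(s, m, n)
        }
        { print csubstr($0, start + 0, count + 0) }
    '
}

echo "1234" | clamp_substr 0 3
```

Because substr() is only ever called here with a start of 1 or more, the result no longer depends on how a given awk handles position 0; the call above prints 123.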

I found a version of mawk, maintained by a new developer:
  http://invisible-island.net/mawk/

New mawk changelog (debian patches included + new things):
  http://invisible-island.net/mawk/CHANGES

This version of mawk (v1.3.4) gives the same output as gawk:

$ # mawk 1.3.3 of Ubuntu 10.10
$ echo "1234" | mawk '{print substr($0,0,3)}'
12
$ echo "1234" | gawk '{print substr($0,0,3)}'
123
$ echo "1234" | ./mawk-1.3.4-20100625/mawk '{print substr($0,0,3)}'
123

Mantas Kriaučiūnas (mantas) wrote :

The mawk in Ubuntu and Debian is 18 years old, see LP bug #1332114.
Many mawk bugs are fixed in the new upstream versions 1.3.4-2010nnnn-2015nnnn, but the Debian maintainer does not want to update, for some obscure reason.
We need to push here: http://bugs.debian.org/554167
