String indexes are inconsistent with other awks

Bug #26603 reported by Andrew Snare
This bug affects 3 people
Affects: mawk (Ubuntu)
Status: Confirmed
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

The substr() function from within mawk does not function correctly. In particular, the
substr(s,i,n) form returns n-1 characters, instead of n as required. To demonstrate:

% echo "1234" | mawk '{print substr($0,0,3)}'
12

It should display "123"; this can be confirmed by using gawk instead, or trying awk on
the *BSD platforms.

Andrew Snare (ajs-deactivatedaccount) wrote :

It appears the situation is more complex than I thought; string indexing is apparently 1-based,
not 0-based as I previously thought.

The matter is summarised at:
<http://lists.gnu.org/archive/html/bug-gnu-utils/2004-09/msg00083.html>

Indeed the following works as expected:
% echo 1234 | mawk '{print substr($0,1,3)}'

It may be undesirable behaviour, but it's not a bug per se.

 - Andrew

Matt Zimmerman (mdz) wrote :

I think this is a bug, but it's not a very severe one. If string indexes are defined to start at 1, then it's not entirely unreasonable for substr(s,0,n) to behave somewhat inconsistently in different implementations.

Changed in mawk:
status: Unconfirmed → Confirmed
Thomas Dickey (dickey-his) wrote :

no - as Aharon Robbins pointed out (and X/Open):

substr(s, m[, n ])
Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, or if n specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string s.

That's undefined behavior. It would be nice to match behavior in various aspects which aren't documented, and fall outside the standard. But that's a wishlist item rather than a bug.
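
Since the text quoted above only defines substr() for positions numbered from 1, a script that has to give the same result under mawk, gawk and the BSD awks can simply avoid passing a start position below 1. The following is a minimal sketch of that idea, not something taken from any awk distribution. The helper name portable_substr is hypothetical, and clamping the start to 1 while keeping the length is an assumption chosen to reproduce the "123" result reported for gawk in this report.

```shell
# portable_substr STRING START LENGTH
# Hypothetical helper: never hands awk's substr() a start position below 1,
# so only the behaviour the standard defines is relied upon.
portable_substr() {
    awk -v s="$1" -v m="$2" -v n="$3" '
    function clamped_substr(str, start, len) {
        if (len < 1) return ""        # nothing requested
        if (start < 1) start = 1      # assumption: clamp start, keep length
        return substr(str, start, len)
    }
    BEGIN { print clamped_substr(s, m, n) }'
}

portable_substr 1234 0 3    # prints 123
```

With a start of 1 or more the helper is just substr(), so existing 1-based calls are unaffected.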

Gert Hulselmans (hulselmansgert) wrote :

I found a version of mawk, maintained by a new developer:
  http://invisible-island.net/mawk/

New mawk changelog (debian patches included + new things):
  http://invisible-island.net/mawk/CHANGES

This version of mawk (v1.3.4) gives the same output as gawk:

$ # mawk 1.3.3 of Ubuntu 10.10
$ echo "1234" | mawk '{print substr($0,0,3)}'
12
$ echo "1234" | gawk '{print substr($0,0,3)}'
123
$ echo "1234" | ./mawk-1.3.4-20100625/mawk '{print substr($0,0,3)}'
123
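
To repeat the comparison above against whichever awk is installed, a small probe along these lines may help. It is only a sketch: probe_substr_zero is a hypothetical name, and the two labelled outcomes are the ones seen in this report ("123" and "12"); anything else is printed as-is.

```shell
# probe_substr_zero [AWK_BINARY]
# Reports how the given awk (default: "awk" from PATH) handles a start of 0.
probe_substr_zero() {
    "${1:-awk}" 'BEGIN {
        r = substr("1234", 0, 3)
        if (r == "123")
            print "123: start treated as 1, length kept"
        else if (r == "12")
            print "12: position 0 counted against the length"
        else
            print "other result: \"" r "\""
    }'
}

probe_substr_zero
```

Passing a path as the first argument, for example ./mawk-1.3.4-20100625/mawk, probes that binary instead.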

Mantas Kriaučiūnas (mantas) wrote :

mawk in Ubuntu and Debian is 18 years old, see LP bug #1332114.
Many mawk bugs are fixed in the newer upstream versions 1.3.4-2010nnnn-2015nnnn, but the Debian maintainer does not want to update, for an obscure reason.
We need to push here: http://bugs.debian.org/554167
