Comment 120 for bug 580961

ryou ezoe (boostcpp) wrote :

I found a bug in this iconv patch: 04-unzip60-alt-iconv-utf8.
The problem is that the patch allocates the buffer that stores the converted string at only twice the size of the source string, plus one byte.
See lines 81-84 of the patch:

+ slen = strlen(string);
+ s = string;
+ dlen = buflen = 2*slen;
+ d = buf = malloc(buflen + 1);

This causes the conversion to fail in some cases, because a character can require more than twice as many octets in the target encoding as in the source encoding. This is especially true when converting to UTF-8, Ubuntu's default encoding.

For example, consider the HALFWIDTH KATAKANA LETTER characters.
In the Shift_JIS and CP932 encodings, a halfwidth katakana letter is represented in one octet.
In UTF-8, it requires three octets.

For example,
'ｱ' (U+FF71: HALFWIDTH KATAKANA LETTER A)
is encoded as 0xB1 in Shift_JIS and CP932.
This is one octet.
In UTF-8, it is encoded as 0xEF, 0xBD, 0xB1.
This is three octets.
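
The octet count above can be checked by hand-encoding the code point. The following is only a minimal sketch: utf8_encode is a hypothetical helper written for illustration, it is not part of unzip or of the patch, and it does not call iconv.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only (not from the patch).
 * Encodes one Unicode code point as UTF-8 into out[].
 * Returns the number of octets written (1 to 4), or 0 if cp is
 * a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* 1 octet: ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 octets */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 octets: rest of the BMP */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 octets: supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0xFF71, buf) writes 0xEF, 0xBD, 0xB1 and returns 3, so a single Shift_JIS octet grows to three octets in UTF-8.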

So, because the current unzip allocates only twice the size of the source string for the buffer, it fails to handle a zip file containing a file name that consists entirely or largely of halfwidth katakana letters.

I suggest changing the buffer size to four times the size of the source string, plus one byte, because Ubuntu's default encoding is UTF-8 and the longest valid UTF-8 sequence for one character is 4 octets.

Replace line 83 of 04-unzip60-alt-iconv-utf8 with the following:
+ dlen = buflen = 4*slen;
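
For illustration, the sizing rule could be isolated as below. This is a sketch under assumptions, not the patch's actual code: alloc_conv_buffer is a hypothetical name, and the overflow check is an extra precaution that the patch does not contain.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration only (not from the patch).
 * Allocates an output buffer of 4*strlen(src) octets plus one for the
 * terminating NUL, and stores the usable size (without the NUL) in *buflen.
 * Returns NULL if the size would overflow size_t or if malloc fails. */
char *alloc_conv_buffer(const char *src, size_t *buflen)
{
    size_t slen = strlen(src);

    /* Refuse lengths for which 4*slen + 1 cannot be represented. */
    if (slen > (SIZE_MAX - 1) / 4)
        return NULL;

    *buflen = 4 * slen;
    return malloc(*buflen + 1);
}
```

For a 3-octet source name this reserves 12 octets plus the terminator, which is enough when every source octet maps to one character of at most 4 UTF-8 octets. The caller owns the returned buffer and must free() it.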