(command-line) cannot handle multi-byte command-line arguments

Bug #256453 reported by Derick Eddington
Affects: Ikarus Scheme
Status: Fix Committed
Importance: High
Assigned to: Abdulaziz Ghuloum
Milestone: 0.0.4

Bug Description

$ cat command-line
#!/usr/bin/env scheme-script
(import (rnrs))
(write (command-line)) (newline)
$ ./command-line λ # goes berserk
Unhandled exception:
 Condition components:
   1. &interrupted
   2. &message: "received an interrupt signal"
$ ikarus -- λ
Ikarus Scheme version 0.0.3+ (revision 1579, build 2008-08-09)
Copyright (c) 2006-2008 Abdulaziz Ghuloum

> (command-line) ;; goes berserk
Unhandled exception:
 Condition components:
   1. &interrupted
   2. &message: "received an interrupt signal"
$

Note that the REPL did not reset; the ikarus process terminated.

I don't yet know why it's going berserk.

There's another issue I noticed that led me to discover the above:

ikarus-main.c, line 79:

          string_set(str, i, integer_to_char(s[i]));

This transcodes the command-line arguments as Latin-1, because each byte of the argument becomes one character. Instead, I suggest putting the command-line arguments in bytevectors and then transcoding those bytevectors in Scheme code using (native-transcoder). I'll try it out.
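To illustrate the difference between the two decodings, here is a minimal standalone C sketch. It assumes the arguments arrive as UTF-8, as in the terminal session above. The function names latin1_decode and utf8_decode_one are hypothetical and are not part of Ikarus, and the decoder is deliberately incomplete (it does not reject overlong encodings or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Latin-1: every byte is its own code point (what the per-byte loop does). */
uint32_t latin1_decode(unsigned char byte) {
    return byte;
}

/* Minimal UTF-8 decoder for the single sequence starting at s, with n bytes
 * available. Stores the code point in *cp and returns the number of bytes
 * consumed, or 0 on malformed or truncated input.
 * Sketch only: overlong encodings and surrogates are not rejected. */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp) {
    assert(s != NULL && cp != NULL);
    if (n == 0) return 0;

    size_t len;
    uint32_t value;
    if (s[0] < 0x80)                { len = 1; value = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; value = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; value = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; value = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len > n) return 0;          /* sequence is cut off */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* not a continuation byte */
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *cp = value;
    return len;
}
```

For the two bytes of λ (ce bb), latin1_decode yields the two code points U+00CE and U+00BB, while utf8_decode_one consumes both bytes and yields the single code point U+03BB.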

Derick Eddington (derick-eddington) wrote : Re: [Bug 256453] [NEW] (command-line) cannot handle multi-byte command-line arguments

On Sat, 2008-08-09 at 20:05 +0000, Derick Eddington wrote:
> $ ikarus -- λ
> Ikarus Scheme version 0.0.3+ (revision 1579, build 2008-08-09)
> Copyright (c) 2006-2008 Abdulaziz Ghuloum
>
> > (command-line) ;; goes berserk
> Unhandled exception:
> Condition components:
> 1. &interrupted
> 2. &message: "received an interrupt signal"
> $
>
> Note the REPL did not reset, the ikarus process terminated.
>
> I don't yet know why it's going berserk.

It's going berserk because integer_to_char(s[i]) returns a word with the
high bits set: s[i] is a (signed) char, and integer_to_char casts it to
int, which sign-extends it. So for λ it's doing:

  ce => ffffce0f
  bb => ffffbb0f

When I make this change, it does not go berserk (but still transcodes as
Latin-1):

=== modified file 'src/ikarus-main.c'
--- src/ikarus-main.c 2008-08-09 12:47:44 +0000
+++ src/ikarus-main.c 2008-08-10 01:46:33 +0000
@@ -76,7 +76,7 @@
       {
         int i;
         for(i=0; i<n; i++){
-          string_set(str, i, integer_to_char(s[i]));
+          string_set(str, i, integer_to_char((unsigned char)s[i]));
         }
       }
       ikptr p = ik_unsafe_alloc(pcb, pair_size);

$ ./command-line λ
("./command-line" "Î\xBB;")

I didn't try to figure out if those high bits are revealing another bug
somewhere in Ikarus's character / string processing or if they are just
breaking an invariant.
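The sign extension can be reproduced outside Ikarus with a small C sketch. The tagging scheme here (code point in the upper bits, tag 0x0F in the low byte) is an assumption inferred from the ce => ffffce0f values quoted above, and tag_char is a hypothetical stand-in for integer_to_char, not the actual macro.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, inferred from the values in this report:
 * code point in the upper bits, tag 0x0F in the low byte. */
#define CHAR_TAG 0x0F

/* Hypothetical stand-in for integer_to_char; takes an int, as the cast does.
 * Multiplying by 256 replaces "<< 8" because left-shifting a negative int
 * is undefined behavior in C. */
uint32_t tag_char(int c) {
    return (uint32_t)(c * 256 + CHAR_TAG);
}

/* Value the byte has when read through a signed 8-bit char, then tagged. */
uint32_t tag_signed_byte(unsigned char byte) {
    int as_signed = (byte < 0x80) ? (int)byte : (int)byte - 256;
    return tag_char(as_signed);
}

/* Value the byte has after the (unsigned char) cast, then tagged. */
uint32_t tag_unsigned_byte(unsigned char byte) {
    return tag_char((int)byte);
}

/* Checks the two words quoted in the report and their fixed counterparts. */
void check_examples(void) {
    assert(tag_signed_byte(0xCE) == 0xFFFFCE0Fu);
    assert(tag_signed_byte(0xBB) == 0xFFFFBB0Fu);
    assert(tag_unsigned_byte(0xCE) == 0x0000CE0Fu);
    assert(tag_unsigned_byte(0xBB) == 0x0000BB0Fu);
}
```

Bytes below 0x80 are unaffected either way, which is consistent with plain ASCII arguments never triggering the problem.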

Here's the strace of the berserk process allocating a ton of memory:

execve("/home/d/bin/ikarus", ["ikarus", "--r6rs-script", "command-line.sps", "\316\273"], [/*
[... start-up stuff ...]
open("command-line.sps", O_RDONLY) = 3
read(3, "(import (rnrs))\n(write (command-"..., 16384) = 49
read(3, "", 16384) = 0
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb75c2000
mmap2(NULL, 4194304, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb71c2000
mmap2(NULL, 4194304, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6916000
mmap2(NULL, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb71bc000
munmap(0xb6ff8000, 20480) = 0
mmap2(NULL, 24576, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb71b6000
munmap(0xb70c7000, 20480) = 0
[... much more ...]
mmap2(NULL, 843776, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x84afa000
munmap(0x84fc8000, 835584) = 0
mmap2(NULL, 843776, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb70f4000
munmap(0x8e787000, 835584) = 0
mmap2(NULL, 4194304, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x846fa000

Also, although this wasn't necessary to fix the berserking (and I don't
completely understand C casting, and I didn't look for other similar
casting spots), it seems like these types should be something more
like this:

=== modified file 'src/ikarus-data.h'
--- src/ikarus-data.h 2008-07-20 07:14:09 +0000
+++ src/ikarus-data.h 2008-08-10 01:34:13 +0000
@@ -91,7 +91,7 @@

 typedef unsigned long int ikptr;

-typedef int ikchar;
+typedef ikptr ikchar;

 void ik_error(ik...

Derick Eddington (derick-eddington) wrote :

On Sat, 2008-08-09 at 20:05 +0000, Derick Eddington wrote:
> $ ikarus -- λ
> Ikarus Scheme version 0.0.3+ (revision 1579, build 2008-08-09)
> Copyright (c) 2006-2008 Abdulaziz Ghuloum
>
> > (command-line) ;; goes berserk
> Unhandled exception:
> Condition components:
> 1. &interrupted
> 2. &message: "received an interrupt signal"
> $
>
> Note the REPL did not reset, the ikarus process terminated.

Why is that? The exception handler for the dynamic extent of the eval
of (command-line) is the cafe handler which should reset when it catches
the &serious &interrupted; is another handler being installed later?
It's getting printed but the reset doesn't happen... Maybe it's in C
code when the SIGINT is delivered, but it's obviously the Scheme code
raising and that Scheme code should have the cafe handler.

Derick Eddington (derick-eddington) wrote :

On Sat, 2008-08-09 at 20:05 +0000, Derick Eddington wrote:
> There's another issue I noticed that led me to discover the above:
>
> ikarus-main.c, line 79:
>
> string_set(str, i, integer_to_char(s[i]));
>
> this will transcode the command-line arguments as Latin-1. Instead, I
> suggest the command-line arguments be put in bytevectors and then in
> Scheme code the bytevectors are transcoded using (native-transcoder).
> I'll try it out.

Here's what I've done:

[This requires manually running makefile.ss with an old, working ikarus
to build a new ikarus.boot, because the old initialization expression
for command-line-arguments in ikarus.boot.4.prebuilt does not work with
the new ikarus process. In fact, it segfaults when the guard for
command-line-arguments attempts to call die; this patch also fixes that
(rare) issue.]

=== modified file 'scheme/ikarus.command-line.ss'
--- scheme/ikarus.command-line.ss 2008-01-29 05:34:34 +0000
+++ scheme/ikarus.command-line.ss 2008-08-10 03:45:25 +0000
@@ -22,10 +22,12 @@

   (define (command-line) (command-line-arguments))
   (define command-line-arguments
-    (make-parameter ($arg-list)
+    (make-parameter
+      (map (lambda (bv)
+             (bytevector->string bv (native-transcoder)))
+           ($arg-list))
       (lambda (x)
         (if (and (list? x) (andmap string? x))
             x
-            (die 'command-list
-                 "invalid command-line-arguments ~s\n" x))))))
+            (die 'command-line-arguments "not a list of strings" x))))))

=== modified file 'scheme/makefile.ss'
--- scheme/makefile.ss 2008-08-08 15:29:18 +0000
+++ scheme/makefile.ss 2008-08-10 04:02:57 +0000
@@ -74,7 +74,6 @@
     "ikarus.numerics.ss"
     "ikarus.conditions.ss"
     "ikarus.guardians.ss"
-    "ikarus.command-line.ss"
     "ikarus.codecs.ss"
     "ikarus.bytevectors.ss"
     "ikarus.posix.ss"
@@ -105,6 +104,7 @@
     "ikarus.promises.ss"
     "ikarus.enumerations.ss"
     "ikarus.not-yet-implemented.ss"
+    "ikarus.command-line.ss"
     "ikarus.main.ss"
     ))

=== modified file 'src/ikarus-main.c'
--- src/ikarus-main.c 2008-08-09 12:47:44 +0000
+++ src/ikarus-main.c 2008-08-10 03:02:00 +0000
@@ -70,17 +70,12 @@
     while(i > 0){
       char* s = argv[i];
       int n = strlen(s);
-      ikptr str = ik_unsafe_alloc(pcb, align(n*string_char_size+disp_string_data+1))
-                  + string_tag;
-      ref(str, off_string_length) = fix(n);
-      {
-        int i;
-        for(i=0; i<n; i++){
-          string_set(str, i, integer_to_char(s[i]));
-        }
-      }
+      ikptr bv = ik_unsafe_alloc(pcb, align(disp_bytevector_data+n+1))
+                 + bytevector_tag;
+      ref(bv, off_bytevector_length) = fix(n);
+      memcpy((char*)(bv+off_bytevector_data), s, n+1);
       ikptr p = ik_unsafe_alloc(pcb, pair_size);
-      ref(p, disp_car) = str;
+      ref(p, disp_car) = bv;
       ref(p, disp_cdr) = arg_list;
       arg_list = p+pair_tag;
       i--;

$ ./command-line ქართული 한국어 Ελληνικ
("./command-line" "ქართული" "한국어" "Ελληνικ")
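The shape of the C side of this patch (copy each argument's raw bytes, prepend to a list while walking argv backwards, decode later) can be sketched in standalone C without the Ikarus heap. Everything named here (struct arg_node, build_arg_list, free_arg_list) is hypothetical and uses malloc instead of ik_unsafe_alloc; it is an analogy under those assumptions, not the runtime's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One argument kept as raw bytes; decoding is left to a later stage. */
struct arg_node {
    size_t len;             /* byte count, excluding the trailing NUL */
    unsigned char *bytes;   /* raw bytes followed by a trailing NUL */
    struct arg_node *next;
};

void free_arg_list(struct arg_node *list) {
    while (list != NULL) {
        struct arg_node *next = list->next;
        free(list->bytes);
        free(list);
        list = next;
    }
}

/* Walks argv from last to first and prepends each node, so the finished
 * list is in the original argument order. Returns NULL if argc <= 0 or
 * an allocation fails. */
struct arg_node *build_arg_list(int argc, char **argv) {
    assert(argv != NULL);
    struct arg_node *list = NULL;
    for (int i = argc - 1; i >= 0; i--) {
        size_t n = strlen(argv[i]);
        struct arg_node *node = malloc(sizeof *node);
        if (node == NULL) { free_arg_list(list); return NULL; }
        node->bytes = malloc(n + 1);
        if (node->bytes == NULL) { free(node); free_arg_list(list); return NULL; }
        memcpy(node->bytes, argv[i], n + 1);  /* n + 1 also copies the NUL */
        node->len = n;
        node->next = list;
        list = node;
    }
    return list;
}
```

Because no byte is interpreted as a character at this stage, the signed-char problem cannot occur here, and the choice of encoding is deferred to whoever consumes the list.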

Abdulaziz Ghuloum (aghuloum) wrote :

Will apply the fix.

Changed in ikarus:
assignee: nobody → aghuloum
importance: Undecided → High
status: New → Confirmed

Abdulaziz Ghuloum (aghuloum) wrote :

Applied the patch in revision 1583. Thanks a bunch.

Changed in ikarus:
status: Confirmed → Fix Committed
Changed in ikarus:
milestone: none → 0.0.4