sb-ext:*posix-argv* broken on Windows

Bug #1907970 reported by Timofei Shatrov on 2020-12-13
This bug affects 1 person
Affects: SBCL
Importance: Undecided
Assigned to: Unassigned

Bug Description

I'm trying to make a command line application that accepts Unicode text, and I don't think the command line arguments are processed correctly.

Steps to reproduce:

1. Run sbcl
2. (sb-ext:save-lisp-and-die #P"testexe.exe" :toplevel (lambda() (print sb-ext:*posix-argv*)) :executable t)
3. Run testexe.exe with random Unicode text and observe very strange output.

For example:

=====
>testexe.exe 発泡ававав

("testexe.exe" ".git" ".gitattributes" ".gitignore" "data" "LICENSE")
=====

This seems like part of a directory listing of my current directory. Sometimes it just replaces characters with question marks.

I was able to reproduce this behavior on Windows 10 and Windows 7, but on Linux it seems to work ok.

Versions confirmed to have this bug:

SBCL 2.0.11 (built from source with --fancy)
SBCL 2.0.0 (official binary)

(:QUICKLISP :ASDF3.3 :ASDF3.2 :ASDF3.1 :ASDF3 :ASDF2 :ASDF :OS-WINDOWS
 :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :X86-64 :GENCGC :64-BIT :ANSI-CL
 :COMMON-LISP :IEEE-FLOATING-POINT :LITTLE-ENDIAN :PACKAGE-LOCAL-NICKNAMES
 :SB-CORE-COMPRESSION :SB-LDB :SB-PACKAGE-LOCKS :SB-SAFEPOINT
 :SB-SAFEPOINT-STRICTLY :SB-THREAD :SB-UNICODE :SBCL :WIN32)

Stas Boukarev (stassats) wrote :

That looks as if you invoked it as "exe *". Are you sure your shell/terminal understands Unicode?

Stas Boukarev (stassats) wrote :

Actually, it looks like "exe ?????"

If I run "echo 発泡ававав" in the same terminal, it correctly outputs the parameter. So I'm pretty sure the terminal does support Unicode. I also tried running it with uiop:run-command from SLIME (which definitely understands Unicode) with the same result.

And that's correct: "testexe.exe ?????" also produces the partial listing of the directory.
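
The two observations above (characters turning into question marks, and "?????" being expanded like a wildcard) can be illustrated outside SBCL. The Python sketch below is an illustration only, not SBCL's runtime code. It assumes the argument is converted to a legacy "ANSI" code page before the program sees it, with every unrepresentable character replaced by "?". The helper names to_ansi and has_wildcards are hypothetical, and the code pages cp1252 and cp1251 are examples, not the settings of the machine in this report. Real Windows conversion may also apply best-fit mappings that this sketch does not model.

```python
def to_ansi(arg: str, code_page: str) -> str:
    """Round-trip arg through a legacy code page.

    Characters the code page cannot represent come back as '?'.
    This only approximates a lossy wide-to-ANSI conversion (assumption).
    """
    return arg.encode(code_page, errors="replace").decode(code_page)


def has_wildcards(arg: str) -> bool:
    """Return True if arg contains '*' or '?', which a globbing runtime may expand."""
    return any(ch in "*?" for ch in arg)


if __name__ == "__main__":
    original = "発泡ававав"
    for code_page in ("cp1252", "cp1251"):
        converted = to_ansi(original, code_page)
        # Show what the program would receive and whether it now looks like a glob pattern.
        print(code_page, converted, has_wildcards(converted))
```

With cp1252 all eight characters become "?", while with cp1251 only the two kanji are replaced and the Cyrillic text survives. Either way the result contains "?", which a runtime that expands wildcards in its arguments would treat as a pattern instead of literal text.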

Stas Boukarev (stassats) wrote :

Using wmain requires compiling with -municode, and then everything else would need to expect wchars. CommandLineToArgvW and WideCharToMultiByte can probably simplify it.

Stas Boukarev (stassats) on 2020-12-13
Changed in sbcl:
status: New → Fix Committed

Thanks for the fix, hoping to try it out later!

I also discovered that apparently Windows 10 has a workaround for this issue, which is an option under

Language Settings/Administrative Language Settings/Change System Locale/Beta: Use Unicode UTF-8 for worldwide language support

Turning on this option (and restarting) magically makes even pre-existing executables accept Unicode text. I think it even fixes another SBCL/Unicode issue on Windows: https://bugs.launchpad.net/sbcl/+bug/1267540

Changed in sbcl:
status: Fix Committed → Fix Released