UTF-8 filename causes decoding error

Bug #1737304 reported by Yan
This bug affects 2 people
Affects  Status        Importance  Assigned to  Milestone
SBCL     Fix Released  Undecided   Unassigned

Bug Description

In a shell
touch /tmp/mouse↑.svg
LC_CTYPE=C sbcl --eval '(directory #P"/tmp/*.*")'

causes

debugger invoked on a SB-INT:C-STRING-DECODING-ERROR in thread
#<THREAD "main thread" RUNNING {1001950083}>:
  :ASCII c-string decoding error: the octet sequence #(226) cannot be decoded.

(SB-IMPL::READ-FROM-C-STRING/ASCII #.(SB-SYS:INT-SAP #X028005FD) CHARACTER)

This causes problems in Quicklisp:
https://github.com/quicklisp/quicklisp-client/issues/152

Also
(open (format nil "/tmp/mouse~c.svg" (code-char 8593)))
fails in SBCL, but works in CCL for example.

Versions:
SBCL 1.4.2
SBCL 1.4.2.137-26f361f4a
on
Darwin now 15.6.0 Darwin Kernel Version 15.6.0: Sun Jun 4 21:43:07 PDT 2017; root:xnu-3248.70.3~1/RELEASE_X86_64 x86_64

(CFFI-FEATURES:FLAT-NAMESPACE CFFI-FEATURES:X86-64 CFFI-FEATURES:UNIX
 CFFI-FEATURES:DARWIN :CFFI CFFI-SYS::FLAT-NAMESPACE
 ALEXANDRIA.0.DEV::SEQUENCE-EMPTYP :QUICKLISP :ASDF3.3 :ASDF3.2 :ASDF3.1 :ASDF3
 :ASDF2 :ASDF :OS-MACOSX :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :64-BIT
 :64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :ASH-RIGHT-VOPS :BSD
 :C-STACK-IS-CONTROL-STACK :COMMON-LISP :COMPACT-INSTANCE-HEADER
 :COMPARE-AND-SWAP-VOPS :COMPLEX-FLOAT-VOPS :CYCLE-COUNTER :DARWIN
 :DARWIN9-OR-BETTER :FLOAT-EQL-VOPS :FP-AND-PC-STANDARD-SAVE :GENCGC
 :IEEE-FLOATING-POINT :IMMOBILE-CODE :IMMOBILE-SPACE :INLINE-CONSTANTS :INODE64
 :INTEGER-EQL-VOP :LINKAGE-TABLE :LITTLE-ENDIAN :MACH-EXCEPTION-HANDLER :MACH-O
 :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS :OS-PROVIDES-BLKSIZE-T
 :OS-PROVIDES-DLADDR :OS-PROVIDES-DLOPEN :OS-PROVIDES-PUTWC
 :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES :PRECISE-ARG-COUNT-ERROR
 :RAW-INSTANCE-INIT-VOPS :RAW-SIGNED-WORD :RELOCATABLE-HEAP
 :SB-CORE-COMPRESSION :SB-DOC :SB-EVAL :SB-LDB :SB-PACKAGE-LOCKS :SB-SIMD-PACK
 :SB-SOURCE-LOCATIONS :SB-THREAD :SB-THREAD :SB-UNICODE :SB-XREF-INTERNAL :SBCL
 :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :SYMBOL-INFO-VOPS :UD2-BREAKPOINTS
 :UNBIND-N-VOP :UNDEFINED-FUN-RESTARTS :UNIX :UNWIND-TO-FRAME-AND-CALL-VOP
 :X86-64)

Jan Moringen (scymtym) wrote :

This is what happens:

1. LC_CTYPE=C tells SBCL that the system encodes characters using the ASCII character encoding. SBCL chooses the default external format, ASCII in this case, accordingly.

2. (directory #P"/tmp/*.*") requires converting (parts of) the operating system filename in question (a sequence of octets) into a Lisp string, which is done using the default external format.

3. Since the default external format is ASCII, the filename cannot be converted into a Lisp string and an error is signaled.

(open (format nil "/tmp/mouse~c.svg" (code-char 8593))) encounters the same problem but during the conversion from a Lisp string to a sequence of octets.
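The octet 226 in the error message can be traced back to the filename itself. The following is a minimal sketch, assuming a POSIX shell with od, tr, sed and wc available; name_octets is a hypothetical helper introduced only for this illustration, and the script inspects the raw octets rather than exercising SBCL's own decoder.

```shell
# "mouse↑.svg" with the arrow (U+2191) spelled as its UTF-8 octets
# E2 86 91 in octal escapes, so the script itself stays pure ASCII.
# name_octets is a hypothetical helper, not part of SBCL or Quicklisp.
name_octets() { printf 'mouse\342\206\221.svg'; }

# Decimal value of every octet in the name, separated by single spaces.
octets=$(name_octets | od -An -tu1 | tr '\n' ' ' | tr -s ' ' | sed 's/^ //; s/ $//')
echo "octets: $octets"

# Count the octets that fall outside the ASCII range 0-127.
non_ascii=$(name_octets | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' ')
echo "non-ASCII octets: $non_ascii"
```

The first octet above 127 is 226, the lead byte of the arrow's three-byte UTF-8 encoding, which is the octet an ASCII decoder rejects first.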

What did you expect to happen instead?

Yan (metayan) wrote :

Thank you for the detailed explanation.
Didn't know if it's the way it's supposed to be, since not all implementations treat it the same way.

The issue came up when I was starting SBCL on a system where LC_CTYPE=C was set as default and doing
(ql:register-local-projects)

Took me quite some time to figure out the cause, so thought it might be helpful for others to handle it in a nicer way somewhere along the line. Probably handling the error in Quicklisp and giving an informative message is the most appropriate approach.

Stas Boukarev (stassats) wrote :

LC_CTYPE no longer affects sbcl.

Changed in sbcl:
status: New → Fix Released