UTF-8 filename causes decoding error

Bug #1737304 reported by Yan on 2017-12-09
This bug affects 1 person
Affects: SBCL
Importance: Undecided
Assigned to: Unassigned

Bug Description

In a shell
touch /tmp/mouse↑.svg
LC_CTYPE=C sbcl --eval '(directory #P"/tmp/*.*")'

causes

debugger invoked on a SB-INT:C-STRING-DECODING-ERROR in thread
#<THREAD "main thread" RUNNING {1001950083}>:
  :ASCII c-string decoding error: the octet sequence #(226) cannot be decoded.

(SB-IMPL::READ-FROM-C-STRING/ASCII #.(SB-SYS:INT-SAP #X028005FD) CHARACTER)

This causes problems in Quicklisp:
https://github.com/quicklisp/quicklisp-client/issues/152

Also
(open (format nil "/tmp/mouse~c.svg" (code-char 8593)))
fails in SBCL, but works in CCL for example.

Versions:
SBCL 1.4.2
SBCL 1.4.2.137-26f361f4a
on
Darwin now 15.6.0 Darwin Kernel Version 15.6.0: Sun Jun 4 21:43:07 PDT 2017; root:xnu-3248.70.3~1/RELEASE_X86_64 x86_64

(CFFI-FEATURES:FLAT-NAMESPACE CFFI-FEATURES:X86-64 CFFI-FEATURES:UNIX
 CFFI-FEATURES:DARWIN :CFFI CFFI-SYS::FLAT-NAMESPACE
 ALEXANDRIA.0.DEV::SEQUENCE-EMPTYP :QUICKLISP :ASDF3.3 :ASDF3.2 :ASDF3.1 :ASDF3
 :ASDF2 :ASDF :OS-MACOSX :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :64-BIT
 :64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :ASH-RIGHT-VOPS :BSD
 :C-STACK-IS-CONTROL-STACK :COMMON-LISP :COMPACT-INSTANCE-HEADER
 :COMPARE-AND-SWAP-VOPS :COMPLEX-FLOAT-VOPS :CYCLE-COUNTER :DARWIN
 :DARWIN9-OR-BETTER :FLOAT-EQL-VOPS :FP-AND-PC-STANDARD-SAVE :GENCGC
 :IEEE-FLOATING-POINT :IMMOBILE-CODE :IMMOBILE-SPACE :INLINE-CONSTANTS :INODE64
 :INTEGER-EQL-VOP :LINKAGE-TABLE :LITTLE-ENDIAN :MACH-EXCEPTION-HANDLER :MACH-O
 :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS :OS-PROVIDES-BLKSIZE-T
 :OS-PROVIDES-DLADDR :OS-PROVIDES-DLOPEN :OS-PROVIDES-PUTWC
 :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES :PRECISE-ARG-COUNT-ERROR
 :RAW-INSTANCE-INIT-VOPS :RAW-SIGNED-WORD :RELOCATABLE-HEAP
 :SB-CORE-COMPRESSION :SB-DOC :SB-EVAL :SB-LDB :SB-PACKAGE-LOCKS :SB-SIMD-PACK
 :SB-SOURCE-LOCATIONS :SB-THREAD :SB-THREAD :SB-UNICODE :SB-XREF-INTERNAL :SBCL
 :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :SYMBOL-INFO-VOPS :UD2-BREAKPOINTS
 :UNBIND-N-VOP :UNDEFINED-FUN-RESTARTS :UNIX :UNWIND-TO-FRAME-AND-CALL-VOP
 :X86-64)

Jan Moringen (scymtym) wrote :

This is what happens:

1. LC_CTYPE=C tells SBCL that the system encodes characters using the ASCII character encoding. SBCL chooses the default external format, ASCII in this case, accordingly.

2. (directory #P"/tmp/*.*") requires converting (parts of) the operating system filename in question (a sequence of octets) into a Lisp string, which is done using the default external format.

3. Since the default external format is ASCII, the filename cannot be converted into a Lisp string and an error is signaled.

(open (format nil "/tmp/mouse~c.svg" (code-char 8593))) encounters the same problem but during the conversion from a Lisp string to a sequence of octets.
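
Both directions of the conversion can be reproduced outside SBCL. The following is a minimal Python sketch, not SBCL code: the helper names try_decode and try_encode are hypothetical, and Python's "ascii" codec only stands in for SBCL's ASCII external format. It shows that the first octet of the UTF-8 encoded arrow is 226 (the octet named in the error message) and that code point 8593 has no ASCII encoding.

```python
# The filename from the report; U+2191 (code point 8593) is the upwards arrow.
name = "mouse\u2191.svg"
octets = name.encode("utf-8")  # what the operating system stores


def try_decode(data: bytes, encoding: str):
    """Return (text, None) on success or (None, offending_octet) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as err:
        return None, data[err.start]


def try_encode(text: str, encoding: str):
    """Return (octets, None) on success or (None, offending_code_point) on failure."""
    try:
        return text.encode(encoding), None
    except UnicodeEncodeError as err:
        return None, ord(text[err.start])


# Octets -> string, as in the DIRECTORY call.
ascii_text, bad_octet = try_decode(octets, "ascii")
utf8_text, _ = try_decode(octets, "utf-8")

# String -> octets, as in the OPEN call.
ascii_octets, bad_code_point = try_encode(name, "ascii")

print(bad_octet, bad_code_point)
```

With a UTF-8 encoding both conversions succeed, which is why the same calls work when the locale advertises UTF-8.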

What did you expect to happen instead?

Yan (metayan) wrote :

Thank you for the detailed explanation.
Didn't know if it's the way it's supposed to be, since not all implementations treat it the same way.

The issue came up when I was starting SBCL on a system where LC_CTYPE=C was set as default and doing
(ql:register-local-projects)

Took me quite some time to figure out the cause, so thought it might be helpful for others to handle it in a nicer way somewhere along the line. Probably handling the error in Quicklisp and giving an informative message is the most appropriate approach.
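
As a rough illustration of that idea, here is a Python sketch (not Quicklisp code) that lists a directory as raw octets, decodes each name itself, and reports undecodable names with a hint about the locale instead of aborting. The function name list_decodable and the message wording are hypothetical.

```python
import os


def list_decodable(directory: str, encoding: str = "ascii"):
    """Split directory entries into decoded names and raw undecodable ones."""
    decoded, skipped = [], []
    # Passing bytes to os.listdir returns the entries as undecoded bytes.
    for raw in os.listdir(os.fsencode(directory)):
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError as err:
            skipped.append(raw)
            print(
                f"skipping {raw!r}: octet {raw[err.start]} is not valid "
                f"{encoding}; check the LC_CTYPE setting"
            )
    return sorted(decoded), skipped
```

A caller can then proceed with the decoded names and decide separately what to do about the skipped ones.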
