UTF-8 filename causes decoding error

Bug #1737304 reported by Yan on 2017-12-09

Bug Description

In a shell:
touch /tmp/mouse↑.svg
LC_CTYPE=C sbcl --eval '(directory #P"/tmp/*.*")'


debugger invoked on a SB-INT:C-STRING-DECODING-ERROR in thread
#<THREAD "main thread" RUNNING {1001950083}>:
  :ASCII c-string decoding error: the octet sequence #(226) cannot be decoded.


This causes problems in Quicklisp:

(open (format nil "/tmp/mouse~c.svg" (code-char 8593)))
fails in SBCL, but works in CCL for example.

SBCL 1.4.2
Darwin now 15.6.0 Darwin Kernel Version 15.6.0: Sun Jun 4 21:43:07 PDT 2017; root:xnu-3248.70.3~1/RELEASE_X86_64 x86_64


Jan Moringen (scymtym) wrote :

This is what happens:

1. LC_CTYPE=C tells SBCL that the system encodes characters using the ASCII character encoding. SBCL chooses the default external format, ASCII in this case, accordingly.

2. (directory #P"/tmp/*.*") requires converting (parts of) the operating system filename in question (a sequence of octets) into a Lisp string, which is done using the default external format.

3. Since the default external format is ASCII, the filename cannot be converted into a Lisp string and an error is signaled.

(open (format nil "/tmp/mouse~c.svg" (code-char 8593))) encounters the same problem but during the conversion from a Lisp string to a sequence of octets.
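The octet named in the error message can be checked from a shell. The sketch below is only an illustration of the steps above, not SBCL's implementation: `octets_of` and `first_non_ascii` are hypothetical helper names, and the arrow is written as explicit UTF-8 octets so the result does not depend on the terminal's encoding.

```shell
# Hypothetical helper: print the octets of a string as decimal numbers.
octets_of() {
  printf '%s' "$1" | od -An -tu1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: print the first octet outside the ASCII range (0-127).
# Returns non-zero when the string is pure ASCII.
first_non_ascii() {
  for o in $(octets_of "$1"); do
    if [ "$o" -gt 127 ]; then
      echo "$o"
      return 0
    fi
  done
  return 1
}

# U+2191 (the arrow in the filename) as UTF-8, using octal escapes.
arrow=$(printf '\342\206\221')

octets_of "$arrow"; echo               # 226 134 145
first_non_ascii "mouse${arrow}.svg"    # 226
```

The first octet above 127 is 226, which matches the `#(226)` in the decoding error: an ASCII decoder has no character for it, so the conversion stops there.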

What did you expect to happen instead?

Yan (metayan) wrote :

Thank you for the detailed explanation.
Didn't know if it's the way it's supposed to be, since not all implementations treat it the same way.

The issue came up when I was starting SBCL on a system where LC_CTYPE=C was set as default and doing

Took me quite some time to figure out the cause, so thought it might be helpful for others to handle it in a nicer way somewhere along the line. Probably handling the error in Quicklisp and giving an informative message is the most appropriate approach.
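Until something along those lines exists, a pre-flight check in the shell that starts SBCL can at least point at the cause. This is a minimal sketch and not the Quicklisp-side handling suggested above: `effective_ctype` and `ctype_is_utf8` are hypothetical names, and the check only inspects the locale name, assuming (as described earlier in this thread) that SBCL derives its default external format from the locale.

```shell
# Effective character-type locale, following POSIX precedence:
# LC_ALL overrides LC_CTYPE, which overrides LANG; if none is set, it is "C".
effective_ctype() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "${LC_CTYPE:-}" ]; then
    printf '%s\n' "$LC_CTYPE"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf 'C\n'
  fi
}

# Succeeds when the locale name advertises a UTF-8 codeset.
ctype_is_utf8() {
  case "$(effective_ctype)" in
    *UTF-8*|*utf-8*|*UTF8*|*utf8*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn before launching SBCL if non-ASCII filenames are likely to fail.
if ! ctype_is_utf8; then
  echo "warning: locale '$(effective_ctype)' is not UTF-8;" \
       "non-ASCII filenames may fail to decode" >&2
fi
```

With `LC_CTYPE=C` set as the system default, this prints the warning. Exporting a UTF-8 locale available on the system before starting SBCL silences it.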
