sb-bsd-sockets:get-host-by-name fails to resolve hostname with arabic characters

Bug #1380081 reported by jesse.alama@gmail.com
This bug affects 1 person
Affects: SBCL
Status: Triaged
Importance: Undecided
Assigned to: Jan Moringen

Bug Description

(sb-bsd-sockets:get-host-by-name "عربي.امارات")

raises SB-BSD-SOCKETS:HOST-NOT-FOUND-ERROR. But this is a valid domain: Safari 7.1 on Mac OS X 10.9 handles

http://عربي.امارات

just fine.

$ sbcl --version
SBCL 1.2.2

$ uname -a
Darwin jesse-alamas-macbook-pro.fritz.box 13.4.0 Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014; root:xnu-2422.115.4~1/RELEASE_X86_64 x86_64 i386 MacBookPro6,2 Darwin

CL-USER> *features*
(:CL-WHO :HUNCHENTOOT :SBCL-DEBUG-PRINT-VARIABLE-ALIST :CL-FAD
 :BORDEAUX-THREADS CFFI-FEATURES:FLAT-NAMESPACE CFFI-FEATURES:X86-64
 CFFI-FEATURES:UNIX CFFI-FEATURES:DARWIN :CFFI CFFI-SYS::FLAT-NAMESPACE
 :CL-PPCRE :FLEXI-STREAMS :CHUNGA :5AM :THREAD-SUPPORT
 CHIPZ-SYSTEM:GRAY-STREAMS :RUNE-IS-CHARACTER :SWANK :QUICKLISP :ASDF3 :ASDF2
 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :ALIEN-CALLBACKS :ANSI-CL
 :ASH-RIGHT-VOPS :BSD :C-STACK-IS-CONTROL-STACK :COMMON-LISP
 :COMPARE-AND-SWAP-VOPS :COMPLEX-FLOAT-VOPS :CYCLE-COUNTER :DARWIN
 :DARWIN9-OR-BETTER :FLOAT-EQL-VOPS :GENCGC :IEEE-FLOATING-POINT
 :INLINE-CONSTANTS :INODE64 :LINKAGE-TABLE :LITTLE-ENDIAN
 :MACH-EXCEPTION-HANDLER :MACH-O :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS
 :OS-PROVIDES-BLKSIZE-T :OS-PROVIDES-DLADDR :OS-PROVIDES-DLOPEN
 :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES
 :RAW-INSTANCE-INIT-VOPS :SB-DOC :SB-EVAL :SB-LDB :SB-PACKAGE-LOCKS
 :SB-SIMD-PACK :SB-SOURCE-LOCATIONS :SB-TEST :SB-THREAD :SB-UNICODE :SBCL
 :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :SYMBOL-INFO-VOPS :UD2-BREAKPOINTS :UNIX
 :UNWIND-TO-FRAME-AND-CALL-VOP :X86-64)

Jan Moringen (scymtym)
tags: added: sb-bsd-sockets
Jan Moringen (scymtym)
summary: - arabic
+ sb-bsd-sockets:get-host-by-name fails to resolve hostname with arabic characters
Jan Moringen (scymtym)
Changed in sbcl:
assignee: nobody → Jan Moringen (scymtym)
status: New → In Progress
James Y Knight (foom) wrote :

Is this really a bug? Domain names cannot actually contain Unicode characters, after all. Clients (e.g. web browsers) are generally expected to do the IDN Punycode conversion themselves, aren't they?

Jan Moringen (scymtym) wrote :

The current behavior of first accepting non-ASCII strings and then signaling a HOST-NOT-FOUND-ERROR is misleading at best, if not an outright error.

The easiest fix would be to change GET-HOST-BY-NAME to accept only ASCII strings and to expect clients to perform the IDN toASCII conversion themselves.
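A minimal sketch of that first option, written as a wrapper rather than a change to SB-BSD-SOCKETS itself. The names `checked-get-host-by-name` and `ascii-string-p` are hypothetical, chosen for illustration only. Non-ASCII input is rejected with a TYPE-ERROR before any lookup happens, so the caller learns what is actually wrong instead of receiving HOST-NOT-FOUND-ERROR.

```lisp
;; Load the socket contrib before the reader sees SB-BSD-SOCKETS symbols.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-bsd-sockets))

(defun ascii-string-p (string)
  "True if every character of STRING is a 7-bit ASCII character."
  (every (lambda (char) (< (char-code char) 128)) string))

(defun checked-get-host-by-name (hostname)
  "Hypothetical wrapper: signal a TYPE-ERROR for a non-ASCII HOSTNAME instead
of passing it to the resolver, which would report HOST-NOT-FOUND-ERROR."
  (unless (ascii-string-p hostname)
    (error 'type-error
           :datum hostname
           :expected-type '(and string (satisfies ascii-string-p))))
  (sb-bsd-sockets:get-host-by-name hostname))
```

With this guard, the call from the report fails immediately with a TYPE-ERROR, and the responsibility for the IDN toASCII conversion is clearly left to the caller.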

The approach I'm currently trying is to add an SB-IDNA package providing a TO-ASCII function. GET-HOST-BY-NAME could call this function when necessary, i.e. when the name contains non-ASCII characters. This would be similar to the AI_IDN flag in glibc's getaddrinfo().
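The sketch below illustrates the kind of conversion a TO-ASCII function performs. It is not the SB-IDNA code under discussion. It implements the Punycode encoder from RFC 3492 plus a simplified ToASCII that encodes each non-ASCII label and prefixes it with `xn--`. It deliberately skips the nameprep step (case folding and Unicode normalization) and the validity checks that RFC 3490 requires. All function names here (`punycode-encode`, `to-ascii`, `split-labels`, `adapt`, `encode-digit`) are hypothetical.

```lisp
;; Punycode parameters from RFC 3492, section 5.
(defconstant +base+ 36)
(defconstant +tmin+ 1)
(defconstant +tmax+ 26)
(defconstant +skew+ 38)
(defconstant +damp+ 700)
(defconstant +initial-bias+ 72)
(defconstant +initial-n+ 128)

(defun adapt (delta num-points first-time-p)
  "Bias adaptation function (RFC 3492, section 6.1)."
  (setf delta (if first-time-p (floor delta +damp+) (floor delta 2)))
  (incf delta (floor delta num-points))
  (let ((k 0))
    (loop while (> delta (floor (* (- +base+ +tmin+) +tmax+) 2))
          do (setf delta (floor delta (- +base+ +tmin+)))
             (incf k +base+))
    (+ k (floor (* (1+ (- +base+ +tmin+)) delta) (+ delta +skew+)))))

(defun encode-digit (d)
  "Map digit values 0..25 to a..z and 26..35 to 0..9."
  (if (< d 26)
      (code-char (+ d (char-code #\a)))
      (code-char (+ (- d 26) (char-code #\0)))))

(defun punycode-encode (label)
  "Encode the string LABEL with the Punycode algorithm (RFC 3492)."
  (let* ((codes (map 'list #'char-code label))
         (output (make-string-output-stream))
         (n +initial-n+)
         (delta 0)
         (bias +initial-bias+)
         (b (count-if (lambda (c) (< c 128)) codes))
         (h b))
    ;; Copy the basic (ASCII) code points in order, then the delimiter.
    (loop for char across label
          when (< (char-code char) 128) do (write-char char output))
    (when (plusp b) (write-char #\- output))
    (loop while (< h (length codes))
          do (let ((m (reduce #'min (remove-if (lambda (c) (< c n)) codes))))
               ;; Skip ahead to the next smallest unhandled code point M.
               (incf delta (* (- m n) (1+ h)))
               (setf n m)
               (dolist (c codes)
                 (when (< c n) (incf delta))
                 (when (= c n)
                   ;; Emit DELTA as a generalized variable-length integer.
                   (let ((q delta))
                     (loop for k from +base+ by +base+
                           for threshold = (cond ((<= k bias) +tmin+)
                                                 ((>= k (+ bias +tmax+)) +tmax+)
                                                 (t (- k bias)))
                           until (< q threshold)
                           do (write-char
                               (encode-digit
                                (+ threshold (mod (- q threshold) (- +base+ threshold))))
                               output)
                              (setf q (floor (- q threshold) (- +base+ threshold))))
                     (write-char (encode-digit q) output))
                   (setf bias (adapt delta (1+ h) (= h b))
                         delta 0)
                   (incf h)))
               (incf delta)
               (incf n)))
    (get-output-stream-string output)))

(defun ascii-string-p (string)
  "True if every character of STRING is a 7-bit ASCII character."
  (every (lambda (char) (< (char-code char) 128)) string))

(defun split-labels (hostname)
  "Split HOSTNAME at each #\\. into a list of label strings."
  (loop for start = 0 then (1+ end)
        for end = (position #\. hostname :start start)
        collect (subseq hostname start end)
        while end))

(defun to-ascii (hostname)
  "Simplified ToASCII: Punycode-encode each non-ASCII label with an xn-- prefix.
No nameprep or validity checks are performed."
  (format nil "~{~A~^.~}"
          (mapcar (lambda (label)
                    (if (ascii-string-p label)
                        label
                        (concatenate 'string "xn--" (punycode-encode label))))
                  (split-labels hostname))))
```

A caller could then try `(sb-bsd-sockets:get-host-by-name (to-ascii "عربي.امارات"))`; whether that succeeds depends on the live DNS records, not on this code. Calling TO-ASCII automatically from GET-HOST-BY-NAME for non-ASCII names corresponds to the AI_IDN-like option above, while exporting it without calling it corresponds to the third option below.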

A third approach could be adding SB-IDNA but never automatically performing the conversion in GET-HOST-BY-NAME.

Implementing toASCII in SBCL has only recently become possible, thanks to the functions added to the SB-UNICODE package.

Douglas Katzman (dougk) wrote :

can't be in-progress for 6 years

Changed in sbcl:
status: In Progress → Triaged