Physical pathname tilde-home conventions are somewhat error-prone

Bug #1790330 reported by Richard M Kreuter
This bug affects 2 people
Affects  Status  Importance  Assigned to  Milestone
SBCL     New     Undecided   Unassigned

Bug Description

There are at least a couple of pitfalls in how SBCL handles leading tildes as ``home directories''.

1. Home directories are represented as the second element of the directory list, e.g.,

* (pathname-directory "~/foo.lisp")

(:ABSOLUTE :HOME)
* (pathname-directory "~root/foo.lisp")

(:ABSOLUTE (:HOME "root"))

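This has the unfortunate consequence that pathname algebra becomes harder to work with. For example, with functions such as the following, re-rooting a home-directory pathname leaves :HOME stranded in the middle of the directory list: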

  (defun relativize-pathname (p)
    (make-pathname :directory (cons :relative (rest (pathname-directory p)))
                   :defaults p))

  ;; You might want RELATIVIZE-PATHNAME in order to define something like
  (defun re-root-pathname (p &optional (r *default-pathname-defaults*))
    (merge-pathnames (relativize-pathname p) r))

  (re-root-pathname "~/foo.lisp")
  => #<PATHNAME (with no namestring)
                :HOST #<SB-IMPL::UNIX-HOST {10002F9383}>
                :DEVICE NIL
                :DIRECTORY (:ABSOLUTE "home" "me" :HOME)
                :NAME "foo"
                :TYPE "lisp"
                :VERSION :NEWEST>

It's useful if a pathnames implementation permits a user to reason that the set of pathnames that can be used to name a file (in SBCL parlance, those that have a ``native namestring'') is closed under some algebra; the RE-ROOT-PATHNAME result above, which has no namestring at all, falls outside that set. (I don't propose that ANSI implies that a pathnames implementation must offer invariants that make such reasoning possible -- it's way too broken for that -- but rather that such invariants, albeit highly implementation-dependent, are good for users.)
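Until the representation changes, code that does pathname algebra has to get rid of the :HOME markers itself. A minimal sketch, assuming SBCL's current representation, (:ABSOLUTE :HOME ...) and (:ABSOLUTE (:HOME "user") ...); RESOLVE-HOME-DIRECTORY is a hypothetical helper, not an SBCL function. Portable CL has no way to look up another user's home directory, so the (:HOME "user") case simply signals an error.

```lisp
(defun resolve-home-directory (pathspec)
  "Return a pathname like PATHSPEC whose leading :HOME marker, if any, is
replaced by the directory list of (USER-HOMEDIR-PATHNAME).  Hypothetical
helper; assumes SBCL's representation of tilde syntax."
  (let* ((p (pathname pathspec))
         (dir (pathname-directory p))
         (marker (and (consp dir) (second dir))))
    (cond ((eq marker :home)
           ;; (:ABSOLUTE :HOME . rest) => current user's home list, then rest.
           (make-pathname :directory (append (pathname-directory (user-homedir-pathname))
                                             (cddr dir))
                          :defaults p))
          ((and (consp marker) (eq (first marker) :home))
           ;; (:ABSOLUTE (:HOME "user") . rest): no portable lookup exists.
           (error "Cannot portably resolve the home directory of user ~S."
                  (second marker)))
          ;; No home marker: return the pathname unchanged.
          (t p))))
```

With it, (re-root-pathname (resolve-home-directory "~/foo.lisp")) no longer leaves :HOME in the middle of the directory list, because RELATIVIZE-PATHNAME only sees ordinary string components.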

2. Another invariant that SBCL has historically aspired to is namestring/parse-namestring equivalence, i.e.,

  (equal p (parse-namestring (namestring p) nil p))

or thereabouts (except where ANSI has been interpreted to prohibit such equivalences).

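The handling of tildes in directory components breaks this in general. In the following example, P1's directory is (:RELATIVE "~root"), but its namestring "~root/" reparses as a home-directory reference, (:ABSOLUTE (:HOME "root")), so P1 and P2 print identically yet are not EQUAL: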

--------
* (let* ((p1 (make-pathname :directory '(:relative "~root")))
         (p2 (parse-namestring (namestring p1))))
  (print p1)
  (print p2)
  (equal p1 p2))

#P"~root/"
#P"~root/"
NIL
--------
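The round-trip invariant from point 2 can be checked mechanically. A minimal sketch: NAMESTRING-ROUND-TRIPS-P is a hypothetical name, and counting a pathname with no namestring (or one whose namestring fails to parse) as a failure, via IGNORE-ERRORS, is an assumption about how such cases should be scored.

```lisp
(defun namestring-round-trips-p (pathspec)
  "True when (EQUAL P (PARSE-NAMESTRING (NAMESTRING P) NIL P)) holds for
P = (PATHNAME PATHSPEC).  Any error while producing or parsing the
namestring counts as a failure."
  (let ((p (pathname pathspec)))
    ;; IGNORE-ERRORS returns NIL if NAMESTRING or PARSE-NAMESTRING signals.
    (ignore-errors
      (let ((ns (namestring p)))
        (and ns (equal p (parse-namestring ns nil p)))))))
```

On the SBCL version in this report, the P1 from the transcript above makes this return NIL, while an ordinary pathname such as "/tmp/foo.lisp" passes.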

Analysis:

A. First, just to get it out of the way, let's stipulate for the purpose of discussion that CLHS 19.2.2.5 (``An implementation might support other values for some components'') means that what SBCL does here is conforming (I guess that's the rationale/rationalization/defense for why PATTERNs are conforming, too), despite the apparent inconsistency with every subsection of 19.2.2.4 whose title begins with the word ``Restrictions''. (It's hard to say how to reconcile those ``Restrictions'' sections with 19.2.2.5, as the latter appears to imply that there are no restrictions on what a program might read in any component.)

So this is just a question of whether SBCL does the right thing in this case.

B. One way to think about point 1 is that the particular representation chosen for the parse of ``user home directory'' syntax, (:ABSOLUTE :HOME) or (:ABSOLUTE (:HOME <string>)) is unnecessarily general: there's probably no such concept as a ``relative user home directory''; the concept of the home directory is implicitly absolute (on every system where SBCL runs, anyway). So having the directory list start with :HOME or (:HOME <string>) would have been a better extension. [It would be important to thread that change through, and have MERGE-PATHNAMES and maybe other things recognize a home directory as semantically absolute. And while having the directory list start with something other than :ABSOLUTE or :RELATIVE might be surprising to users or programs, it might be better to force the surprise in the CAR of the directory list rather than allow it to get passed around in the rest of the list, as shown above.]
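To make the alternative concrete, here is a sketch of the translation from SBCL's current directory lists to the representation suggested in B, plus the kind of absoluteness test that MERGE-PATHNAMES and friends would then need. Both functions are hypothetical illustrations; nothing like them exists in SBCL.

```lisp
(defun home-first-directory (directory)
  "Translate (:ABSOLUTE :HOME . rest) to (:HOME . rest) and
(:ABSOLUTE (:HOME user) . rest) to ((:HOME user) . rest); return any
other directory list unchanged."
  (let ((marker (and (consp directory)
                     (eq (first directory) :absolute)
                     (second directory))))
    (if (or (eq marker :home)
            (and (consp marker) (eq (first marker) :home)))
        ;; Move the home marker into the CAR, dropping :ABSOLUTE.
        (cons marker (cddr directory))
        directory)))

(defun semantically-absolute-directory-p (directory)
  "True for :ABSOLUTE lists and for lists headed by a home-directory marker."
  (and (consp directory)
       (let ((head (first directory)))
         (or (eq head :absolute)
             (eq head :home)
             (and (consp head) (eq (first head) :home))))))
```

The point of the second function is that code dispatching on the CAR has to learn about :HOME once, instead of every list-walking function having to watch for it in the middle of the list.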

C. However, I wonder whether a distinguished representation of the parse of user home directories is necessary, or even all that useful.

C.1 For the purposes of having a convenient input notation, it would suffice to resolve the tilde notation at parse time, e.g., to have

  (pathname "~/foo.lisp")
  => #P"/home/me/foo.lisp"

It appears that (some random versions of) Allegro, CCL, Clisp, and ECL do just this.
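A minimal portable sketch of C.1, assuming that only the current user's "~" and "~/" forms need expanding. PARSE-NAMESTRING-EXPANDING-HOME is a hypothetical name; "~user/" forms are passed through untouched because portable CL cannot look up another user's home directory.

```lisp
(defun parse-namestring-expanding-home (namestring)
  "Parse NAMESTRING, first replacing a leading \"~\" or \"~/\" with the
namestring of (USER-HOMEDIR-PATHNAME), so the result has no :HOME marker."
  (let ((home (namestring (user-homedir-pathname)))) ; e.g. "/home/me/"
    (cond ((string= namestring "~")
           (parse-namestring home))
          ((and (>= (length namestring) 2)
                (string= "~/" namestring :end2 2))
           ;; HOME is a directory namestring ending in a slash,
           ;; so drop the slash that follows the tilde.
           (parse-namestring (concatenate 'string home (subseq namestring 2))))
          (t
           ;; Everything else, including "~user/...", is parsed as-is.
           (parse-namestring namestring)))))
```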

C.2 For the purpose of having an aesthetic output notation, it would have sufficed to offer an I/O control variable to govern the abbreviation of a home directory with a tilde. (I'm not aware of any implementation doing this. Of course this could be a pretty-printer function rather than baked into the namestring machinery.)
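A sketch of C.2 as an ordinary function plus a control variable, assuming Unix-style namestrings in which (USER-HOMEDIR-PATHNAME) renders with a trailing slash. *ABBREVIATE-HOME-DIRECTORY* and ABBREVIATED-NAMESTRING are hypothetical names; no implementation is known to provide them.

```lisp
(defvar *abbreviate-home-directory* t
  "Hypothetical I/O control variable: when true, ABBREVIATED-NAMESTRING
writes a leading home directory as ~/.")

(defun abbreviated-namestring (pathspec)
  "Return the namestring of PATHSPEC, abbreviating the current user's home
directory to \"~/\" when *ABBREVIATE-HOME-DIRECTORY* is true."
  (let ((full (namestring pathspec))
        (home (namestring (user-homedir-pathname)))) ; e.g. "/home/me/"
    (if (and *abbreviate-home-directory*
             (>= (length full) (length home))
             (string= home full :end2 (length home)))
        ;; HOME already ends in a slash, so replace it wholesale with "~/".
        (concatenate 'string "~/" (subseq full (length home)))
        full)))
```

The abbreviation happens only on output, so the pathname object itself never carries a :HOME marker; the result is meant for display, since feeding it back to SBCL's current parser would reintroduce :HOME.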

C.3 Unless I'm mistaken, the only other benefit is that you can dump such a pathname into a fasl or core and have it denote appropriately on another machine. But that's something you ought to be able to do explicitly and portably with ENOUGH-NAMESTRING and USER-HOMEDIR-PATHNAME (and something you *must* do to achieve that dynamism on other implementations). So as a user I'm not sure how to weigh this one benefit compared with having to be prepared to handle :HOME and (:HOME ...) in programs.
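The explicit, portable route mentioned in C.3 might look like this: store a home-relative namestring at dump time and re-anchor it at load time. A sketch only; both function names are hypothetical.

```lisp
(defun home-relative-namestring (pathspec)
  "Namestring of PATHSPEC relative to the home directory of the user running
this image, suitable for storing in a fasl or core.  If PATHSPEC is not under
that directory, ENOUGH-NAMESTRING still returns a namestring identifying it."
  (enough-namestring pathspec (user-homedir-pathname)))

(defun resolve-home-relative (namestring)
  "Re-anchor NAMESTRING at the home directory of the user running *this*
image; an absolute namestring keeps its own directory."
  (merge-pathnames namestring (user-homedir-pathname)))
```

Evaluating HOME-RELATIVE-NAMESTRING on the dumping machine and RESOLVE-HOME-RELATIVE on the loading machine gives the ``denote appropriately elsewhere'' behavior without a distinguished :HOME representation.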

Consequently, I believe that doing away with the distinguished representation of ``home directories'' and resolving the tilde notation during PARSE-NAMESTRING is probably the best way to implement the convenient input notation.

D. Problem 2 gets to the detail that if the implementation recognizes tilde notation in the directory portion of a namestring, then some escaping is necessary in namestring production, as tilde counts as a ``special'' character.

I believe, but I'm not completely certain, that it's sufficient to escape any initial tilde of a string occurring in a directory list in namestring production, so that, hypothetically,

  (namestring (make-pathname :directory '(:relative "~root")))
  => "\\~root/"

This problem is orthogonal to problem 1, i.e., SBCL ought to do this escaping howsoever it parses tildes in namestrings.
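A sketch of the escaping proposed in D, rendering only the directory part of a namestring. The function names are hypothetical; the sketch assumes backslash is the escape character (as in the hypothetical output above), handles only :ABSOLUTE/:RELATIVE plus string components, and leaves other special characters such as wildcards unescaped.

```lisp
(defun escape-leading-tilde (component)
  "Return COMPONENT with a backslash prepended if it begins with a tilde, so a
parser that treats an initial tilde as home-directory syntax reads it literally."
  (if (and (plusp (length component))
           (char= (char component 0) #\~))
      (concatenate 'string "\\" component)
      component))

(defun directory-namestring-escaping-tildes (directory)
  "Render DIRECTORY, :ABSOLUTE or :RELATIVE followed by strings, as the
directory part of a Unix namestring, escaping each component's leading tilde."
  (with-output-to-string (out)
    (when (eq (first directory) :absolute)
      (write-char #\/ out))
    (dolist (component (rest directory))
      (unless (stringp component)
        ;; :UP, :BACK, :WILD, patterns, etc. are outside this sketch.
        (error "This sketch only handles string components, not ~S." component))
      (write-string (escape-leading-tilde component) out)
      (write-char #\/ out))))
```

For '(:relative "~root") this returns the string that prints as "\\~root/", matching the hypothetical NAMESTRING result above.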

Bug filing boilerplate:

$ uname -a
Darwin m5.localdomain 14.5.0 Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64 x86_64
$ sh ./run-sbcl.sh --no-userinit --no-sysinit
This is SBCL 1.4.10.145-0ec8b87b2-dirty, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* *features*

(:X86-64 :64-BIT :64-BIT-REGISTERS :ALIEN-CALLBACKS :ANSI-CL :ASH-RIGHT-VOPS
 :BSD :C-STACK-IS-CONTROL-STACK :CALL-SYMBOL :COMMON-LISP
 :COMPACT-INSTANCE-HEADER :COMPARE-AND-SWAP-VOPS :COMPLEX-FLOAT-VOPS
 :CYCLE-COUNTER :DARWIN :DARWIN9-OR-BETTER :FLOAT-EQL-VOPS
 :FP-AND-PC-STANDARD-SAVE :GENCGC :IEEE-FLOATING-POINT :IMMOBILE-CODE
 :IMMOBILE-SPACE :INLINE-CONSTANTS :INODE64 :INTEGER-EQL-VOP :LINKAGE-TABLE
 :LITTLE-ENDIAN :MACH-EXCEPTION-HANDLER :MACH-O :MEMORY-BARRIER-VOPS
 :MULTIPLY-HIGH-VOPS :OS-PROVIDES-BLKSIZE-T :OS-PROVIDES-DLADDR
 :OS-PROVIDES-DLOPEN :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T
 :PACKAGE-LOCAL-NICKNAMES :RAW-INSTANCE-INIT-VOPS :RAW-SIGNED-WORD
 :RELOCATABLE-HEAP :SB-DOC :SB-EVAL :SB-LDB :SB-PACKAGE-LOCKS :SB-SIMD-PACK
 :SB-SOURCE-LOCATIONS :SB-THREAD :SB-UNICODE :SBCL :STACK-ALLOCATABLE-CLOSURES
 :STACK-ALLOCATABLE-FIXED-OBJECTS :STACK-ALLOCATABLE-LISTS
 :STACK-ALLOCATABLE-VECTORS :STACK-GROWS-DOWNWARD-NOT-UPWARD :SYMBOL-INFO-VOPS
 :UD2-BREAKPOINTS :UNBIND-N-VOP :UNDEFINED-FUN-RESTARTS :UNIX
 :UNWIND-TO-FRAME-AND-CALL-VOP)
