ENSURE-DIRECTORIES-EXIST creates too many directories in the presence of :UP

Bug #1893971 reported by Zach Beane
This bug affects 1 person
Affects: SBCL
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

SBCL 2.0.6

* (directory "/tmp/ede/*.*")
NIL

* (ensure-directories-exist "/tmp/ede/x/../y/../z/../foo.txt")
"/tmp/ede/x/../y/../z/../foo.txt"

* (directory "/tmp/ede/*.*")
(#P"/tmp/ede/x/" #P"/tmp/ede/y/" #P"/tmp/ede/z/")

I expected only "/tmp/ede/" to be created, since that is the directory that contains the file.

http://l1sp.org/cl/ensure-directories-exist says "Tests whether the directories containing the specified file actually exist, and attempts to create them if they do not."

Richard M Kreuter (kreuter) wrote :

As an interpretation of the standard, I think one could read the phrase "the directories containing the specified file" to describe the lexically apparent directories (and the lexically apparent file, for that matter) in E-D-E's argument, as distinct from the potential referents of the directories and files. After all, when E-D-E executes prior to creation of a new file, there's no referent of "the specified file" at the time of such an E-D-E call; so it's unclear what directories E-D-E should be testing for if "the directories containing" and "the specified file" are supposed to describe referents instead of names. So I'm not convinced the quoted passage implies SBCL's current behavior is indefensible as a reading of the standard.

But more practically, note that E-D-E is supposed to return its argument. What should this compound form do?

(open (ensure-directories-exist "/tmp/ede/x/../y/../z/../foo.txt"))

ISTM that there are a few options:

1. [The current SBCL behavior.] E-D-E creates all those directories, OPEN passes "/tmp/ede/x/../y/../z/../foo.txt" to open(2) (though of course that's just a coincidental agreement between namestring and "native namestring" syntax), and Lisp lets the operating system decide what file (if any) that string names.

2. Let's suppose that E-D-E didn't create all those directories. In this case, we'd need to decide what file

(open "/tmp/ede/x/../y/../z/../foo.txt")

should try to operate on, since the OS will treat such a string as not naming any file. If the desired answer is a file that could be named by "/tmp/ede/foo.txt", then the question is "how do we get that behavior?" One approach would be for OPEN to "fold out the dotdot", i.e., remove each :UP together with its predecessor, and then convert the result to a string to supply to open(2).

But such an ":UP elision" transformation in OPEN will cause other behaviors I'd consider undesirable:

2a. For example, if "/a/b" is a symlink to some directory other than "/a", then "/a/b/../c.txt" and "/a/c.txt" can name distinct files, but the corresponding pathnames would serve as names for only one of the two.

2b. As another example, suppose "/a/b" were a regular file. Then to Unix "/a/b/../c.txt" cannot name a file, but the corresponding Lisp pathname could.

Anyway, having OPEN perform such an ":UP elision" transform means that Lisp will idiosyncratically disagree with Unix for some Lisp pathnames that appear to correspond straightforwardly to POSIX pathnames. That would be conforming, but is it desirable? I would say no, for reasons I'll address below.
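
For readers who want to check how Unix itself resolves such strings, here is a minimal shell sketch. It is an illustration only and not part of the original comment: the scratch directory from mktemp and all variable names are assumptions, and touch stands in for a call to open(2) that creates the file.

```shell
#!/bin/sh
# Illustrative sketch: scratch layout and variable names are hypothetical.
base=$(mktemp -d)
mkdir "$base/ede"
path="$base/ede/x/../y/../z/../foo.txt"

# x/, y/ and z/ do not exist yet, so the OS cannot resolve the path.
if touch "$path" 2>/dev/null; then missing=created; else missing=failed; fi

# Case 2b: a regular file followed by ".." does not name anything either.
touch "$base/ede/b"
if touch "$base/ede/b/../c.txt" 2>/dev/null; then viafile=created; else viafile=failed; fi

# Once every lexically apparent directory exists, the same string resolves.
mkdir "$base/ede/x" "$base/ede/y" "$base/ede/z"
if touch "$path" 2>/dev/null; then present=created; else present=failed; fi
if [ -f "$base/ede/foo.txt" ]; then target=ede/foo.txt; else target=unknown; fi

echo "before mkdir: $missing; through regular file: $viafile; after mkdir: $present ($target)"
rm -rf "$base"
```

On a typical Linux or macOS system the first two attempts should fail and the third should create ede/foo.txt, which is the sense in which option 1 lets the operating system decide what the string names.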

3. I suppose one might agree that a Lisp shouldn't *unconditionally* transform all the :UPs out of a pathname before accessing a file system, but still say that E-D-E shouldn't make the x/, y/ and z/ directories, on account of the :UPs. Could such a Lisp have

(open (ensure-directories-exist "/tmp/ede/x/../y/../z/../foo.txt"))

attempt to operate on a file that could be named by "/tmp/ede/foo.txt"? Sure: one way to get OPEN to do so in this case would be for it to check each level directory starting at the root, and whenever it reaches a level that's a symlink, find the symlink's target, merge the parse o...


Richard M Kreuter (kreuter) wrote :

Correction: I had a typo in my test script that led me to believe that one implementation did something to resolve symlinks in the lexical directory within OPEN. This does not appear to be the case.

Luís Oliveira (luismbo) wrote :

Possibly relevant: some implementations parse the directory in the "/tmp/x/../y/" namestring as '(:absolute "tmp" "x" :back "y"), yielding #p"/tmp/y/", whereas SBCL parses it as '(:absolute "tmp" "x" :up "y").

Richard M Kreuter (kreuter) wrote :

Yes, though treating ".." as :BACK or having :UP cause elision of levels of directory is an unfortunate and misguided application of concepts in ANSI, and deserves bug reports for those implementations.

The reason for the distinction between :UP and :BACK is explicit in ANSI: in a file system where there can be multiple names for the same directory, if X is a name for a directory, then a name for the parent of that directory cannot be correctly computed just by looking at X, but only by interacting with the file system.

Here's an example on Unix:

$ ln -s /tmp /home/me/foo
$ realpath /home/me/foo/..
# could be /, or if /tmp is itself a symlink, could be anything, but probably not /home/me!

That is, a POSIX pathname that contains dotdot does not, in general, name the same thing as one that removes the dotdot and one preceding level of directory.
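
The same point can be checked with a self-contained script. This is a sketch under stated assumptions, not part of the original comment: it builds a scratch tree with mktemp rather than touching the real /tmp and /home/me, and it uses the shell's physical and logical modes (cd -P and cd -L) as loose analogues of :UP and :BACK.

```shell
#!/bin/sh
# Illustrative sketch: the scratch tree stands in for /home/me and /tmp.
base=$(cd "$(mktemp -d)" && pwd -P)   # physical name of the scratch directory
mkdir -p "$base/home/me" "$base/tmp"
ln -s "$base/tmp" "$base/home/me/foo"

# Physical resolution: follow the symlink first, then take the parent.
physical=$(cd -P "$base/home/me/foo/.." && pwd -P)

# Logical resolution: strip ".." and the preceding component from the string.
logical=$(cd -L "$base/home/me/foo/.." && pwd -L)

echo "physical: $physical"
echo "logical:  $logical"
rm -rf "$base"
```

Here the physical result should be the scratch root (the parent of the symlink's target), while the logical result is the home/me directory inside it. The difference between the two is the gap between what the OS resolves and what lexical elision computes.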

Now, CL implementations don't have to take inspiration from the OS they run on, but choosing not to do so has some practical disadvantages, the most notable of which is interoperability with other programs on the same system. Suppose you run a system command to generate Unix pathname strings that might contain dotdot sequences. If you want to parse them in Lisp into objects that denote what those strings do, then you need a pathname implementation that can represent the existence and meaning of dotdot. That's what :UP is for, and it's clearly documented as meaning something different from :BACK.

So IMO, a pragmatic CL implementation should parse dotdot as :UP and handle :UP as distinct from :BACK everywhere. (SBCL mostly does this, but conflates :UP and :BACK within DIRECTORY; see https://bugs.launchpad.net/sbcl/+bug/1740777. To my knowledge, this is SBCL's only observable defect having to do with :UP.)

P.S. In case anybody cares, here's a fun paper on the topic of dotdot's semantics in the Unix tradition:

https://static.usenix.org/events/usenix2000/general/full_papers/pikelex/pikelex.pdf
