Comment 5 for bug 310098

Richard M Kreuter (kreuter) wrote :

I think there are two arguments here that SBCL's current behavior is non-conforming, and perhaps an argument that the behavior is suboptimal even if allowed. I'll present contrary views for each, and ask that returning /dev/fd or /proc/self/fd paths never be considered.

1. There's a claim that all file streams are pathname designators. I believe there are only 2 passages in ANSI that are binding here: the glossary entry for "pathname designator" and section 20.1.1.

The glossary entry uses a grammatical construction common to most designators' definitions, "an object that designates a <foo>; that is, an object that denotes a <foo>, and that is one of: <type1> (denoting <a particular foo according to rule1>)... or a <typeN> (denoting <a particular foo according to ruleN>)". The respective "(denoting ...)" phrases define, piecewise, "denotes" in the phrase "an object that denotes a <foo>". Consider "function designator": either a function (denoting itself) or a symbol (denoting the function named by that symbol in the global environment). I think it's uncontroversial that not all symbols are function designators, because for some symbols, there is no function described by the phrase "the function named by that symbol in the global environment". (In other words, the grammatical construction makes the definition self-defining; likewise for "class designator", "package designator".) If we apply this syntactic analysis to "pathname designator", we can say that a stream associated with a file is a pathname designator only if there exists a pathname that was the pathname used to open the file. IOW, the definition does not require each file stream to denote a pathname any more than "function designator" requires each symbol to denote a function.

As for 20.1.1, it says about a stream associated with a file that "such streams can be used as pathname designators". I suppose that this could be interpreted as a requirement that every file stream must be a pathname designator, but I don't think that's the only way to read the sentence. "Can be used as" isn't textually identical to "are"; perhaps the difference signifies something. For example, I'd say the sentence "symbols can be used as function designators" can be judged a true descriptive statement about symbols, but an incomplete description of how symbols designate functions, and so, while true, would not imply that all symbols are function designators.

So I'm not convinced all file streams are required to denote pathnames.

2. There's been some argumentation about whether a socket ought to count as a "file", i.e., "a named entry in a file system". (I think this reasoning rests on shaky ground: file systems were outside ANSI's scope to define rigorously. But let's give it a fair shake.)

First, I observe that ANSI's "file system" is described only as "a facility which permits data to be stored in named files..." I believe the "which" clause should be understood as nonrestrictive, i.e., a facility having additional capabilities can be a file system. (Very many historical operating systems had the idea that the file namespace could include names for disks, terminals, printers, batch queues, robot arms, on-line peripheral oil wells, etc.; TOPS-20 and Plan 9 put TCP addresses in the file namespace, too. Not all names in such a namespace refer to objects that can store aggregations of data on some medium, but the facility offers that capability, among others.) So there's no general rule that a socket can't be an object in a file system.

But can we say a network socket on Unix is a "file" per ANSI? "File" gets defined as a "named entry in a file system". One minor nit appears: if we splice the definition of "file" into the definition of "file system", we get "which permits data to be stored in named named entr[ies]..." That's one too many instances of "named"; perhaps one should be viewed as an editing error. Which one? IMO that's anybody's choice, but if we drop the one in "file", this argument goes away.

More seriously, though, ANSI doesn't define what it means for an entry in a file system to be "named". I observe that SUSv4 defines a (Unix) pathname as a string that "can be used to identify a file"; and a file descriptor as an integer that "can be used to identify an open file". If the normative operating system standard can use the same verb, "identify", for both cases, I see no reason why a Lisp implementation couldn't say an entry is "named" in case there's either a string or an integer that can be used to identify the file. (I'm not saying that the integers ought to be exposed at the Common Lisp level, only that they are one of two kinds of thing that can be used inside the Lisp implementation to identify files.) If we may interpret "named" this way, everything Unix considers a "file" can count as a "named entry in a file system".
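
To illustrate the SUSv4 point at the operating-system level, here is a minimal Python sketch (Python rather than Lisp only because portable Common Lisp exposes no file descriptor API). It assumes a Unix-like system, and the helper name describe_open_file is made up for this example. It creates an unnamed socket pair and shows that the integer descriptor alone is enough to identify and interrogate the open file, even though no pathname was ever involved.

```python
import os
import socket
import stat


def describe_open_file(fd):
    """Classify the open file identified by descriptor fd (hypothetical helper)."""
    mode = os.fstat(fd).st_mode  # fstat(2) takes only the integer, never a pathname
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


# A socket pair has no entry in any directory, yet each end has a descriptor.
a, b = socket.socketpair()
try:
    kind = describe_open_file(a.fileno())
finally:
    a.close()
    b.close()
print(kind)
```

Nothing here argues for exposing the integer at the Common Lisp level; it only shows that the OS treats the descriptor as a sufficient identifier.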

(Aside: this analysis deliberately ignores "/dev/fd" or "/proc/self/fd" paths for multiple reasons. First, they don't exist everywhere SBCL runs. Second, they have incompatible semantics across platforms that offer them (they work like dup(2) in some places, but open a fresh file description on Linux). But most importantly, these things are dangerous to insinuate ex nihilo into a user program: if there's ever any reason to use a stream associated with a file as a pathname designator in OPEN, the pathname has to be one the user actually specified in the past, not one that refers to an unknowable item of dynamic process state. The best case scenario for opening a synthesized /dev/fd path is an error; if it doesn't error, lousy, hard-to-debug consequences will follow. So pretty please, with sugar on top, don't give the user a /dev/fd path they never specified.)
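
For anyone who wants to check the semantic difference on a given platform, the following Python sketch is one way to probe it. It is only an illustration: the function name devfd_offset_semantics and its return strings are invented here, and it assumes a Unix-like system where /dev/fd may or may not be available. It seeks an open file to offset 4, opens /dev/fd/N for that descriptor, and reports whether the new descriptor shares the original offset (dup-like) or starts with its own.

```python
import os
import tempfile


def devfd_offset_semantics():
    """Report how opening /dev/fd/N behaves on this platform (hypothetical helper).

    Returns "dup-like" if the reopened descriptor shares the original's
    offset, "fresh description" if it has its own offset, or None if
    /dev/fd is unavailable or cannot be opened here.
    """
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()
        fd = f.fileno()
        os.lseek(fd, 4, os.SEEK_SET)  # move the original offset to 4
        try:
            fd2 = os.open("/dev/fd/%d" % fd, os.O_RDONLY)
        except OSError:
            return None
        try:
            offset = os.lseek(fd2, 0, os.SEEK_CUR)  # query without moving
        finally:
            os.close(fd2)
        return "dup-like" if offset == 4 else "fresh description"


print(devfd_offset_semantics())
```

Whatever it prints on a particular system, the fact that the answer can differ between systems is the point of the second objection above.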

So I think it's reasonable to say that everything the OS calls a file, SBCL considers a file.

3. There's the claim that the current super/sub-class relationship between FILE-STREAM and FD-STREAM is "deeply wrong", and that the relationship should be the other way around.

ISTM that a conforming Lisp program can use or define predicates to determine a stream's "traits" (e.g., "input stream", "character stream", "random access stream", etc.), but can't portably suppose that any of those traits correspond to classes. For example, a file stream and a string stream can both be character input streams, but the most specific common superclass of the streams' classes might be STREAM. One consequence is that while you can dispatch on streams using types, you can't portably define methods specialized on most sorts of stream characteristics. (In fact, since the standard implicitly permits the named subclasses of stream to be subclasses of one another, specializing on any subclass of stream is probably formally non-portable.)

That is, in ANSI Common Lisp, the stream interface is permitted to be implemented via a sort of ad hoc polymorphism: stream classes do not comprise an ontology, and the exact abilities of any individual stream cannot conformingly be inferred from a stream's class. Because this is how the input, output, bidirectional, no-directional, open, closed, character, binary, external format, random access, and interactive stream traits happen to work, I would find it inoffensively consistent if the "has a pathname" trait were to work the same way, i.e., distinguishable by predicate and not by class. (Unfortunately, that predicate can't be defined conformingly, but (lambda (x) (ignore-errors (pathname x))) suffices in 8 implementations I've tested.)

So I don't think the status quo is differently worse than any rearrangement would be, since streams' classes and capabilities are standardly orthogonal (and that's without even considering Gray Streams or Simple Streams, which necessitate extra dimensions of orthogonality in any implementation that does any two of Gray Streams, Simple Streams, and some built-in stream classes).