pathnames should include device uuids

Bug #1793883 reported by Johannes Martinez
This bug affects 1 person
Affects: SBCL
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi, I'm making a file tracker, and it would be of immense help to have UUIDs recorded in the device slot of pathnames, with the directory structure listed relative to the device. NAMESTRING would then add on the mount point, so that no matter where an external drive gets mounted, the pathname itself will always be the same and refer to the same file.
IMO this is the right thing to do.
The code for pathnames is a bit all over the place, and I don't know if there are any plans for restructuring it (IP/URL support in HOST), so I think it would probably be best if someone who knows the code base implements it instead of me making a right mess of it.

Tags: feature
Stas Boukarev (stassats) wrote:

That sounds like something that should be handled by third-party code.

Changed in sbcl:
status: New → Won't Fix
Johannes Martinez (johannes-martinez) wrote:

I'm sorry, you think pathnames should be handled by third party code? You think MAKE-PATHNAME and NAMESTRING should be overridden by third party code? That doesn't seem to make any sense at all.
Have you taken a look at the code at all? It looks like a bare-minimum implementation of the standard that was supposed to be revisited but never was. HOST is basically useless and DEVICE isn't implemented at all. A PATHNAME created with a HOST or DEVICE cannot be written with WRITE and reconstructed with READ.

Tomas Hlavaty (q-tom-o) wrote: Re: [Bug 1793883] Re: pathnames should include device uuids

Johannes Martinez <email address hidden> writes:
> Hi, I'm making a file tracker and it would be of immense help to get
> UUID's recorded in the device slot of pathnames,

how would that work on windows, for example, where the device slot
already contains the drive letter?

> The code for pathnames is a bit all over the place, and I don't know
> if there are any plans for restructuring it(IP/URL support in host) so
> [...]
> I'm sorry, you think pathnames should be handled by third party code?
> You think MAKE-PATHNAME and NAMESTRING should be overridden by third
> party code? That doesn't seem to make any sense at all.

similar to gray-streams, it would be nice to have something like
gray-pathnames (or gray-filesystems?), allowing for further
specialization by end users

for example, it would be great, if i could work with zip files directly
in lisp. i cannot do it now, because there are no hooks into the lisp
io, which would allow me to plug in my custom behaviour. instead i have
to (un)zip the file to the OS provided filesystem and only then i can
use already written lisp libraries to work on the zip file content. i
have extended https://common-lisp.net/project/zip/ so that i can open a
gray-stream without unzipping the zip file, but i still have to use
custom functions from the zip library for many operations like OPEN,
DIRECTORY, FILE-LENGTH etc. it would be amazing, if i could write a
synthetic filesystem in lisp and all already written libraries using
standard io functions would work as expected

some time ago i wrote http://logand.com/sw/cl-olefs.html in order to
read XLS, PPT, DOC files (microsoft also stores emails on that kind of
filesystem in a file) and hit the same kind of issues

i can also imagine printing from pure lisp like in
http://logand.com/sw/cl-ipp.html except via some kind of lisp synthetic
filesystem
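For readers unfamiliar with the Gray streams mentioned above: they are a de facto protocol (supported in SBCL through its SB-GRAY package) that lets user code define stream classes which standard functions such as READ-BYTE then accept. The minimal sketch below assumes SBCL; the class OCTET-VECTOR-INPUT-STREAM and the helper READ-ALL-BYTES are hypothetical names, and the stream simply serves bytes from an in-memory vector, much as a zip entry could be served without unpacking it to disk.

```lisp
;; Sketch only: assumes SBCL, whose Gray stream classes live in SB-GRAY.
(defclass octet-vector-input-stream (sb-gray:fundamental-binary-input-stream)
  ((data  :initarg :data :reader octet-stream-data)       ; bytes to serve
   (index :initform 0    :accessor octet-stream-index)))  ; next byte to read

;; READ-BYTE on a Gray stream dispatches to STREAM-READ-BYTE,
;; which must return the next byte or :EOF.
(defmethod sb-gray:stream-read-byte ((stream octet-vector-input-stream))
  (let ((data (octet-stream-data stream))
        (i (octet-stream-index stream)))
    (if (< i (length data))
        (prog1 (aref data i)
          (setf (octet-stream-index stream) (1+ i)))
        :eof)))

;; Ordinary code written against READ-BYTE works on the custom stream.
(defun read-all-bytes (stream)
  "Collect every remaining byte of STREAM into a list."
  (loop for byte = (read-byte stream nil nil)
        while byte
        collect byte))

;; (read-all-bytes (make-instance 'octet-vector-input-stream :data #(80 75 3 4)))
;;   => (80 75 3 4)
```

Because READ-BYTE dispatches on the stream's class, existing byte-reading code accepts such a stream unchanged; what has no comparable hook is getting OPEN, DIRECTORY or FILE-LENGTH to produce or understand such objects from a pathname, which is the gap described in this message.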

Tomas Hlavaty (q-tom-o) wrote:

if you just need to manipulate pathnames, it might actually be possible
by using custom host. see SB-IMPL::*PHYSICAL-HOST* and
unix-pathname.lisp and win32-pathname.lisp

Johannes Martinez (johannes-martinez) wrote:

It would work the same for windows. I think it's important to make the distinction that PATHNAMES aren't FILENAMES, and that 'drive letters' in windows aren't actually DEVICES but lettered mount points.

If I plug two different USB keys into windows, they will get mounted sequentially. A PATHNAME as it works now will only point to the correct file if you always mount all drives in the same order. If PATHNAMES included UUIDs in the DEVICE slot, then a PATHNAME could refer to the exact same file on a drive regardless of whether it's being mounted by a unix or windows host, regardless of mount order, and regardless of where in the filesystem it is mounted. This is a huge bonus if you use your external HDs between operating systems or if your BIOS is somewhat buggy and likes to switch things up.

This is a problem that all systems had, especially with distributed systems, which is why UUIDs are now used for pretty much all mounting operations. Guaranteed to always point to the right file on the right device.
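The behaviour described here can be approximated above the pathname layer. A minimal sketch, assuming a hypothetical table *MOUNT-POINTS* that maps a volume UUID to its current mount directory (how the table gets filled is system-specific and not shown; the UUID string below is made up): the stored reference, a UUID plus a volume-relative path, stays the same, and only the resolved physical pathname changes when the drive is mounted somewhere else.

```lisp
;; Hypothetical mount table: volume UUID string -> current mount directory.
(defvar *mount-points* (make-hash-table :test #'equal))

(defun note-mount (uuid mount-directory)
  "Record that the volume identified by UUID is mounted at MOUNT-DIRECTORY."
  (setf (gethash uuid *mount-points*) (pathname mount-directory)))

(defun resolve-volume-path (uuid relative-path)
  "Merge RELATIVE-PATH (relative to the volume root) onto the directory
where the volume UUID is currently mounted."
  (let ((mount (gethash uuid *mount-points*)))
    (unless mount
      (error "Volume ~A is not currently mounted." uuid))
    (merge-pathnames relative-path mount)))

;; Usage: the same (UUID, relative path) pair follows the drive.
;; (note-mount "6f1c-0a2b" "/media/usb0/")
;; (resolve-volume-path "6f1c-0a2b" "docs/notes.txt") ; => #P"/media/usb0/docs/notes.txt"
;; (note-mount "6f1c-0a2b" "/mnt/backup/")
;; (resolve-volume-path "6f1c-0a2b" "docs/notes.txt") ; => #P"/mnt/backup/docs/notes.txt"
```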

Richard M Kreuter (kreuter) wrote:

Hi Johannes,

In the Common Lisp language, pathnames do indeed exist in order to model filenames. Cf. section 19.1.2 sentence 1: "Pathnames are structured objects that can represent, in an implementation-independent way, the filenames that are used natively by an underlying file system."

If it's important in your application to track filenames relative to drives identified by UUID, I would propose treating that addressing scheme as a separate, first-class abstraction, albeit this introduces some necessity to wrap standard CL operators such as OPEN. (Additionally, such an approach would stand a chance of being portable across multiple CL implementations, whereas relying on a hypothetical pathnames extension will tie your program to only some implementations.) Here is a trivial example that just uses absolute pathnames internally; if you've got a mechanism to address files by UUID-plus-relative-pathname without going through an ordinary absolute pathname, you could use that instead, though this would require a more complicated and probably less strictly portable implementation of OPEN*. Hope this helps.

;; Trivial example.
(defstruct drive
  (uuid nil :type string)
  (mount-pathname nil :type pathname))

(defvar *drives* ())

;; Somehow or other you'll want to initialize *DRIVES* based on your system.
;; Could conceivably use repeated calls to something like this.
(defun ensure-drive (uuid &key mount-pathname)
  (or (find uuid *drives* :test #'string-equal :key #'drive-uuid)
      (let ((new-drive
             (make-drive :uuid uuid :mount-pathname mount-pathname)))
        (push new-drive *drives*)
        new-drive)))

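(defun find-drive-by-mount (mount-pathname)
  ;; Return the known drive whose mount point is exactly MOUNT-PATHNAME, or NIL.
  (find mount-pathname *drives* :test #'equal :key #'drive-mount-pathname))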

;; Note: you might want to give this a distinguished print/read syntax.
(defstruct drive-relative-pathname drive relative-pathname)

(defun convert-to-drive-relative-pathname (pathname)
  (setq pathname (pathname pathname))
  (unless (eql :absolute (first (pathname-directory pathname)))
    (error "Can't convert relative pathname ~A to drive-relative." pathname))
  (let (drive mount-dir)
    (dolist (dir (pathname-directory pathname))
      (setq mount-dir (append mount-dir (list dir)))
      (let ((found (find-drive-by-mount
                    (make-pathname :directory mount-dir
                                   :defaults pathname
                                   :name nil :type nil :version nil))))
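        ;; Keep the deepest mount point seen so far; a directory further
        ;; down that is not itself a mount point must not reset DRIVE.
        (when found
          (setq drive found))))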
    (unless drive
      (error "Pathname ~A is not on any known drive." pathname))
    (make-drive-relative-pathname
     :drive drive
     :relative-pathname
     (pathname (enough-namestring pathname (drive-mount-pathname drive))))))

(defun convert-to-pathname (pathspec)
  (etypecase pathspec
    ((or stream string) (pathname pathspec))
    (pathname pathspec)
    (drive-relative-pathname
     (merge-pathnames (drive-relative-pathname-relative-pathname pathspec)
                      (drive-mount-pathname
                       (drive-relative-pathname-drive pathspec))))))

(defun open* (pathspec &rest keys &key &allow-other-keys)
  (apply #'open (convert-to-pathname pathspec) keys))

;; Wrappers f...

Johannes Martinez (johannes-martinez) wrote:

Yes, they are representations, not filenames. Implementation-defined representations.
I'm not sure why I would wrap a whole bunch of CL functions instead of changing a handful of PATHNAME functions to store a unique identifier for a device in the device slot, without having to change any other functions.

Does it not make sense to have a DEVICE stored in a DEVICE slot? All PATHNAMES have to be resolved to FILENAMES for things like OPEN-FILE, so what would be the purpose of messing with the implementation-independent parts of CL instead of putting it specifically where it's supposed to go, which is the implementation-dependent part?

What is the problem with modifying PATHNAME so that it can be read and written? Is everybody OK with PATHNAMES being unREADable? At the very least, could whatever is passed to MAKE-PATHNAME as HOST and DEVICE be put in the printed representation, so that it is READable?
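Whether a particular pathname survives a print/read round trip can be checked directly. A small sketch: the helper PATHNAME-ROUND-TRIPS-P is hypothetical, and what it returns for pathnames with a non-NIL DEVICE or a non-default HOST depends on the implementation and its version, so no result is claimed for that case here.

```lisp
(defun pathname-round-trips-p (pathname)
  "True if PATHNAME, printed readably and read back, yields an EQUAL pathname.
Any printing or reading error (e.g. PRINT-NOT-READABLE) counts as failure."
  (handler-case
      (let ((printed (let ((*print-readably* t))
                       (prin1-to-string pathname))))
        (equal pathname (read-from-string printed)))
    (error () nil)))

;; (pathname-round-trips-p #P"/tmp/notes.txt")                          ; => T on SBCL/Unix
;; (pathname-round-trips-p (make-pathname :device "usb-key" :name "notes")) ; implementation-dependent
```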

Richard M Kreuter (kreuter) wrote:

One benefit of the approach I proposed is potential portability: there are lots of Common Lisp implementations, and, to my knowledge, none of them do anything like what you're requesting.
In case you should ever desire your programs to run on other implementations in future, depending on the implementation happening to provide this capability will hamper that desire. And then the trivial example I sent was to show how little effort it might take to code up your preferred abstraction, and in a manner that stands a chance of portability.

Anyway, I don't entirely understand your proposal, but I'm trying to be charitable. Please note that the change you're requesting is not compatible with existing code, as people write code that examines pathname components, and that implicitly depends on pathname immutability (since pathnames have heretofore been immutable).

But here's a question: what if I want a pathname that addresses the same "location" in the file system's namespace over time, rather than addressing via some supposed notion of file identity as you're proposing? Today I get that with pathnames; for example, if I parse

  (parse-namestring "/foo/bar/baz.txt")

no such file or directory must exist at parsing time for that pathname to be usable to try to name a file later. I'm not sure how I'd get that semantic if all pathnames had to resolve UUIDs at any particular time.
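A small illustration of this point: PARSE-NAMESTRING only builds a structured object and never consults the file system, so the components are available whether or not the file exists; existence only matters once something like PROBE-FILE or OPEN runs. The helper PATH-COMPONENTS is hypothetical, and the results shown assume a Unix-like host such as SBCL on Linux.

```lisp
(defun path-components (namestring)
  "Parse NAMESTRING and return its directory, name and type as three values.
No file system access takes place."
  (let ((pathname (parse-namestring namestring)))
    (values (pathname-directory pathname)
            (pathname-name pathname)
            (pathname-type pathname))))

;; (path-components "/foo/bar/baz.txt")
;;   => (:ABSOLUTE "foo" "bar"), "baz", "txt"
;; (probe-file "/foo/bar/baz.txt")
;;   => NIL, assuming no such file exists on this machine
```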

Anyway, since your desired change is backward-breaking, might not allow for a modeling of ordinary filenames, and can be done in your own code layer anyway, I think the challenge is on you to account for what good comes for the proposed change.

P.S. By the way, a drive-relative pathname doesn't necessarily address the same file over time. Files and directories on the drive can be renamed, deleted and re-created, etc.; and there can be symbolic links in the directory structure, and maybe other things that make file names only temporary identifiers for file identities at best. If you're really interested in file identity, shouldn't you be looking for a triple consisting of the drive's UUID, an inode or serial number, and then perhaps some other number that's never re-used for a given UUID/serial pair (e.g., creation time, or file sequence number)?
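A partial sketch of that idea, assuming SBCL with its SB-POSIX contrib: it pairs a caller-supplied drive UUID with the file's inode number from stat(2). POSIX stat reports a numeric device ID rather than a volume UUID, which is why the UUID has to be passed in (the FILE-IDENTITY structure and the DRIVE-UUID argument are hypothetical), and it offers no creation time or other never-reused number, so the third component suggested above is left out.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sb-posix))

;; Hypothetical identity record: which drive, and which inode on it.
(defstruct file-identity
  (drive-uuid "" :type string)  ; supplied by the caller; stat(2) has no UUID
  (inode 0 :type integer))      ; st_ino of the file

(defun file-identity-of (path drive-uuid)
  "Return the FILE-IDENTITY of the existing file at PATH, assumed to live
on the drive identified by DRIVE-UUID. Signals an error if PATH does not exist."
  (let ((info (sb-posix:stat (sb-ext:native-namestring (truename path)))))
    (make-file-identity :drive-uuid drive-uuid
                        :inode (sb-posix:stat-ino info))))
```

Renaming a file within one filesystem keeps its inode, so this identity survives a rename even though the pathname does not; an inode number can be reused after deletion, though, which is what the missing third component would guard against.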

Johannes Martinez (johannes-martinez) wrote:

Perhaps I'm being unclear.
1) PATHNAMES in SBCL are not READable. If you put something in the DEVICE slot, it will not get printed and not get read.

2) PATHNAMES are implementation-INDEPENDENT structures; what goes into the slots and how they are resolved to a file in a filesystem is implementation-DEPENDENT.

3) PATHNAMES are not NAMESTRINGs or TRUENAMES nor should they pretend to be. They may look like STRINGS but they are OBJECTS.

4) Platform and implementation portability is achieved through LOGICAL PATHNAMES, not PHYSICAL PATHNAMES. This is about PHYSICAL PATHNAMES, since LOGICAL PATHNAMES do not contain DEVICES.

5) This introduces no breaking changes whatsoever to PATHNAMES; their structure stays exactly the same. The components are still addressable exactly the same way. MERGE-PATHNAMES still works exactly the same. The DEVICE slot is totally ignorable, exactly the same way it is now.

6) This is not about file identity. It is about recording what DEVICE a file resides on in the DEVICE slot, with the logical extension that what is recorded in DIRECTORY is relative to the DEVICE.

7) Having a UUID recorded in DEVICE has no effect on any CL function since PATHNAMES are resolved to TRUENAMES and NAMESTRINGS before any other function uses them. There is no breaking of code.
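Point 4 refers to logical pathnames. As a sketch of the one level of indirection they already provide: a logical host can be re-pointed at whatever directory a drive is currently mounted on, while names under that host stay the same. The host name TRACKER, the helper RETARGET-LOGICAL-HOST and the mount directories are hypothetical; finding the mount point for a given UUID is still left to the caller, and logical pathname words are limited to letters, digits and hyphens, so this does not cover arbitrary physical file names.

```lisp
(defun retarget-logical-host (host mount-directory)
  "Make every HOST: logical pathname translate to a file under MOUNT-DIRECTORY,
a physical directory namestring ending in a slash."
  (setf (logical-pathname-translations host)
        (list (list "**;*.*.*"
                    (concatenate 'string mount-directory "**/*.*")))))

;; (retarget-logical-host "TRACKER" "/media/usb0/")
;; (translate-logical-pathname (logical-pathname "TRACKER:DOCS;NOTES.TXT"))
;;   ; a physical pathname under /media/usb0/
;; (retarget-logical-host "TRACKER" "/mnt/backup/")
;;   ; the same logical name now translates to a file under /mnt/backup/
```

The letter case of the translated components is implementation-dependent (logical pathnames are case-insensitive), which is why no exact printed result is shown.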
