Steel Bank Common Lisp

file-position is confused by utf-8 buffering

Reported by Faré on 2010-10-09
This bug affects 1 person
Affects: SBCL
Importance: Medium
Assigned to: Unassigned

Bug Description

file-position seems to be computed as (- position-at-end-of-buffer (- end-of-buffer index-in-buffer)). That is bogus, because the resulting position cannot then be used as an argument to (file-position s position). Demonstration:

# cat ~/bug/sbcl-file-position.lisp
(in-package :cl-user)
(defparameter *u* "/tmp/u")
(with-open-file (s *u* :direction :output :if-exists :rename-and-delete)
  (princ "Faré λ 自由 foo" s))
(with-open-file (s *u* :direction :input)
  (format t "~&file length: ~D~%" (file-length s))
  (loop :for pos = (file-position s)
    :for c = (read-char s nil nil)
    :for nil = (format t "~&pos ~2D ~S~%" pos c)
    :while c))
(delete-file *u*)
(quit)

# sbcl --load ~/bug/sbcl-file-position.lisp
This is SBCL 1.0.42.37, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
file length: 19
pos 0 #\F
pos 7 #\a
pos 8 #\r
pos 9 #\LATIN_SMALL_LETTER_E_WITH_ACUTE
pos 10 #\
pos 11 #\GREEK_SMALL_LETTER_LAMDA
pos 12 #\
pos 13 #\U81EA
pos 14 #\U7531
pos 15 #\
pos 16 #\f
pos 17 #\o
pos 18 #\o
pos 19 NIL
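A small model reproduces the numbers above. It rests on two assumptions. First, the first READ-CHAR buffers and decodes all 19 octets at once. Second, the position is then computed as the octet offset at the end of the buffer minus the number of characters still buffered, times a char size of 1. UTF-8-OCTETS, TRUE-OFFSETS and REPORTED-OFFSETS are hypothetical helpers written only for this illustration. They are not SBCL code.

```lisp
;; Illustrative model only; not SBCL's implementation.
(defun utf-8-octets (char)
  "Number of octets CHAR occupies when encoded as UTF-8."
  (let ((code (char-code char)))
    (cond ((< code #x80) 1)
          ((< code #x800) 2)
          ((< code #x10000) 3)
          (t 4))))

(defun true-offsets (string)
  "Octet offset of each character of STRING in UTF-8, followed by the end offset."
  (loop :with pos = 0
        :for c :across string
        :collect pos :into offsets
        :do (incf pos (utf-8-octets c))
        :finally (return (append offsets (list pos)))))

(defun reported-offsets (string)
  "Positions given by the suspected formula, assuming all of STRING is
buffered by the first READ-CHAR and each buffered character counts as 1 octet."
  (let ((n (length string))
        (buffer-end-pos (reduce #'+ string :key #'utf-8-octets)))
    (cons 0                                  ; nothing buffered before the first read
          (loop :for chars-read :from 1 :to n
                ;; char size wrongly taken as 1 for every unread buffered character
                :collect (- buffer-end-pos (* 1 (- n chars-read)))))))

;; "Faré λ 自由 foo", built from code points so the source encoding does not matter.
(defparameter *sample*
  (coerce (mapcar #'code-char '(#x46 #x61 #x72 #xE9 #x20 #x3BB #x20
                                #x81EA #x7531 #x20 #x66 #x6F #x6F))
          'string))
```

For this string, (true-offsets *sample*) gives (0 1 2 3 5 6 8 9 12 15 16 17 18 19). (reported-offsets *sample*) gives (0 7 8 9 10 11 12 13 14 15 16 17 18 19), which matches the transcript. Each multi-octet character still in the buffer is counted as one octet, so too little is subtracted. The reported position therefore runs ahead of the true one until the buffer is drained.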

This is on Linux amd64, which shouldn't matter, with a recentish SBCL.
I'm building the latest SBCL to test it, but from the git log
I don't think it has been magically solved.

NB: I found this problem while writing a function using file-position
to read backwards in a file and portably find the stream-line-column
at the current position of a file stream despite encoding issues. It
was quite annoying to not have read/write consistency for the position.
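The backward scan described in the NB can be sketched on an octet stream, where FILE-POSITION is unambiguous. This is a minimal sketch under stated assumptions: the content is UTF-8, lines end in LF, and the caller already has a correct octet offset. UTF-8-COLUMN-AT is a hypothetical name, and this is not Faré's function. It counts UTF-8 lead octets (anything not of the form 10xxxxxx) between the previous newline and the given offset.

```lisp
(defun utf-8-column-at (pathname octet-position)
  "Number of characters between the last newline before OCTET-POSITION and
OCTET-POSITION itself, assuming UTF-8 text with LF line endings."
  (with-open-file (in pathname :element-type '(unsigned-byte 8))
    (let ((column 0))
      ;; Walk backwards one octet at a time until a newline or the file start.
      (loop :for p :downfrom (1- octet-position) :to 0
            :do (file-position in p)
                (let ((octet (read-byte in)))
                  (when (= octet 10)          ; LF marks the start of the line
                    (return))
                  ;; Count lead octets only; continuation octets are 10xxxxxx.
                  (unless (= (logand octet #xC0) #x80)
                    (incf column))))
      column)))
```

This only works if the octet offset handed to it is right, which is exactly the read/write consistency that the character stream's FILE-POSITION failed to provide here.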

PS: thanks to Nikodemus for updating ASDF to 2.009.

Faré (fahree) wrote :

Note that the code implementing this was correct back in the days when CMUCL only had 8-bit encodings.

Faré (fahree) wrote :

Also note that if I pass :external-format :utf-8 when opening the file, I don't have the problem.
The issue is that sb-impl::fd-stream-char-size is wrongly set to 1 when the :external-format is :default.

set-fd-stream-routines should probably set the char-size function when the supplied external-format is :default.
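This observation suggests a read/write consistency check: every value returned by FILE-POSITION should be usable as an argument to FILE-POSITION. The sketch below assumes a file that has already been written as UTF-8. POSITIONS-ROUND-TRIP-P is a hypothetical helper name. It defaults to the explicit :UTF-8 workaround. On an affected build, calling it with :EXTERNAL-FORMAT :DEFAULT would be expected to return NIL.

```lisp
(defun positions-round-trip-p (pathname &key (external-format :utf-8))
  "Record FILE-POSITION before each READ-CHAR, then seek back to every
recorded position and check that the same character is read again."
  (with-open-file (s pathname :direction :input
                              :external-format external-format)
    (let ((pairs (loop :for pos = (file-position s)
                       :for c = (read-char s nil nil)
                       :while c
                       :collect (cons pos c))))
      ;; Seek back to each remembered position and re-read one character.
      (loop :for (pos . c) :in pairs
            :always (progn (file-position s pos)
                           (eql c (read-char s nil nil)))))))
```

With the positions from the transcript, seeking to 9 (reported for é) would land on 自 instead, so the check fails on the first multi-octet character.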

Faré (fahree) wrote :

The patch attached fixes the issue for me. Please review and commit.

Nikodemus Siivola (nikodemus) wrote :

Yep. It seems that :DEFAULT doesn't get treated right.

* (sb-impl::default-external-format)

:UTF-8
* (sb-impl::external-format-char-size :default)

1
* (sb-impl::get-external-format :default)

NIL
NIL

However, I think the right place to fix this is making GET-EXTERNAL-FORMAT understand :DEFAULT.
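A toy model illustrates the suggested fix, which resolves :DEFAULT inside the lookup instead of patching each caller. This is not SBCL's code. The registry, the :VARIABLE marker and all function names are hypothetical. The fallback to 1 only mirrors the transcript above, where the lookup for :DEFAULT returned NIL and the char size came out as 1.

```lisp
;; Toy model only: every name here is hypothetical, not an SBCL internal.
(defparameter *toy-default-external-format* :utf-8
  "Stand-in for the value SB-IMPL::DEFAULT-EXTERNAL-FORMAT returned above.")

(defparameter *toy-formats*
  '((:latin-1 . 1)            ; fixed width: one octet per character
    (:utf-8   . :variable))   ; width depends on the character
  "Hypothetical registry mapping external-format names to a char size.")

(defun toy-char-size-buggy (name)
  "Unknown names fall back to 1; :DEFAULT is not registered, so it gets 1."
  (or (cdr (assoc name *toy-formats*)) 1))

(defun toy-char-size-fixed (name)
  "Resolve :DEFAULT to the configured default before looking it up."
  (toy-char-size-buggy (if (eq name :default)
                           *toy-default-external-format*
                           name)))
```

Here (toy-char-size-buggy :default) returns 1, while (toy-char-size-fixed :default) returns :VARIABLE. If the lookup resolves :DEFAULT once, every consumer of the looked-up entry sees the same concrete format, not just the char-size function. That is presumably why the lookup was preferred over patching SET-FD-STREAM-ROUTINES alone.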

Changed in sbcl:
importance: Undecided → Medium
status: New → Triaged
Faré (fahree) wrote :

I don't fully understand the code, which I think could use some refactoring. The reason I didn't just modify GET-EXTERNAL-FORMAT is that I believe :DEFAULT means something special only if the ELEMENT-TYPE is CHARACTER. If so, GET-EXTERNAL-FORMAT would need a second argument (possibly &OPTIONAL) to make sure it doesn't do anything silly when the ELEMENT-TYPE is (UNSIGNED-BYTE 8) or something.

Nikodemus Siivola (nikodemus) wrote :

:EXTERNAL-FORMAT :DEFAULT always means the same thing.

Maybe you're confusing things with :ELEMENT-TYPE :DEFAULT?

Anyways, fixing this looks simple enough, as long as I or someone else has a bit of time.

A merge-ready patch including a test-case will of course speed things along, but I should be able to attend to this before the week is over.

Changed in sbcl:
assignee: nobody → Nikodemus Siivola (nikodemus)
status: Triaged → In Progress
Nikodemus Siivola (nikodemus) wrote :

In 1.0.43.52.

Changed in sbcl:
assignee: Nikodemus Siivola (nikodemus) → nobody
status: In Progress → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released