file-position occasionally decreases while reading utf-8 file containing multibyte characters
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
SBCL | Fix Released | Medium | Unassigned |
Bug Description
I expect file-position to increase monotonically across successive calls to read-char. However, if the file contains one or more multibyte Unicode characters, file-position decreases at regular intervals of roughly 500 bytes.
Test-case (with non-whitespace/
(in-package :cl-user)
(with-open-file (str "/tmp/test.txt" :direction :output :if-exists :supersede :external-format :utf-8)
  (write-string
"xxx xxx xxxxx xxxx xxxxxxx xxxxxx xxx
xxx
xxx xxxxxxxxx xxx xxxx xx xxxx xxxxxxx xxxxx xxxxxxxxxxxxxxxx
xxx xxxxxxxxx xxx xxxx xx xxxx xxx xxxxxxxxxx xxxxxxxxxxxxxxx
xxx xxx ‘xxxxxxx’ xxx xxxxxxxxxxx
xxx
xxxxxx xx xxxxx x
x xxxxx xxxxxxx
xxxxx xxxxx xx
xxxxxxxxx xx xxxxxx
xx xxx xxxxxxxxxx x xxxxx xxxxxx
xxxxx xx xxxxxxxxx x
x xxxxx xxxxxxx
xxxxxx xxxxxxxxxxxx
xxxxxx xxxxxxxxxxxx
xxxxx xxxxxxxxx
xxxxx xxxxxxx
xxx xxxxxxxxxxxx
xxxxx xxxxxxx
xxx xxxxxxx
xxxx xxxx
xxxxx xxxxx
xxxxxx xxxxxxxxxxxx xx
xxxxxxxxxxx xx xxxxx x
x xxx xxxxx xxx xxxxxxxx
xxx xxxxxxxxxxxxx xxx xxxxxxxxxxxxx
xxx xxxxxxxxxxxxx xxx xxxxxxxxxxxxx
xxx xxxxxxxxxx xx
xxxxxxxxxx xx xxxxxxxxxxxx
xxxxxxxxxxxxxxxxx xx xxxxxxx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxx x xxxxxx x
x xxxxxxxxxxxxxxxx xx xxxx xxxxxxxxxxxxxxxxx xx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxx x xxxxxxx
xxxxxxxxxxxxxxxxxx xx xxxxxxx
xxxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxxx x xxxxxx x
x xxxxxxxxxxxxxxxx xx xxxx xxxxxxxxxxxxxxxxxx xx
xxxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxxx x xxxxxxx
xxxxxxx xx xxxxxxx
xxxxxxxxxxxx xx xxxxxxxx
xxxxxxxxxxxxx xx xxxxxxxx
xxxxxxxxxxxx xx xxxxxxxx
xx xxx x xx xxxx xxxx xxxxxxx xx xxxxxxx xxx xxxxxxx
xx
xx
xxxxxxxxxxxxxxxxx xx xxxxxxx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxx x xxxxxx x
x xxxxx xx xxxx xxxxxxxxxxxxxxxxx xx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxx x xxxxxxx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxxx
xx
xxxxxxxxxxxx xx xxxxxxxxxxxx
xxxxxxxxxxxx xx xxxxxxxxxxxx x x xxxx xxxxxx xxxx xxxxx xx
xxxxxxxxxxxxxxx xx xxxxxxxxxxxx x x xxxx xxxxxx xx
xx
xx xxx xxxx xx xxxxxxx xx xxxxxxxxx xxxxx xxxx xxxx xxxxxxxxxx x xxxxx xxxx xx
xx xxxx xxxxx xxxxxxx xxxxxxxxxx xxxxx
xx
x xx xxxxxx" str))
(with-open-file (str "/tmp/test.txt" :direction :input :external-format :utf-8)
  (loop for n from 0 below 1600
        for pos = (file-position str) then npos
        for r = (read-char str)
        for npos = (file-position str)
        unless (> npos pos)
          do (warn "at char ~A file-position was ~A, but became ~A after reading next char"
                   n pos npos)))
Output:
WARNING:
at char 508 file-position was 512, but became 509 after reading next char
WARNING:
at char 1016 file-position was 1020, but became 1017 after reading next char
WARNING:
at char 1524 file-position was 1528, but became 1527 after reading next char
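The numbers above are consistent with file-position reporting an octet offset. The only non-ASCII characters in the test text are ‘ and ’ (U+2018 and U+2019), which take three octets each in UTF-8. After 508 characters, therefore, the expected offset is 508 + 2 × 2 = 512, and after one more ASCII character it should be 513 rather than 509. The original test only checks that file-position increases. A stricter variant can compare it against a running UTF-8 octet count.

The sketch below makes two assumptions, both suggested by the output above: file-position on this stream reports an octet offset, and line endings are a single LF. The function names `utf-8-length` and `file-position-mismatches` are hypothetical helpers, not part of SBCL.

```lisp
;; Hypothetical helpers (not part of SBCL): compare FILE-POSITION against a
;; running count of UTF-8 octets after every READ-CHAR.  Assumes the stream
;; reports octet offsets and that #\Newline is written as a single LF octet.
(defun utf-8-length (char)
  "Number of octets CHAR occupies when encoded as UTF-8."
  (let ((code (char-code char)))
    (cond ((< code #x80) 1)      ; ASCII
          ((< code #x800) 2)     ; two-octet sequences
          ((< code #x10000) 3)   ; rest of the BMP, e.g. U+2018/U+2019
          (t 4))))               ; supplementary planes

(defun file-position-mismatches (pathname &key (limit most-positive-fixnum))
  "Read up to LIMIT characters of PATHNAME as UTF-8.  Return a list of
(N EXPECTED ACTUAL) for each character index N after whose READ-CHAR the
FILE-POSITION differs from the UTF-8 octet count of the characters read so far."
  (with-open-file (str pathname :direction :input :external-format :utf-8)
    (loop with expected = 0
          for n from 0 below limit
          for char = (read-char str nil nil)
          while char
          do (incf expected (utf-8-length char))
          when (/= (file-position str) expected)
            collect (list n expected (file-position str)))))
```

Usage: `(file-position-mismatches "/tmp/test.txt" :limit 1600)`. If file-position tracks the octet offset correctly, this should return NIL. Otherwise, it lists every index at which the reported position disagrees with the byte count.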
$ sbcl --version
SBCL 2.4.1
$ uname -a
Darwin MacBook-Pro.local 20.6.0 Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-
* *features*
(:ARM64 :GENCGC :64-BIT :ANSI-CL :BSD :COMMON-LISP :DARWIN :IEEE-FLOATING-
:LITTLE-ENDIAN :MACH-O :PACKAGE-
:SB-PACKAGE-LOCKS :SB-THREAD :SB-UNICODE :SBCL :UNIX)
Changed in sbcl:
assignee: nobody → Christophe Rhodes (csr21-cantab)
status: New → Triaged
status: Triaged → In Progress
importance: Undecided → Medium
Changed in sbcl:
status: Fix Committed → Fix Released
Thanks for the report! Should be fixed in revision 9d51385d4d84f0ebe0df7231127b88440b3a2944.