file-position occasionally decreases while reading a UTF-8 file containing multibyte characters

Bug #2054169 reported by John Carroll
This bug affects 1 person
Affects | Status       | Importance | Assigned to | Milestone
SBCL    | Fix Released | Medium     | Unassigned  |

Bug Description

I expect file-position to increase after every call to read-char. However, if the file contains one or more multibyte Unicode characters, file-position decreases roughly once every 500 bytes.

Test case (every character that is neither whitespace nor multibyte has been replaced with x):

(in-package :cl-user)

(with-open-file (str "/tmp/test.txt"
                     :direction :output :if-exists :supersede :external-format :utf-8)
  (write-string
"xxx xxx xxxxx xxxx xxxxxxx xxxxxx xxx

xxx
xxx xxxxxxxxx xxx xxxx xx xxxx xxxxxxx xxxxx xxxxxxxxxxxxxxxx
xxx xxxxxxxxx xxx xxxx xx xxxx xxx xxxxxxxxxx xxxxxxxxxxxxxxxxxxxx
xxx xxx ‘xxxxxxx’ xxx xxxxxxxxxxx
xxx

xxxxxx xx xxxxx x
x xxxxx xxxxxxx
  xxxxx xxxxx xx

xxxxxxxxx xx xxxxxx

xx xxx xxxxxxxxxx x xxxxx xxxxxx

xxxxx xx xxxxxxxxx x
x xxxxx xxxxxxx
  xxxxxx xxxxxxxxxxxx
  xxxxxx xxxxxxxxxxxx
  xxxxx xxxxxxxxx
  xxxxx xxxxxxx
  xxx xxxxxxxxxxxx
  xxxxx xxxxxxx
  xxx xxxxxxx
  xxxx xxxx
  xxxxx xxxxx
  xxxxxx xxxxxxxxxxxx xx

xxxxxxxxxxx xx xxxxx x
x xxx xxxxx xxx xxxxxxxx
  xxx xxxxxxxxxxxxx xxx xxxxxxxxxxxxx
  xxx xxxxxxxxxxxxx xxx xxxxxxxxxxxxx
  xxx xxxxxxxxxx xx
xxxxxxxxxx xx xxxxxxxxxxxx

xxxxxxxxxxxxxxxxx xx xxxxxxx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxx x xxxxxx x
x xxxxxxxxxxxxxxxx xx xxxx xxxxxxxxxxxxxxxxx xx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxx x xxxxxxx

xxxxxxxxxxxxxxxxxx xx xxxxxxx
xxxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxxx x xxxxxx x
x xxxxxxxxxxxxxxxx xx xxxx xxxxxxxxxxxxxxxxxx xx
xxxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxxx x xxxxxxx

xxxxxxx xx xxxxxxx
xxxxxxxxxxxx xx xxxxxxxx
xxxxxxxxxxxxx xx xxxxxxxx
xxxxxxxxxxxx xx xxxxxxxx

xx xxx x xx xxxx xxxx xxxxxxx xx xxxxxxx xxx xxxxxxx
xx
xx
xxxxxxxxxxxxxxxxx xx xxxxxxx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxx x xxxxxx x
x xxxxx xx xxxx xxxxxxxxxxxxxxxxx xx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxx x xxxxxxx
xxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxxx
xx

xxxxxxxxxxxx xx xxxxxxxxxxxx
xxxxxxxxxxxx xx xxxxxxxxxxxx x x xxxx xxxxxx xxxx xxxxx xx
xxxxxxxxxxxxxxx xx xxxxxxxxxxxx x x xxxx xxxxxx xx

xx
xx xxx xxxx xx xxxxxxx xx xxxxxxxxx xxxxx xxxx xxxx xxxxxxxxxx x xxxxx xxxx xx
xx xxxx xxxxx xxxxxxx xxxxxxxxxx xxxxx
xx
x xx xxxxxx" str))

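(with-open-file (str "/tmp/test.txt" :direction :input :external-format :utf-8)
  ;; Read 1600 characters one at a time and warn whenever the byte offset
  ;; reported by FILE-POSITION fails to increase after a READ-CHAR.
  (loop for n from 0 below 1600
        for pos = (file-position str) then npos   ; position before reading char N
        for r = (read-char str)
        for npos = (file-position str)            ; position after reading char N
        unless (> npos pos)
          do (warn "at char ~A file-position was ~A, but became ~A after reading next char"
                   n pos npos)))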

Output:

WARNING:
   at char 508 file-position was 512, but became 509 after reading next char
WARNING:
   at char 1016 file-position was 1020, but became 1017 after reading next char
WARNING:
   at char 1524 file-position was 1528, but became 1527 after reading next char
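For reference, the expected positions follow from the encoding. The only multibyte characters left in the scrubbed test text are the curly quotes ‘ (U+2018) and ’ (U+2019), each of which UTF-8 encodes in three octets, so once both have been read the byte offset should run 4 ahead of the character count (508 characters read, 512 bytes consumed). The sketch below confirms the octet counts; it assumes SBCL, since SB-EXT:STRING-TO-OCTETS is SBCL-specific, and the helper name UTF-8-OCTET-COUNT is hypothetical.

```lisp
;; Assumes SBCL: SB-EXT:STRING-TO-OCTETS is an SBCL extension.
;; UTF-8-OCTET-COUNT is a hypothetical helper, not part of SBCL.
(defun utf-8-octet-count (string)
  "Return the number of octets STRING occupies when encoded as UTF-8."
  (length (sb-ext:string-to-octets string :external-format :utf-8)))

;; Each curly quote from the test text encodes as three octets.
(utf-8-octet-count (string (code-char #x2018)))   ; left single quotation mark
(utf-8-octet-count (string (code-char #x2019)))   ; right single quotation mark

;; Surplus of bytes over characters once both quotes have been read:
;; 6 octets for 2 characters, i.e. 4 extra bytes.
(- (utf-8-octet-count (coerce (list (code-char #x2018) (code-char #x2019)) 'string))
   2)
```

With a correct FILE-POSITION, reading character 508 should therefore move the position from 512 to 513. The reported 509 (and likewise 1017 after character 1016) is just one more than the character index, as if the multibyte surplus had been lost.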

$ sbcl --version
SBCL 2.4.1
$ uname -a
Darwin MacBook-Pro.local 20.6.0 Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_ARM64_T8101 arm64
* *features*
(:ARM64 :GENCGC :64-BIT :ANSI-CL :BSD :COMMON-LISP :DARWIN :IEEE-FLOATING-POINT
 :LITTLE-ENDIAN :MACH-O :PACKAGE-LOCAL-NICKNAMES :SB-CORE-COMPRESSION :SB-LDB
 :SB-PACKAGE-LOCKS :SB-THREAD :SB-UNICODE :SBCL :UNIX)

Changed in sbcl:
assignee: nobody → Christophe Rhodes (csr21-cantab)
status: New → Triaged
status: Triaged → In Progress
importance: Undecided → Medium
Christophe Rhodes (csr21-cantab) wrote:

Thanks for the report! Should be fixed in revision 9d51385d4d84f0ebe0df7231127b88440b3a2944.

Changed in sbcl:
status: In Progress → Fix Committed
assignee: Christophe Rhodes (csr21-cantab) → nobody
John Carroll (jacarroll) wrote:

This was a show-stopper for me. Thank you for fixing it so quickly!

Changed in sbcl:
status: Fix Committed → Fix Released