external formats with a replacement character are horribly slow

Bug #2007164 reported by Douglas Katzman
Affects: SBCL | Status: New | Importance: Undecided | Assigned to: Unassigned

Bug Description

Standard I/O streams use an external format of
    `(,keyword :replacement ,replacement)
where keyword is usually :UTF-8 and replacement is usually #\replacement_character.

The replacement character implementation currently entails a HANDLER-BIND of 4 closures around every character written. As a result, each character written to *standard-output* easily takes twice the CPU cycles of a character written to any other stream opened with the :UTF-8 external format. This could be special-cased for an easy speedup.

Preferably, the encoding/decoding error handling mechanism could be redesigned so that it does not entail a HANDLER-BIND for every character. Only if there is an error should we _possibly_ establish a condition handler so that the condition can be propagated to the user, who could decline to handle it. But it seems to me that if there is a replacement character, the encoder should probably just use it. This might require an extra argument to the ANSI stream write-character method to pass through the external format.
If Gray streams and Simple streams require the current inefficiency, so be it; but at least the standard streams should not perform so badly as they currently do.
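
To make the two shapes concrete, here is a minimal sketch in portable Common Lisp. It is not SBCL's internal code: every name in it (TOY-ENCODING-ERROR, ENCODE-CHAR-STRICT, and the two ENCODE-WITH-... functions) is hypothetical, the "encoding" is a toy ASCII-only one, and it binds a single handler per character rather than the 4 closures mentioned above. The first function establishes a handler around every character; the second substitutes the replacement inline and never touches the condition system.

```lisp
;;; Hypothetical names throughout; this only illustrates the shape
;;; of the problem and is not how SBCL implements external formats.

(define-condition toy-encoding-error (error)
  ((char :initarg :char :reader toy-encoding-error-char)))

(defun encode-char-strict (char)
  "Return the code of CHAR if it is below 128; otherwise signal
TOY-ENCODING-ERROR with a USE-VALUE restart available."
  (let ((code (char-code char)))
    (if (< code 128)
        code
        (restart-case (error 'toy-encoding-error :char char)
          (use-value (replacement)
            (char-code replacement))))))

(defun encode-with-handler-per-char (string replacement)
  "Slow shape: a handler is established around every character,
whether or not that character is encodable."
  (map 'list
       (lambda (char)
         (handler-bind ((toy-encoding-error
                          (lambda (condition)
                            (declare (ignore condition))
                            (use-value replacement))))
           (encode-char-strict char)))
       string))

(defun encode-with-inline-replacement (string replacement)
  "Fast shape: no handler at all; REPLACEMENT is substituted
directly when a character is not encodable."
  (let ((replacement-code (char-code replacement)))
    (map 'list
         (lambda (char)
           (let ((code (char-code char)))
             (if (< code 128) code replacement-code)))
         string)))
```

Both functions return the same list of codes for the same input; the difference is only in how much work is done per character on the common, error-free path.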

Richard M Kreuter (kreuter) wrote :

It seems unlikely that Gray Streams or Simple Streams could truly require this approach.

Anyone using SBCL's external formats as part of implementing Gray Stream classes is only "allowed" to use public interfaces, SB-EXT:STRING-TO-OCTETS and SB-EXT:OCTETS-TO-STRING, both of which accept external format designators (keywords or lists), not SB-IMPL::EXTERNAL-FORMAT instances. So somebody's Gray Stream instance can't officially hold onto the external format instance or funcall its encoder/decoder functions directly: IOW, STRING-TO-OCTETS and OCTETS-TO-STRING could destructure the list-form E-F designator and supply an extra argument to its encoder/decoder explicitly.
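
A rough sketch of the destructuring step, in portable Common Lisp. SPLIT-EXTERNAL-FORMAT is a hypothetical helper, not an SBCL function; it only assumes the designator shape described in this report, i.e. either a keyword or a list of the form (keyword :replacement replacement).

```lisp
;;; Hypothetical helper; not part of SBCL.
(defun split-external-format (designator)
  "Return two values: the external format keyword and the
replacement (or NIL when DESIGNATOR does not supply one)."
  (if (consp designator)
      ;; List form: the first element names the format, the rest
      ;; is a property list that may contain :REPLACEMENT.
      (values (first designator)
              (getf (rest designator) :replacement))
      ;; Keyword form: no replacement was requested.
      (values designator nil)))
```

The second value is what an entry point like STRING-TO-OCTETS could then hand to the encoder as an explicit extra argument, instead of wrapping the encoder call in handlers.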

As for Simple Streams, I haven't looked closely at their implementation, but I did notice that after loading sb-simple-streams, OPEN only accepts keywords as external format designators. (Filed as a separate bug at https://bugs.launchpad.net/sbcl/+bug/2008811) So nobody can ever have loaded sb-simple-streams and then successfully OPENed a file with a list-valued external format designator.
