wanted: exported and documented interface for handling character decoding errors

Bug #317072 reported by Nikodemus Siivola
Affects: SBCL
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned
Milestone: none

Bug Description

We have the infrastructure for handling character decoding errors in place, but it is neither documented nor exported.

At minimum, export SB-INT:CHARACTER-DECODING-ERROR from SB-EXT, document it, and document how to use the USE-VALUE restart with it.
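
For illustration, a minimal sketch of what using the restart might look like. It assumes that SB-EXT:OCTETS-TO-STRING signals a condition of type SB-INT:CHARACTER-DECODING-ERROR on malformed input and that a USE-VALUE restart accepting a character or string is active at that point. The helper name DECODE-LENIENTLY is hypothetical, and the condition is named through SB-INT because it is not exported from SB-EXT.

```lisp
;; Hypothetical helper: decode OCTETS as UTF-8, substituting REPLACEMENT
;; for each undecodable sequence instead of entering the debugger.
(defun decode-leniently (octets &optional (replacement #\?))
  (handler-bind ((sb-int:character-decoding-error
                   (lambda (condition)
                     ;; Assumption: a USE-VALUE restart is active here.
                     ;; If it is not, decline and let the error propagate.
                     (let ((restart (find-restart 'use-value condition)))
                       (when restart
                         (invoke-restart restart replacement))))))
    (sb-ext:octets-to-string octets :external-format :utf-8)))

;; 255 is never a valid octet in UTF-8, so the handler is expected to run.
(decode-leniently
 (coerce #(72 105 255) '(simple-array (unsigned-byte 8) (*))))
```

Using HANDLER-BIND rather than HANDLER-CASE matters in this sketch: the handler runs without unwinding the stack, so decoding can resume after the bad octets instead of being abandoned.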

Changed in sbcl:
importance: Undecided → Wishlist
status: New → Confirmed
Nathan Froyd (froydnj) wrote :

If/when we do decide to officially export this interface, it would be nice to really decide whether the USE-VALUE restart should accept multiple-character strings. Doing so significantly slows down OCTETS-TO-STRING in the general case. I think single-byte character encodings and UTF-8 are smart enough to optimize for the common case of USE-VALUE receiving a character or single-character string, but it'd be nice to get rid of this split if possible.

Christophe Rhodes (csr21-cantab) wrote : in progress

Hi,

 status inprogress
 tag octets alien streams
 done

Firstly, this is mostly done in my external-formats branch, available
from <http://rvw.doc.gold.ac.uk/sullivan/git/sbcl.git>; support for
handling decoding errors in all external formats under the various
different situations is almost there, and there are even the beginnings
of documentation (at the moment, for the :replacement external format
modifier, but I do plan to harmonize the restart names and export some
conditions).

I like the flexibility of having an arbitrary string designator usable
as the value. If the octets code is too slow, it could be adapted so
that the common case (e.g. a unibyte encoding) fills a preallocated
buffer, only causing more allocation if the user asks for it -- but note
that even then there will be difficulties in calculating the length of
the buffer when external-formats grow line-ending / byte-order-mark
handling.

Christophe

Changed in sbcl:
status: Confirmed → In Progress
Christophe Rhodes (csr21-cantab) wrote : external-formats branch committed

Hi,

 status fixcommitted
 done

With the merge of the external-formats branch (and the accompanying
documentation of the :replacement encoding modifier), I'm treating this
bug as basically done. The c-string support is not there at the moment,
and the octet support, while functional, isn't terribly fast, but the
complexity is mostly hidden under the hood of the interface.

I do intend to document the restarts available, probably standardizing
on USE-VALUE for stream external formats; I'll use bug #317409 to track
that.

Cheers,

Christophe

Changed in sbcl:
status: In Progress → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released