Steel Bank Common Lisp

wanted: exported and documented interface for handling character decoding errors

Reported by Nikodemus Siivola on 2009-01-14
Affects: SBCL
Importance: Wishlist
Assigned to: Unassigned

Bug Description

We have the infrastructure for handling character decoding errors in place, but it is neither documented nor exported:

At minimum, export SB-INT:CHARACTER-DECODING-ERROR from SB-EXT, document it, and document how to use the USE-VALUE restart with it.
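
For illustration, here is a minimal sketch of the usage this request has in mind. It assumes the condition is reachable as SB-INT:CHARACTER-DECODING-ERROR and that SB-EXT:OCTETS-TO-STRING establishes a USE-VALUE restart when it meets malformed input. The helper DECODE-LENIENTLY and the sample octets are hypothetical, not part of SBCL.

```lisp
;; Hypothetical helper, not part of SBCL: decode OCTETS as UTF-8 and
;; supply REPLACEMENT whenever a decoding error is signalled.
(defun decode-leniently (octets &optional (replacement "?"))
  (handler-bind ((sb-int:character-decoding-error
                   (lambda (condition)
                     ;; Assumption: a USE-VALUE restart is active here.
                     ;; If it is not, we decline and the error propagates.
                     (let ((restart (find-restart 'use-value condition)))
                       (when restart
                         (invoke-restart restart replacement))))))
    (sb-ext:octets-to-string octets :external-format :utf-8)))

;; The octet #xFF never occurs in well-formed UTF-8, so decoding this
;; vector should signal a decoding error between "a" and "b".
(defparameter *sample-octets*
  (make-array 3 :element-type '(unsigned-byte 8)
                :initial-contents '(97 255 98)))

(defparameter *lenient-result* (decode-leniently *sample-octets*))
```

Because the handler uses HANDLER-BIND rather than HANDLER-CASE, decoding resumes at the point of the error instead of unwinding, so the valid octets on either side of the bad one are preserved.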

Changed in sbcl:
importance: Undecided → Wishlist
status: New → Confirmed
Nathan Froyd (froydnj) wrote:

If/when we do decide to officially export this interface, it would be nice to really decide whether the USE-VALUE restart should accept multiple-character strings. Doing so significantly slows down OCTETS-TO-STRING in the general case. I think single-byte character encodings and UTF-8 are smart enough to optimize for the common case of USE-VALUE receiving a character or single-character string, but it'd be nice to get rid of this split if possible.
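
To make the question concrete, the sketch below passes first a character and then a multiple-character string to USE-VALUE. It assumes the UTF-8 decoder accepts both kinds of value, as this comment suggests; DECODE-WITH and the sample octets are hypothetical names, and other encodings may behave differently.

```lisp
;; Hypothetical helper: decode OCTETS as UTF-8, answering every decoding
;; error with VALUE through the standard USE-VALUE function.
(defun decode-with (value octets)
  (handler-bind ((sb-int:character-decoding-error
                   (lambda (condition)
                     (use-value value condition))))
    (sb-ext:octets-to-string octets :external-format :utf-8)))

;; One malformed octet (#xFF) between two ASCII characters.
(defparameter *bad-octets*
  (make-array 3 :element-type '(unsigned-byte 8)
                :initial-contents '(97 255 98)))

;; The common case: a single character as the replacement.
(defparameter *with-character* (decode-with #\? *bad-octets*))

;; The general case under discussion: a multiple-character string.
;; Assumption: the decoder splices the whole string into the result.
(defparameter *with-string* (decode-with "<bad>" *bad-octets*))
```

When the replacement may be longer than one character, the length of the result can no longer be bounded by the number of input octets, which is why the general case costs more.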

Hi,

 status inprogress
 tag octets alien streams
 done

Firstly, this is mostly done in my external-formats branch, available
from <http://rvw.doc.gold.ac.uk/sullivan/git/sbcl.git>; support for
handling decoding errors in all external formats under the various
different situations is almost there, and there are even the beginnings
of documentation (at the moment, only for the :replacement external
format modifier, but I do plan to harmonize the restart names and export
some conditions).
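
As a rough illustration of the :replacement modifier, the sketch below writes it as part of the external format designator, in the list form `(:utf-8 :replacement #\?)`. It assumes that SB-EXT:OCTETS-TO-STRING honours the modifier in the same way streams do; the variable names and sample octets are made up for the example.

```lisp
;; One malformed octet (#xFF) between two ASCII characters.
(defparameter *raw-octets*
  (make-array 3 :element-type '(unsigned-byte 8)
                :initial-contents '(97 255 98)))

;; With a :replacement modifier in the designator, malformed input is
;; replaced by the given character and no handler is needed.
;; Assumption: the octets interface accepts the same designator as streams.
(defparameter *replaced*
  (sb-ext:octets-to-string *raw-octets*
                           :external-format '(:utf-8 :replacement #\?)))
```

The same designator should be usable wherever an external format is expected, for example as the :external-format argument to WITH-OPEN-FILE.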

I like the flexibility of having an arbitrary string designator usable
as the value. If the octets code is too slow, it could be adapted so
that the common case (e.g. a unibyte encoding) fills a preallocated
buffer, only causing more allocation if the user asks for it -- but note
that even then there will be difficulties in calculating the length of
the buffer when external-formats grow line-ending / byte-order-mark
handling.

Christophe

Changed in sbcl:
status: Confirmed → In Progress

Hi,

 status fixcommitted
 done

With the merge of the external-formats branch (and the accompanying
documentation of the :replacement encoding modifier), I'm treating this
bug as basically done. The c-string support is not there at the moment,
and the octet support, while functional, isn't terribly fast, but the
complexity is mostly hidden under the hood of the interface.

I do intend to document the restarts available, probably standardizing
on USE-VALUE for stream external formats; I'll use bug #317409 to track
that.

Cheers,

Christophe

Changed in sbcl:
status: In Progress → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released