SBCL

wanted: exported and documented interface for handling character decoding errors

Bug #317072 reported by Nikodemus Siivola on 2009-01-14

Affects		Status	Importance	Assigned to	Milestone
	SBCL	Fix Released	Wishlist	Unassigned

Bug Description

We have the infrastructure for handling character decoding errors in place, but it is neither documented nor exported.

At minimum, export SB-INT:CHARACTER-DECODING-ERROR from SB-EXT, document it, and document how to use the USE-VALUE restart with it.
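For illustration, the kind of usage this request is about might look like the sketch below. It assumes SBCL, uses the internal condition name SB-INT:CHARACTER-DECODING-ERROR mentioned above (not a documented export at the time of this report), and assumes that SB-EXT:OCTETS-TO-STRING offers a USE-VALUE restart when it signals that condition. DECODE-LENIENT is a hypothetical helper name, not part of SBCL.

```lisp
;; Sketch only: assumes SBCL and the internal condition name
;; SB-INT:CHARACTER-DECODING-ERROR discussed in this report.
;; DECODE-LENIENT is a hypothetical helper, not an SBCL function.
(defun decode-lenient (octets &optional (replacement "?"))
  "Decode OCTETS as UTF-8, substituting REPLACEMENT for undecodable input."
  (handler-bind ((sb-int:character-decoding-error
                   (lambda (condition)
                     ;; Invoke the USE-VALUE restart associated with this
                     ;; condition, if one is active; otherwise USE-VALUE
                     ;; returns NIL and the error propagates as usual.
                     (use-value replacement condition))))
    (sb-ext:octets-to-string octets :external-format :utf-8)))

;; The octet 255 can never appear in valid UTF-8, so decoding this
;; vector signals a decoding error that the handler above resolves.
(decode-lenient (coerce #(72 105 255) '(simple-array (unsigned-byte 8) (*))))
```

Using HANDLER-BIND rather than HANDLER-CASE matters here: the handler runs before the stack unwinds, so the restart is still available and decoding continues after the substitution.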


Nikodemus Siivola (nikodemus) on 2009-01-14

Changed in sbcl:
importance:	Undecided → Wishlist
status:	New → Confirmed


Nathan Froyd (froydnj) wrote on 2009-01-14:

If/when we do decide to officially export this interface, it would be nice to really decide whether the USE-VALUE restart should accept multiple-character strings. Doing so significantly slows down OCTETS-TO-STRING in the general case. I think single-byte character encodings and UTF-8 are smart enough to optimize for the common case of USE-VALUE receiving a character or single-character string, but it'd be nice to get rid of this split if possible.


Christophe Rhodes (csr21-cantab) wrote on 2009-10-31: in progress

Hi,

status inprogress
tag octets alien streams
done

Firstly, this is mostly done in my external-formats branch, available
from <http://rvw.doc.gold.ac.uk/sullivan/git/sbcl.git>; support for
handling decoding errors in all external formats under the various
different situations is almost there, and there are even the beginnings
of documentation (at the moment, for the :replacement external format
modifier, but I do plan to harmonize the restart names and export some
conditions).

I like the flexibility of having an arbitrary string designator usable
as the value. If the octets code is too slow, it could be adapted so
that the common case (e.g. a unibyte encoding) fills a preallocated
buffer, only causing more allocation if the user asks for it -- but note
that even then there will be difficulties in calculating the length of
the buffer when external-formats grow line-ending / byte-order-mark
handling.

Christophe

Changed in sbcl:
status:	Confirmed → In Progress


Christophe Rhodes (csr21-cantab) wrote on 2009-11-11: external-formats branch committed

Hi,

status fixcommitted
done

With the merge of the external-formats branch (and the accompanying
documentation of the :replacement encoding modifier), I'm treating this
bug as basically done. The c-string support is not there at the moment,
and the octet support, while functional, isn't terribly fast, but the
complexity is mostly hidden under the hood of the interface.
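As a rough illustration of the :replacement modifier mentioned above, a sketch along these lines should work in an SBCL that includes the merged branch. It assumes the modifier is written as a list of the form (:utf-8 :replacement #\?) and that SB-EXT:OCTETS-TO-STRING accepts it; DECODE-WITH-REPLACEMENT is a hypothetical helper name.

```lisp
;; Sketch only: assumes an SBCL with the external-formats branch merged,
;; and that the external format may be given as a list carrying a
;; :REPLACEMENT modifier. DECODE-WITH-REPLACEMENT is a hypothetical name.
(defun decode-with-replacement (octets)
  "Decode OCTETS as UTF-8, letting the external format itself substitute
a question mark for undecodable input instead of signalling an error."
  (sb-ext:octets-to-string octets
                           :external-format '(:utf-8 :replacement #\?)))

;; The octet 255 is not valid UTF-8, so the replacement is used for it.
(decode-with-replacement
 (coerce #(72 105 255) '(simple-array (unsigned-byte 8) (*))))
```

Compared with binding a handler and invoking a restart, this keeps the policy in the external format designator, so the calling code needs no condition handling at all.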

I do intend to document the restarts available, probably standardizing
on USE-VALUE for stream external formats; I'll use bug #317409 to track
that.

Cheers,

Christophe

Changed in sbcl:
status:	In Progress → Fix Committed

Christophe Rhodes (csr21-cantab) on 2009-11-27

Changed in sbcl:
status:	Fix Committed → Fix Released
