A few things:
First, how do you make this work for utf-16?
Second, can you make this work for arbitrary input prior to the
first such form in a file (especially arbitrary encoded characters in
comments)?
Third, how do you handle an external-format change that alters the
encoded character width (switching from utf-16 to utf-8 in the middle
of a file... or vice versa)?
Fourth, what about when the encoding directive /lies/? This isn't
quite as far-fetched as I'd like, as I've dealt with systems which
take a UTF-8 encoded XML message, transcode it to UTF-16 somewhere
along the processing chain, and leave the header still declaring
UTF-8. Not pretty. Or they do the same between, say, Latin-1 or
Shift-JIS and UTF-8.
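For what it's worth, the usual partial answer to the first and fourth
points is to sniff the leading bytes *before* trusting any in-file
declaration, roughly along the lines of the non-normative Appendix F of
the XML 1.0 spec. Here is a minimal Python sketch, assuming the input
is a byte string taken from the very start of the file. The function
names are hypothetical, UTF-32 is deliberately left out, and nothing
here addresses the third point (a width change mid-file):

```python
import re
from typing import Optional

# Leading-byte signatures in the spirit of XML 1.0 Appendix F (UTF-32 omitted).
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]
_NO_BOM = [
    (b"<\x00?\x00", "utf-16-le"),  # "<?" as UTF-16LE, no BOM
    (b"\x00<\x00?", "utf-16-be"),  # "<?" as UTF-16BE, no BOM
    (b"<?xm", "utf-8"),            # some ASCII-compatible encoding
]

_DECL = re.compile(r'encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def sniff_encoding(data: bytes) -> Optional[str]:
    """Guess the encoding family from the first few bytes, or return None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    for prefix, name in _NO_BOM:
        if data.startswith(prefix):
            return name
    return None


def declared_encoding(data: bytes, sniffed: str) -> Optional[str]:
    """Decode the header with the sniffed encoding and extract encoding="..."."""
    head = data[:200].decode(sniffed, errors="ignore")
    m = _DECL.search(head)
    return m.group(1) if m else None


def declaration_lies(data: bytes) -> bool:
    """True when the byte-level width (8-bit family vs UTF-16) contradicts the header."""
    sniffed = sniff_encoding(data)
    if sniffed is None:
        return False
    declared = declared_encoding(data, sniffed)
    if declared is None:
        return False
    norm = declared.lower().replace("-", "").replace("_", "")
    return norm.startswith("utf16") != sniffed.startswith("utf-16")


# A UTF-8 message transcoded to UTF-16LE with its header left untouched.
lying = b"\xff\xfe" + '<?xml version="1.0" encoding="UTF-8"?><a/>'.encode("utf-16-le")
print(sniff_encoding(lying), declared_encoding(lying, "utf-16-le"), declaration_lies(lying))
```

Note that a naive byte search for `b"encoding"` finds nothing in the
UTF-16 case because of the interleaved NUL bytes, which is the crux of
the first point. Also, a Latin-1 vs. UTF-8 lie can't be caught this
way, since both share the same ASCII-compatible header bytes; that one
would need something like a strict UTF-8 decode of the whole body.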