Re: Shift-JIS encoded text (was: RE: Tags and future new technologies [...])

From: Ken Whistler <kenw_at_sybase.com>
Date: Fri, 01 Jun 2012 14:17:24 -0700

On 6/1/2012 1:51 PM, Doug Ewell wrote:
> At what point does text
> encoded in a vendor's private-use extension to Shift-JIS become
> "Shift-JIS encoded text"?

A possibly less confusing way to put this is:

At what point does text encoded in a vendor's private-use extension
to *JIS X 0208* become "Shift-JIS encoded text"?

The reason for putting it that way is that JIS X 0208 is a character
encoding standard. It defines the repertoire of characters and
assigns numbers to them.

But ISO-2022-JP, EUC-JP, and Shift-JIS are then three different ways of
turning JIS X 0208 character codes (and possibly vendor or other
extensions) into streams of bytes. Think of them as character encoding
schemes (in the sense of the Unicode character encoding model).
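A minimal Python sketch of that point follows. It takes one JIS X 0208 code point, given as a row/cell ("kuten") pair, and serializes it three ways. The function names are made up for illustration. Only the two-byte JIS X 0208 range is handled: there is no single-byte ASCII or JIS X 0201, no vendor extensions, and ISO-2022-JP is shown for a single character bracketed by its escape sequences.

```python
from __future__ import annotations


def jis_bytes(row: int, cell: int) -> tuple[int, int]:
    """Map a 1-based kuten pair (row, cell in 1..94) to JIS X 0208 bytes (0x21..0x7E)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    return row + 0x20, cell + 0x20


def to_iso2022jp(row: int, cell: int) -> bytes:
    # The raw 7-bit JIS bytes, switched in with ESC $ B (JIS X 0208)
    # and switched back out with ESC ( B (ASCII).
    j1, j2 = jis_bytes(row, cell)
    return b"\x1b$B" + bytes([j1, j2]) + b"\x1b(B"


def to_euc_jp(row: int, cell: int) -> bytes:
    # The same two bytes with the high bit set.
    j1, j2 = jis_bytes(row, cell)
    return bytes([j1 | 0x80, j2 | 0x80])


def to_shift_jis(row: int, cell: int) -> bytes:
    # Two JIS rows share one lead byte. Lead bytes run 0x81-0x9F, then
    # jump to 0xE0-0xEF, which leaves 0xA0-0xDF free for single-byte
    # halfwidth katakana.
    j1, j2 = jis_bytes(row, cell)
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 % 2 == 1:
        # Odd JIS lead byte: trail byte 0x40-0x9E, skipping 0x7F (DEL).
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:
        # Even JIS lead byte: trail byte 0x9F-0xFC.
        s2 = j2 + 0x7E
    return bytes([s1, s2])


if __name__ == "__main__":
    # HIRAGANA LETTER A is row 4, cell 2 in JIS X 0208.
    for encode in (to_iso2022jp, to_euc_jp, to_shift_jis):
        print(encode.__name__, encode(4, 2).hex(" "))
```

For HIRAGANA LETTER A (row 4, cell 2), this gives 1B 24 42 24 22 1B 28 42, A4 A2, and 82 A0 respectively. These should match what Python's own iso2022_jp, euc_jp, and shift_jis codecs produce for "あ". The character number stays the same in all three cases; only the byte-level scheme changes.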

One reason there are "many Shift-JISes" is not that the principle
for shifting JIS X 0208 code values into bytes changes, but that
there are many different private extensions, all using the same
general principle to map their code values onto byte values for
processing.
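The following Python illustration uses only the standard library's codecs. The byte pair 0x87 0x40 is where the Shift-JIS arithmetic puts row 13, cell 1. JIS X 0208 leaves row 13 unassigned, while NEC's extension, carried into Microsoft's CP932, puts circled digits there. The helper name try_decode is made up for the example, and the result depends on CPython's shift_jis and cp932 mapping tables.

```python
# Same two bytes, decoded by two different "Shift-JIS" codecs.
# Row 13, cell 1 -> JIS bytes 0x2D 0x21 -> Shift-JIS bytes 0x87 0x40.
data = b"\x87\x40"


def try_decode(raw: bytes, codec: str) -> str:
    """Return the decoded text, or a short note if the codec rejects the bytes."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return "<undefined in " + codec + ">"


for codec in ("shift_jis", "cp932"):
    print(codec, try_decode(data, codec))
```

With CPython's tables, shift_jis is expected to reject the pair, and cp932 is expected to return U+2460 CIRCLED DIGIT ONE. The shifting scheme is the same in both cases; only the repertoire behind it differs.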

In summary, "Shift-JIS" is not a character encoding standard; it is
a scheme for turning JIS X 0208 (and various extensions) into a
particular byte format for processing.

--Ken
Received on Fri Jun 01 2012 - 16:19:10 CDT