Re: Invalid code points

From: William J Poser (wjposer@ldc.upenn.edu)
Date: Sun May 31 2009 - 19:26:41 CDT

Next message: Asmus Freytag: "Re: Old Italic in RTL ??"

Previous message: Doug Ewell: "Re: Invalid code points"
In reply to: Doug Ewell: "Re: Invalid code points"
Next in thread: Doug Ewell: "Re: Invalid code points"
Reply: Doug Ewell: "Re: Invalid code points"

> There is only one UTF-8, the one defined by Unicode and ISO/IEC 10646,
> which maps valid Unicode/10646 scalar values to sequences of bytes.
> Anything else is not UTF-8. Keep repeating this to yourself.

If I understand Hans Aberg's point, he means that one can abstract
the mapping from the non-negative integers to byte sequences used by
UTF-8 away from Unicode and use it for other purposes. One could,
for example, have a "UTF-8" encoding of the TRON indexed character
set, or of Nelson numbers. In this sense, there is "UTF-8", the
integer->byte sequence mapping, and UTF-8, the Unicode transformation
format that uses this mapping. This seems to me to be a perfectly valid point.
However, so as to avoid confusion, we ought to call them different
things, and since the "U" of "UTF-8" stands for "Unicode", it is the
mapping in the abstract that ought to be given another name, perhaps
the "Thompson mapping" or "diner encoding".
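
To make the distinction concrete, here is a minimal sketch of the abstract
mapping on its own, assuming the original 1-to-6 byte layout (values below
2**31). The name thompson_encode is hypothetical, borrowed from the
suggestion above; it is not an existing library function. The sketch applies
none of the Unicode restrictions, so it will happily encode surrogates and
values above 0x10FFFF, which UTF-8 proper forbids.

```python
def thompson_encode(n: int) -> bytes:
    """Encode a non-negative integer below 2**31 using the UTF-8 bit layout.

    Hypothetical illustration: unlike UTF-8 proper, this accepts surrogate
    values (0xD800-0xDFFF) and values above 0x10FFFF, because the mapping
    is treated here as independent of Unicode.
    """
    if not 0 <= n < 2**31:
        raise ValueError("value out of range for the 1-6 byte layout")
    if n < 0x80:
        return bytes([n])  # single byte: 0xxxxxxx
    # Exclusive upper bounds for values that fit in 2, 3, 4, 5, 6 bytes.
    limits = [0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    # Lead-byte prefixes: 110xxxxx, 1110xxxx, 11110xxx, 111110xx, 1111110x.
    prefixes = [0xC0, 0xE0, 0xF0, 0xF8, 0xFC]
    for extra, (limit, prefix) in enumerate(zip(limits, prefixes), start=1):
        if n < limit:
            tail = []
            for _ in range(extra):
                tail.append(0x80 | (n & 0x3F))  # continuation byte: 10xxxxxx
                n >>= 6
            # Remaining high bits go into the lead byte; tail was built
            # low-order first, so reverse it.
            return bytes([prefix | n] + tail[::-1])
    raise AssertionError("unreachable")


# For a valid Unicode scalar value the mapping coincides with UTF-8.
print(thompson_encode(0x20AC) == chr(0x20AC).encode("utf-8"))
# Outside Unicode's range it still produces bytes; UTF-8 proper does not.
print(thompson_encode(0x110000).hex())
```

For valid scalar values the output should match what a conforming UTF-8
encoder produces; the two only part ways on inputs that Unicode excludes.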

Bill

This archive was generated by hypermail 2.1.5 : Sun May 31 2009 - 19:28:31 CDT