From: Joe (joe@unicode.org)
Date: Mon Nov 08 2004 - 22:08:45 CST
To add yet another dimension to what Michael & Asmus & Ken have said:
In a character encoding, the character is *not* the same thing as a text string of length 1.
Character identity is defined in theory by a minimal set of entities needed to get certain text processes to do the right things ... and in practice by a lot of blundering around.
Text/sequence equivalence is defined in specific contexts by specific criteria, under various names from "normalization" to "folding" to "spelling".
In that sense, the claim
>The aim of Unicode standardisation is surely to define a single and
>unambiguous representation of text.
is well and truly false. Thus, we can all agree on the letters of the Latin alphabet for English, abc...xyz -- but we cannot all agree on a single and unambiguous representation of the word "standardization".
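To make the distinction concrete, here is a minimal Python sketch of it. It is an illustration, not anything defined by the standard: the variable names are made up for this example, and the only library calls are the standard `unicodedata.normalize` and `str.casefold`. It shows that one user-perceived character need not be a string of length 1, and that whether two sequences count as "the same" depends on which criterion is applied.

```python
import unicodedata

# Two sequences for what a reader perceives as the same letter.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# A perceived character is not necessarily a string of length 1.
lengths = (len(precomposed), len(decomposed))

# As raw code point sequences, the two strings differ.
raw_equal = precomposed == decomposed

# Under the "normalization" criterion (here NFC), they are equivalent.
nfc_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)

# Under the "folding" criterion (here full case folding), these two match.
fold_equal = "Stra\u00dfe".casefold() == "STRASSE".casefold()

# "Spelling" is a different criterion again: neither normalization nor
# case folding makes these two spellings of one word compare equal.
us = unicodedata.normalize("NFKC", "standardization").casefold()
uk = unicodedata.normalize("NFKC", "standardisation").casefold()
spelling_equal = us == uk

print(lengths, raw_equal, nfc_equal, fold_equal, spelling_equal)
```

Each comparison answers a different question, so no single one of them can serve as "the" unambiguous representation of a text.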
Joe
- In the future, they will invent a chicken that runs on gasoline -- George Carlin