RE: About Encoding Theory (was: Re: Again not about Phoenician)

From: Joe (joe@unicode.org)
Date: Mon Nov 08 2004 - 22:08:45 CST

    To add yet another dimension to what Michael & Asmus & Ken have said:

    In a character encoding, the character is *not* the same thing as a text string of length 1.

    Character identity is defined in theory by a minimal set of entities needed to get certain text processes to do the right things ... and in practice by a lot of blundering around.

    Text/sequence equivalence is defined in specific contexts by specific criteria, under various names from "normalization" to "folding" to "spelling".
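
    As a rough illustration of the two points above (not anything from the standard itself), here is a minimal Python sketch using the standard `unicodedata` module. It shows one user-perceived character stored as strings of different lengths, and the same pairs of strings being equal or unequal depending on which criterion is applied. The function names and the `SPELLING_VARIANTS` table are hypothetical, invented for this example; only `unicodedata.normalize` and `str.casefold` are real library calls.

```python
import unicodedata

# Two encodings of the same user-perceived character "é".
composed = "\u00e9"       # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # two code points: "e" + COMBINING ACUTE ACCENT

# Hypothetical application-level spelling table; not part of Unicode.
SPELLING_VARIANTS = {"colour": "color"}

def normalize(s):
    """Criterion 1: canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", s)

def fold(s):
    """Criterion 2: NFC followed by Unicode case folding."""
    return normalize(s).casefold()

def spell(s):
    """Criterion 3: case folding plus the hypothetical spelling table."""
    folded = fold(s)
    return SPELLING_VARIANTS.get(folded, folded)

def equivalent(a, b, criterion):
    """Two strings are equivalent only relative to a chosen criterion."""
    return criterion(a) == criterion(b)

print(len(composed), len(decomposed))                  # string lengths differ
print(composed == decomposed)                          # raw comparison
print(equivalent(composed, decomposed, normalize))     # equal under NFC
print(equivalent("Stra\u00dfe", "STRASSE", normalize)) # not equal under NFC
print(equivalent("Stra\u00dfe", "STRASSE", fold))      # equal under case folding
print(equivalent("Colour", "color", fold))             # not equal under folding
print(equivalent("Colour", "color", spell))            # equal under spelling table
```

    Passing the criterion as a parameter is deliberate: there is no single `equivalent(a, b)` that is right for every context.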

    In that sense

    >The aim of Unicode standardisation is surely to define a single and
    >unambiguous representation of text.

    is well and truly false. Thus, we can all agree on the letters of the Latin alphabet for English, abc...xyz -- but we cannot all agree on a single and unambiguous representation of the word "standardization".

    Joe

    - In the future, they will invent a chicken that runs on gasoline -- George Carlin


