Re: not font designers?

From: Peter Kirk
Date: Mon Nov 08 2004 - 09:00:01 CST

    On 08/11/2004 12:47, Michael Everson wrote:

    > ... Perhaps Ken Whistler and I, in our abundant spare time, might try
    > to wordsmith the standard with regard to this issue. But your
    > insistence that some legalistic interpretation of that text will
    > determine what is and what is not a script is tiresome.
    As my spare time may be more abundant than Ken's and yours, I have
    drafted the following and submitted it as an Error Report:

    Subject: Characters, Scripts and Semantic Distinctions

    According to the Unicode Standard 4.0, section 2.2, sub-section
    "Characters, Not Glyphs", p.15, "Characters are the abstract
    representations of the smallest components of written language that have
    semantic value." However, as Michael Everson agrees, the
    distinction between corresponding letters in different scripts is not
    properly described as "semantic". It is therefore possible to understand
    this sub-section as implying that this distinction between letters
    should be treated in Unicode as a glyph distinction rather than a
    character distinction. This is of course a misunderstanding, because
    Unicode does in fact encode corresponding letters in different scripts
    as distinct characters. But this misunderstanding has become widespread
    and has fuelled a long and acrimonious debate about the proposed
    Phoenician script. Therefore, to ensure consistency and minimise
    misunderstandings, the text of this sub-section should be amended to
    make it clear that corresponding letters in different scripts are
    considered distinct characters.
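    As an illustration of current practice (an editorial sketch, not part of
    the Error Report; the choice of Python and of these three letters is
    only an assumption for the example), visually similar capital letters
    from the Latin, Cyrillic and Greek scripts are already separate code
    points with separate character names:

```python
import unicodedata

# Illustrative selection: capital A in Latin, Cyrillic and Greek scripts.
LETTERS = ["A", "\u0410", "\u0391"]


def describe(chars):
    """Return 'U+XXXX NAME' for each character, using the Unicode
    character database bundled with Python's unicodedata module."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in chars]


if __name__ == "__main__":
    for line in describe(LETTERS):
        print(line)
    # The three letters may look alike, but they are distinct characters.
    print(len(set(LETTERS)) == len(LETTERS))
```

    Running it lists U+0041 LATIN CAPITAL LETTER A, U+0410 CYRILLIC CAPITAL
    LETTER A and U+0391 GREEK CAPITAL LETTER ALPHA: three characters, not
    three glyphs of one character.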

    I note that the issue is mentioned in passing in a different context on
    p.19, relating only to cases where there is no graphical distinction
    between scripts. But a clearer statement in the correct context would be
    much preferable.

    I propose the following text to be added to p.15, after the sentence
    "They represent primarily, but not exclusively, the letters,
    punctuation, and other signs that constitute natural language text and
    technical notation.":

    "The letters used in natural language text are grouped into scripts,
    sets of letters which are used together in writing any one language.
    Letters in different scripts, even when they correspond either
    semantically or graphically, are represented in Unicode by distinct
    characters."

    I note that this change also affects a few special cases, such as the
    use of the Latin letters Q and W in Cyrillic script for the Kurdish
    language. According to the principle clarified here, distinct Cyrillic Q
    and W characters should be encoded for Kurdish.

    I would also suggest adding a separate definition of "script", a concept
    which is used extensively in Chapter 2 of the Standard but is nowhere
    clearly defined.
    This definition should include a statement of the criteria by which
    Unicode distinguishes script differences, e.g. between Indic scripts,
    from graphical differences, e.g. between regular Latin, italic style and
    Fraktur. The lack of stated criteria for this has also contributed to
    serious misunderstandings concerning Phoenician.

    Peter Kirk (personal) (work)

    This archive was generated by hypermail 2.1.5 : Mon Nov 08 2004 - 09:14:39 CST