Uniqueness Rule (was: Malayalam vowel sign AU)

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Apr 02 2006 - 05:05:48 CST


    James Kass wrote:
    >> When a new orthography was announced for German a few years ago,
    >> did you go and make two Latin fonts then, one for the old and one for
    >> the new orthography? I guess (and hope) not... When one for Finnish
    >> started to use ? and ? instead of sh and zh, did you go and make a
    >> font that displays sh as ? and zh as ?? I guess and hope not.
    >
    > Of course not. I've always figured that if anybody wants to represent the
    > "sh" sound with a question mark, they should just
    > use the question mark character at U+003F.
    >
    > (My browser settings munged your message.)

    No! The question mark U+003F has inappropriate properties for a letter.
    Compare LATIN LETTER RETROFLEX CLICK (alias LATIN LETTER EXCLAMATION MARK)
    U+01C3 with the EXCLAMATION MARK U+0021. They have different BiDi
    properties for starters. (I'm not sure if the dire effects on
    spell-checkers of using punctuation as letters can be blamed on the Unicode
    properties.) One of the Unicode annexes agonises over the apostrophe U+0027.
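    One quick way to see the property differences described above is
    Python's standard unicodedata module. This is only an illustrative sketch:
    describe() is a hypothetical helper, and the values reported depend on the
    Unicode version bundled with the interpreter.

```python
import unicodedata

# Characters whose "letter-ness" matters when punctuation is borrowed as a letter.
CHARS = ["\u01C3", "\u0021", "\u003F", "\u0027"]


def describe(ch: str) -> tuple[str, str, str, bool]:
    """Return (name, general category, bidi class, isalpha) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        ch.isalpha(),
    )


if __name__ == "__main__":
    for ch in CHARS:
        name, cat, bidi, alpha = describe(ch)
        print(f"U+{ord(ch):04X} {name}: category={cat} bidi={bidi} letter={alpha}")
```

    On a current CPython, U+01C3 should report category Lo and bidi class L,
    while U+0021, U+003F and U+0027 report Po and ON, which is why a word
    containing "!" or "?" behaves differently in bidi text and word
    segmentation than one containing the click letter.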

    >> "Uniqueness Rule"???
    >
    > "Two different encodings should not render same,
    > irrespective of the font or joiners used."
    >
    > http://varamozhi.blogspot.com/2005/07/unicode-uniqueness-rule-on-encoding.html

    That's at best a goal. There are blocks of exceptions, e.g. Arabic
    Presentation Forms! At best it can be rescued by adding 'unless they are
    compatibility equivalent'. If I write U+0061 U+200D U+0065 I have no idea,
    without knowing the rendering system, whether I will get the same as U+0061
    U+0065, the same as U+00E6 LATIN SMALL LETTER AE or something different.
    The rule would eliminate the second possibility.
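    The amended rule ("unless they are compatibility equivalent") can be
    approximated by comparing NFKC normalizations. A minimal sketch, assuming
    NFKC equality is an adequate stand-in for compatibility equivalence;
    may_render_same() is a hypothetical name, not part of any standard API.

```python
import unicodedata


def may_render_same(x: str, y: str) -> bool:
    """Hypothetical amended uniqueness rule: two encodings may share a
    rendering only if they are compatibility equivalent (equal after NFKC)."""
    return unicodedata.normalize("NFKC", x) == unicodedata.normalize("NFKC", y)


if __name__ == "__main__":
    # U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM decomposes to U+0644 U+0627,
    # so the presentation form is allowed to look like the plain sequence.
    print(may_render_same("\uFEFB", "\u0644\u0627"))  # True
    # U+00E6 has no decomposition, and NFKC keeps U+200D, so a + ZWJ + e
    # rendered as an ae ligature would break the amended rule.
    print(may_render_same("a\u200De", "\u00E6"))  # False
```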

    There are also cases where characters with identical glyphs have been
    encoded without any qualms - the principle of script separation distinguishes the usually
    visually identical LATIN SMALL LETTER O, CYRILLIC SMALL LETTER O and GREEK
    SMALL LETTER OMICRON without serious worries, though I must admit I found a
    (hand-drawn) diagram with both LATIN CAPITAL LETTER M and GREEK CAPITAL
    LETTER MU distinctly naughty. (The contrast seemed to be totally oral.)
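    The script-separated look-alikes mentioned above remain distinct even
    under compatibility normalization. The sketch below assumes a crude
    heuristic (the first word of the character name) as a proxy for script,
    because Python's unicodedata does not expose the Unicode Script property;
    script_hint() is hypothetical.

```python
import unicodedata

# Latin o, Cyrillic o, Greek omicron, Latin M, Greek Mu.
HOMOGLYPHS = ["o", "\u043E", "\u03BF", "M", "\u039C"]


def script_hint(ch: str) -> str:
    """Heuristic only: take the first word of the character name."""
    return unicodedata.name(ch).split()[0]


if __name__ == "__main__":
    for ch in HOMOGLYPHS:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} (script ~ {script_hint(ch)})")
    # The three o's stay separate after NFKC: no equivalence joins them.
    forms = {unicodedata.normalize("NFKC", c) for c in ["o", "\u043E", "\u03BF"]}
    print(len(forms))  # 3
```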

    The use of IPA in orthographies also creates havoc. The glyphs of LATIN
    SMALL LETTER ALPHA U+0251 are also glyphs of LATIN SMALL LETTER A U+0061,
    and are the glyphs usually used in children's books in England. There are
    also cases where glyph variation is constrained by grammatical
    considerations.
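    Because U+0251 has no decomposition, Unicode normalization will never
    fold it to U+0061, so a tool that wants to treat the two as the same
    letter must say so explicitly. A minimal sketch, assuming a hypothetical
    search or spell-check tool; GLYPH_FOLDS and fold() are illustrative
    names, and the folding table is an application choice, not Unicode data.

```python
import unicodedata

# Hypothetical application-level folds: IPA alpha treated as plain "a".
GLYPH_FOLDS = {"\u0251": "a"}
_FOLD_TABLE = str.maketrans(GLYPH_FOLDS)


def fold(text: str) -> str:
    """NFKC-normalize, then apply the explicit glyph folds."""
    return unicodedata.normalize("NFKC", text).translate(_FOLD_TABLE)


if __name__ == "__main__":
    print(unicodedata.decomposition("\u0251") == "")  # True: no decomposition
    print(unicodedata.normalize("NFKC", "c\u0251t") == "cat")  # False
    print(fold("c\u0251t") == "cat")  # True
```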

    Richard.



    This archive was generated by hypermail 2.1.5 : Sun Apr 02 2006 - 05:15:54 CST