Re: Formal alias for U+034F COMBINING GRAPHEME JOINER (CGJ)?

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Mar 12 2008 - 13:03:30 CST

    Karl Pentzlin suggested:

    > In the code table, the character has an informative note
    > "The name of this character is misleading, it does not actually join
    > graphemes", without giving more information.

    The additional information is actually in the text of the
    standard. There is a deliberate editorial policy not to expand
    notes in the character names list into the paragraphs
    that would be needed to explain oddballs such as this one.

    >
    > Is it appropriate to propose a formal alias like
    > "COMBINING GRAPHEME SEPARATOR"?

    I won't repeat what Asmus has already said.

    But it might be pertinent to point out that the term
    "SEPARATOR" in the standard is associated with
    visible punctuation marks:

    060D;ARABIC DATE SEPARATOR;Po;0;AL;;;;;N;;;;;
    066B;ARABIC DECIMAL SEPARATOR;Po;0;AN;;;;;N;;;;;
    066C;ARABIC THOUSANDS SEPARATOR;Po;0;AN;;;;;N;;;;;
    10FB;GEORGIAN PARAGRAPH SEPARATOR;Po;0;L;;;;;N;;;;;
    1368;ETHIOPIC PARAGRAPH SEPARATOR;Po;0;L;;;;;N;;;;;
    10100;AEGEAN WORD SEPARATOR LINE;Po;0;L;;;;;N;;;;;
    10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;;;;;N;;;;;
    1091F;PHOENICIAN WORD SEPARATOR;Po;0;ON;;;;;N;;;;;

    with white space:

    180E;MONGOLIAN VOWEL SEPARATOR;Zs;0;WS;;;;;N;;;;;
    2028;LINE SEPARATOR;Zl;0;WS;;;;;N;;;;;
    2029;PARAGRAPH SEPARATOR;Zp;0;B;;;;;N;;;;;

    with controls and format characters used in delineation
    syntax:

    001C;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR FOUR;;;;
    001D;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR THREE;;;;
    001E;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR TWO;;;;
    001F;<control>;Cc;0;S;;;;;N;INFORMATION SEPARATOR ONE;;;;
    2063;INVISIBLE SEPARATOR;Cf;0;BN;;;;;N;;;;;
    FFFA;INTERLINEAR ANNOTATION SEPARATOR;Cf;0;ON;;;;;N;;;;;

    and with visible symbols for such things:

    2396;DECIMAL SEPARATOR KEY SYMBOL;So;0;ON;;;;;N;;;;;
    241C;SYMBOL FOR FILE SEPARATOR;So;0;ON;;;;;N;GRAPHIC FOR FILE SEPARATOR;;;;
    241D;SYMBOL FOR GROUP SEPARATOR;So;0;ON;;;;;N;GRAPHIC FOR GROUP SEPARATOR;;;;
    241E;SYMBOL FOR RECORD SEPARATOR;So;0;ON;;;;;N;GRAPHIC FOR RECORD SEPARATOR;;;;
    241F;SYMBOL FOR UNIT SEPARATOR;So;0;ON;;;;;N;GRAPHIC FOR UNIT SEPARATOR;;;;
    3037;IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL;So;0;ON;;;;;N;;;;;

    There is not a combining mark among them, nor is "SEPARATOR"
    likely to be used in combining mark names in the future,
    since one of the most salient aspects of combining marks is
    that they are kept ("glued") to their base in most
    processing contexts.
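
    The claim that no combining mark appears in these lists can be
    checked mechanically. What follows is a minimal Python sketch,
    assuming only the UnicodeData.txt records quoted above as input
    (so it reflects the data as of this message, not any later
    version of the standard). The helper names parse_record and
    separators_by_category are made up for illustration.

```python
from collections import defaultdict

# Records copied from the lists quoted in this message (UnicodeData.txt layout).
# Field 0 = code point, 1 = name, 2 = General_Category,
# 3 = Canonical_Combining_Class, 10 = Unicode 1.0 name.
RECORDS = """\
060D;ARABIC DATE SEPARATOR;Po;0;AL;;;;;N;;;;;
066B;ARABIC DECIMAL SEPARATOR;Po;0;AN;;;;;N;;;;;
066C;ARABIC THOUSANDS SEPARATOR;Po;0;AN;;;;;N;;;;;
10FB;GEORGIAN PARAGRAPH SEPARATOR;Po;0;L;;;;;N;;;;;
1368;ETHIOPIC PARAGRAPH SEPARATOR;Po;0;L;;;;;N;;;;;
10100;AEGEAN WORD SEPARATOR LINE;Po;0;L;;;;;N;;;;;
10101;AEGEAN WORD SEPARATOR DOT;Po;0;ON;;;;;N;;;;;
1091F;PHOENICIAN WORD SEPARATOR;Po;0;ON;;;;;N;;;;;
180E;MONGOLIAN VOWEL SEPARATOR;Zs;0;WS;;;;;N;;;;;
2028;LINE SEPARATOR;Zl;0;WS;;;;;N;;;;;
2029;PARAGRAPH SEPARATOR;Zp;0;B;;;;;N;;;;;
001C;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR FOUR;;;;
001D;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR THREE;;;;
001E;<control>;Cc;0;B;;;;;N;INFORMATION SEPARATOR TWO;;;;
001F;<control>;Cc;0;S;;;;;N;INFORMATION SEPARATOR ONE;;;;
2063;INVISIBLE SEPARATOR;Cf;0;BN;;;;;N;;;;;
FFFA;INTERLINEAR ANNOTATION SEPARATOR;Cf;0;ON;;;;;N;;;;;
2396;DECIMAL SEPARATOR KEY SYMBOL;So;0;ON;;;;;N;;;;;
241C;SYMBOL FOR FILE SEPARATOR;So;0;ON;;;;;N;GRAPHIC FOR FILE SEPARATOR;;;;
241D;SYMBOL FOR GROUP SEPARATOR;So;0;ON;;;;;N;GRAPHIC FOR GROUP SEPARATOR;;;;
241E;SYMBOL FOR RECORD SEPARATOR;So;0;ON;;;;;N;GRAPHIC FOR RECORD SEPARATOR;;;;
241F;SYMBOL FOR UNIT SEPARATOR;So;0;ON;;;;;N;GRAPHIC FOR UNIT SEPARATOR;;;;
3037;IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL;So;0;ON;;;;;N;;;;;
"""


def parse_record(line):
    """Split one semicolon-delimited record into the fields used here."""
    fields = line.strip().split(";")
    return {
        "cp": int(fields[0], 16),
        "name": fields[1],
        "gc": fields[2],
        "ccc": int(fields[3]),
        "old_name": fields[10],
    }


def separators_by_category(text):
    """Group records whose current or Unicode 1.0 name contains SEPARATOR."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = parse_record(line)
        if "SEPARATOR" in rec["name"] or "SEPARATOR" in rec["old_name"]:
            groups[rec["gc"]].append(rec)
    return dict(groups)


groups = separators_by_category(RECORDS)
# General categories Mn, Mc and Me are the combining marks.
has_combining_mark = any(gc.startswith("M") for gc in groups)
for gc in sorted(groups):
    print(gc, len(groups[gc]))
print("combining mark among them:", has_combining_mark)
```

    Running the same grouping over a full, current UnicodeData.txt
    would be the more thorough test, but the result then depends on
    which version of the file is used.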

    So no, I don't think "COMBINING GRAPHEME SEPARATOR" would
    be an appropriate alias for U+034F COMBINING GRAPHEME JOINER,
    much less an appropriate *formal* alias -- which, as Asmus
    pointed out, is effectively a claim that the formal alias
    is a normative correction of an existing, defective (but
    immutable) name for a character.
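
    To make the distinction concrete, here is a small Python sketch
    of how a formal alias behaves next to the immutable name. It
    assumes Python 3.3 or later, where unicodedata.lookup() also
    resolves name aliases. U+01A2 is used as the example because it
    has an existing correction alias, LATIN CAPITAL LETTER GHA; the
    helper name resolve is made up for illustration.

```python
import unicodedata


def resolve(name):
    """Return (character, immutable character name) for a name or alias,
    or None if the Unicode database bundled with Python does not know it."""
    try:
        ch = unicodedata.lookup(name)
    except KeyError:
        return None
    return ch, unicodedata.name(ch)


# A formal alias resolves to the character, but the name property itself
# is unchanged: it still reports the original name.
print(resolve("LATIN CAPITAL LETTER GHA"))

# The alias proposed in this thread is not in the standard, so the lookup fails.
print(resolve("COMBINING GRAPHEME SEPARATOR"))

# U+034F keeps its name and its combining-mark general category.
print(unicodedata.name("\u034f"), unicodedata.category("\u034f"))
```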

    My recommendation is to just get used to calling U+034F the
    "CGJ" and stop worrying about what the initialism letters
    stand for -- just like we don't actually spend too much
    time worrying about what the letters in "ISO" stand for.

    --Ken


