CJK Missing/Illegible/Censored Ideograph Character (was Re: A .notdef glyph)

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Fri Nov 08 2002 - 07:19:37 EST

    Thomas Chan wrote:

    > GETA MARK is also ambiguous to Chinese readers; an "M"-sized WHITE SQUARE
    > or WHITE CIRCLE (or LARGE CIRCLE) are more familiar.
    >

    Thomas is right: the Geta mark is a Japanese innovation and is totally unknown
    in Chinese contexts.

    > I don't think the distinction
    > between #2 and #3 need or should be standardized at this level--it is up
    > to a convention that the author should establish with the reader, as with
    > any specialized notation

    I'd probably agree with you on this.

    > but there is certainly a difference between #1
    > (author succeeds in writing but reader fails in viewing) and #2/#3 (author
    > fails in writing).
    >

    Definitely!

    In modern printed Chinese texts, a missing character (for example when
    transcribing an ancient manuscript, or deliberately censoring rude words in a
    novel) is almost always shown as a full-size hollow square (i.e. taking up the
    same space as a CJK ideograph). In digital texts on the internet I have noticed
    that this "missing/illegible/censored ideograph glyph" is usually represented by
    one of the following characters:

    1. U+56D7 (a rare CJK Ideograph)
    2. U+3007 (IDEOGRAPHIC NUMBER ZERO)
    3. U+25A1 (WHITE SQUARE)

    These are all unsatisfactory:

    U+56D7 and U+3007 simply do not have the semantics of
    "missing/illegible/censored ideograph glyph", and should not be used as such.

    The White Square is a more interesting case. As the White Square character is
    simply defined by its shape, and has no inherent meaning, it could be used to
    mean whatever is wanted. The problem is that although Chinese fonts such as
    MingLiU or SimSun-18030 draw the White Square glyph as a full-size hollow square
    that looks like the "missing/illegible/censored ideograph glyph", a non-Chinese
    font may not draw it at all like the "missing/illegible/censored ideograph
    glyph" - try it in Arial Unicode MS for example.
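
    The three stand-ins can be compared by the properties Unicode assigns to
    them. What follows is a minimal sketch, assuming Python 3 and its standard
    unicodedata module; the helper name describe is made up for illustration,
    and the values reported depend on the Unicode version bundled with the
    interpreter.

```python
import unicodedata

# The three stand-ins listed above, by code point.
CANDIDATES = ["\u56D7", "\u3007", "\u25A1"]


def describe(ch):
    """Return (code point, name, general category, East Asian width)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )


for ch in CANDIDATES:
    print(*describe(ch), sep="  ")
```

    U+56D7 is reported as a letter and U+3007 as a letter number, so both carry
    meanings of their own. U+25A1 is a generic symbol whose East Asian width is
    "A" (ambiguous), which is consistent with fonts disagreeing about how large
    to draw it.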

    I have always felt that there was a need for a specific Unicode character to
    represent the "missing/illegible/censored ideograph glyph" that is frequently
    encountered in Chinese texts.

    I believe there is a similar need to encode the square ink blot mark (Chinese
    moding) that is frequently found in woodblock editions in which the blocks
    have been recarved, and an ideograph in the original block is no longer legible.
    This symbol is usually represented by U+25A0 (BLACK SQUARE) in digital texts
    on the internet, but doing so has the same limitations as using the White
    Square for the "missing/illegible/censored ideograph glyph".

    Andrew West


