Re: East Asian Emphasis Marks (Japanese bouten, etc)

From: Asmus Freytag
Date: Mon Mar 13 2006 - 22:32:35 CST

    On 3/13/2006 3:02 PM, Kenneth Whistler wrote:
    >> U+FE45 SESAME DOT
    > they were encoded for
    > compatibility with JIS X 0213.

    This is a great example of where harping on the putative compatibility
    character status is really confusing and not helpful. Yes, JIS X 0213 had
    them before we did, but they are *compatibility* characters only if we
    *would not* have added them as characters for reasons of our own, or if
    they violate the character-glyph model in some other way.

    In my estimation, we might well have added them ourselves, and they do
    not violate the model, at least not to a degree that makes them special
    in any way. (I'll deal with their similarity to the punctuation
    characters below.)
    > And they were encoded in the CJK
    > Compatibility Forms block because much of that block consists of
    > forms used in vertical CJK text, as are the sesame marks.
    My recollection is that we picked up two empty slots that were handy:
    the BMP was getting full, and there were no better locations in existing
    (non-compatibility) blocks. The 'related to vertical text' aspect was a
    nice bonus but, in fact, a distracting one, because the other characters
    in the block violate Unicode's writing direction model, whereas these
    don't.

    (The other ones are among the "blackest" strain of black-sheep
    compatibility characters there are ;-).
    > But note that they have no compatibility decomposition mapping, and there
    > is no indication whatsoever that their use is discouraged.
    Therefore, it makes no sense to emphasize them as "compatibility
    characters" which are implicitly second class citizens. Let's reserve
    that label for the truly unwanted.
    > If you have need of referring to a sesame dot in CJK text, by
    > all means, *do* use U+FE45 SESAME DOT. That is what it is encoded
    > for.
    'Nuff said.
    >> In the case of the sesame at least, the shape in printed materials closely
    >> parallels U+3001 IDEOGRAPHIC COMMA, which is provided by the font.
    > I would *not* suggest using that.
    The committee consensus was to discourage precisely that *hack-o-rama*
    by providing dedicated codes.

    (The location of the comma and period in the character box is
    potentially different for each font, but for use as an emphasis mark,
    you need the 'ink' at a known location, usually centered, otherwise they
    won't look right).
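
To make the 'ink at a known location' point concrete, here is a rough sketch of the arithmetic a renderer would need if it tried to reuse a comma glyph as an emphasis mark. Everything in it is hypothetical: the `BBox` type, the `centering_offset` helper, and all the font-unit numbers are invented for illustration and are not taken from any real font.

```python
from typing import NamedTuple, Tuple

class BBox(NamedTuple):
    """Axis-aligned bounding box in font units."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

def centering_offset(ink: BBox, box: BBox) -> Tuple[float, float]:
    """Offset (dx, dy) that moves the centre of the ink to the centre of the box."""
    ink_cx = (ink.x_min + ink.x_max) / 2
    ink_cy = (ink.y_min + ink.y_max) / 2
    box_cx = (box.x_min + box.x_max) / 2
    box_cy = (box.y_min + box.y_max) / 2
    return (box_cx - ink_cx, box_cy - ink_cy)

# Hypothetical character box of a CJK font (made-up numbers).
em_box = BBox(0, -120, 1000, 880)

# Hypothetical ink of an ideographic comma: tucked into one corner of the
# box, and a different font could put it somewhere else entirely.
comma_ink = BBox(100, -50, 350, 250)

# Hypothetical ink of a dedicated sesame dot: already centred in the box.
sesame_ink = BBox(375, 230, 625, 530)

comma_shift = centering_offset(comma_ink, em_box)
sesame_shift = centering_offset(sesame_ink, em_box)

print(comma_shift)   # per-font correction the comma would need
print(sesame_shift)  # no correction needed
```

The comma needs a correction that depends on each font's design, while a glyph drawn for the dedicated code can simply be placed as-is.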

    Note that we might want to document the fact that, by convention,
    software scales the glyphs for these characters down (just as if they
    had been regular characters).


    PS: From the last parenthetical remark, it should be clear that other
    symbols, for which existing fonts have glyphs that are always centered,
    would not require specific codes for use as emphasis marks.
