Re: Myanmar Page @ Chapter 11

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat Mar 01 2008 - 21:43:43 CST


    On 3/1/2008 9:58 AM, David Starner wrote:
    > On Sat, Mar 1, 2008 at 12:53 AM, Javier SOLA <lists@khmeros.info> wrote:
    >
    >> Dear Ngwe Tun,
    >>
    >> Why is it important to encode the fractions? Are they a common part of
    >> Myanmar text? Fractions are usually not encoded in other languages.
    >>
    >
    > Unicode intends to be a complete standard for encoding text, including
    > a huge array of characters including the interrobang, obscure phonetic
    > characters, characters for obsolete orthographies for small languages,
    > and Chinese characters found only in dictionaries. A character does
    > not need to be a common part of Myanmar text to be worth encoding.
    > None of this affects Kent Karlsson's argument that they've already
    > been encoded, of course.
    >
    The question of whether a character is common (or part of a common script)
    is not as important as whether it is something that (1) can and (2)
    should be standardized.

    (1) Things for which there is conflicting or sketchy usage information,
    or uncertainty about meaning, appearance (salient features and permitted
    variation), or both, simply can't be standardized, because the act of
    standardization requires that such information be available, sufficient,
    and settled.

    (2) Things that are private, short-lived, whimsical, idiosyncratic, or
    don't fit a reasonable description of "character" are things that
    shouldn't be standardized, nor should anything that would violate
    stability policies. Character codes are forever, and standardization implies that
    there are both senders and receivers that will be interested in
    exchanging this character, and (at least some) implementations will
    incur the costs to make the possibility of such interchange real.

    Characters that are not common, and members of scripts that are not
    common, often suffer from the first problem until such time as
    research has caught up with them. That, however, merely reflects the
    difficulty of getting access to sufficient information about something
    that's rare or obscure from the vantage point of the character coder. It
    does not reflect a desire not to encode something simply because it's not
    common.

    Entities that should not be coded, on the other hand, can be quite
    common (e.g. the Apple logo). If commonality were the primary yardstick,
    it would long since have become part of Unicode.

    Anybody who has followed the discussion knows that the actual boundary
    is not drawn in black and white, but that resolving the status of
    proposed entities that are questionable involves a lot of judgment and
    discussion. Sometimes, decisions to rule out characters are even
    overturned later, unless doing so would violate a stability policy, of
    course.

    A./



    This archive was generated by hypermail 2.1.5 : Sat Mar 01 2008 - 21:46:37 CST