Re: Incorrect names for Arabic letters

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat Mar 19 2005 - 14:12:12 CST


    At 01:42 AM 3/19/2005, Ahmad Gharbeia wrote:
    >While the mentioned letters' names in their current incorrect state
    >reflect the colloquial pronunciation in Egypt, where I am from, they
    >are not the canonical, globally understood letter names and are
    >considered invalid.

    Presumably, we have the current form of the names because of
    contributors from Egypt in the very early work of encoding
    characters. Because of the merger of efforts between ISO and the
    Unicode Consortium on character encoding, character names for
    the Unicode Standard match the names used in ISO/IEC 10646. That
    standard in turn intentionally matches the names used in 8-bit
    character set standards, such as ISO/IEC 8859.

    The names used for Arabic characters in Unicode therefore ultimately
    have a heritage that can be traced back several decades. It is ironic
    that early drafts of the Unicode Standard indeed used the names that
    you prefer.

    >While the proposed corrections do not aim to
    >precisely transcribe the sounds of the letters, they are simple to
    >implement and would result in identifiable names of the letters.

    The purpose of the names in the Unicode Standard is twofold. On the one
    hand we desire them to be descriptive so that they can be used as a
    convenient handle for the character in discussions and descriptions,
    or help users identify them in a list of characters.

    On the other hand, they are intended to serve as formal identifiers,
    just as scientific names for plants and animals. This is especially
    important for characters that are also part of other ISO standards,
    where they have different code numbers, but the same name.

    As a requirement for this second use, names, once assigned, cannot be
    changed, even to the limit of preserving a typo, as in the name for
    U+1D0C5, or in preserving the name of U+2118, which describes something
    different from what the character actually is.
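
    To illustrate the identifier role, here is a minimal Python sketch
    that assumes only the standard library's unicodedata module. The
    helper name describe() is made up for this example. It looks up the
    formal name of a code point and checks that the name maps back to
    the same character, which is what makes a stable name usable as a
    key.

```python
import unicodedata


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    ch = chr(code_point)
    name = unicodedata.name(ch)
    # The name works as an identifier: it resolves back to the character.
    assert unicodedata.lookup(name) == ch
    return f"U+{code_point:04X} {name}"


if __name__ == "__main__":
    # The two characters mentioned above, plus one Arabic letter.
    for cp in (0x2118, 0x1D0C5, 0x062C):
        print(describe(cp))
```

    The name printed for U+1D0C5 still contains the misspelling
    (FHTORA), and U+2118 is still reported as SCRIPT CAPITAL P.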

    You write:

    >Although it is unlikely that this heritage of earlier encodings can
    >be modified now, this should be noted, however.

    An annotation or comment to the effect that the names represent
    a less than universal transliteration is always possible.

    >Finally, the order of Arabic letters as defined in the current version
    >of Unicode, known as the Hegaa'i order, is a relatively newer order
    >where letters are sorted according to their shape proximity, and is
    >not the original Abgadi order, which matches the (ABC) ordering of all
    >alphabets derived from the original Ugaritic alphabet.

    That is something that might be noted as well. There are other scripts
    for which the basic alphabet has more than one possible order. As the
    ordering affects primarily the users of the printed code charts trying
    to locate a character, picking a more modern ordering seems to be
    appropriate.

    As Doug Ewell already wrote, the ordering of *data* is of course not
    driven by the arrangement of characters in the code table, but I think
    you were not implying that.
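
    A small Python sketch of that point, under stated assumptions: the
    ABJADI list below is one common form of the Abgadi sequence written
    out by hand for illustration, and letter() and abjadi_key() are
    hypothetical helpers. Real collation would normally go through a
    collation library with language tailoring, not a table like this.

```python
import unicodedata

# Assumed Abgadi sequence, given by the short names used in the standard.
ABJADI = [
    "ALEF", "BEH", "JEEM", "DAL", "HEH", "WAW", "ZAIN",
    "HAH", "TAH", "YEH", "KAF", "LAM", "MEEM", "NOON",
    "SEEN", "AIN", "FEH", "SAD", "QAF", "REH", "SHEEN",
    "TEH", "THEH", "KHAH", "THAL", "DAD", "ZAH", "GHAIN",
]


def letter(short_name: str) -> str:
    """Resolve e.g. 'JEEM' to the character ARABIC LETTER JEEM."""
    return unicodedata.lookup("ARABIC LETTER " + short_name)


# Rank of each letter in the assumed Abgadi sequence.
RANK = {letter(name): i for i, name in enumerate(ABJADI)}


def abjadi_key(ch: str) -> int:
    """Sort key that ignores the code chart position entirely."""
    return RANK[ch]


if __name__ == "__main__":
    sample = [letter(n) for n in ("DAL", "JEEM", "BEH", "ALEF", "TEH")]
    by_code_point = sorted(sample)               # follows the code chart
    by_abjadi = sorted(sample, key=abjadi_key)   # follows the table above
    for label, seq in (("code point", by_code_point), ("abjadi", by_abjadi)):
        print(label, [unicodedata.name(c).split()[-1] for c in seq])
```

    The same data sorts differently under the two keys, so the order of
    the code chart does not have to constrain how text is ordered.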

    A./



    This archive was generated by hypermail 2.1.5 : Sat Mar 19 2005 - 14:13:44 CST