RE: Never say never

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Feb 11 2003 - 20:38:15 EST


    Andy White wrote:

    > And I today see that the precomposed character '0B71 ORIYA LETTER WA'
    > has been added to the UCS4.0 charts
    > http://www.unicode.org/charts/PDF/U40-0B00.pdf
    > This is clearly a composition of ORIYA LETTER O and ORIYA LETTER
    > VA (BA).

    People on the list today are playing a little fast and loose
    with the terminology of "precomposed" and "composition".

    In the Unicode Standard, a character is not precomposed or
    composite unless it has a formal decomposition mapping defined
    in the Unicode Character Database (namely in UnicodeData.txt).
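    To make that definition concrete, here is a minimal Python sketch. It
    assumes that the standard library's unicodedata.decomposition() reflects
    the decomposition field of UnicodeData.txt for whatever Unicode version
    the running interpreter ships. The helper name classify is hypothetical.
    An untagged mapping is canonical (a precomposed character in the usual
    sense), a mapping with a <tag> prefix is a compatibility decomposition,
    and an empty field means the character is atomic.

```python
import unicodedata


def classify(ch: str) -> str:
    """Classify one code point by its UnicodeData.txt decomposition field.

    Hypothetical helper. Returns 'atomic' (no mapping), 'canonical'
    (untagged mapping, i.e. precomposed), or 'compatibility' (<tag> mapping).
    """
    mapping = unicodedata.decomposition(ch)  # '' when the field is empty
    if not mapping:
        return "atomic"
    if mapping.startswith("<"):  # e.g. '<compat> 0066 0069'
        return "compatibility"
    return "canonical"


for cp in (0x00E9, 0xFB01, 0x0B71):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {classify(ch)}")
```

    With a current CPython this reports U+00E9 as canonical, U+FB01 LATIN
    SMALL LIGATURE FI as compatibility, and U+0B71 ORIYA LETTER WA as atomic,
    regardless of how complex the glyph looks.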

    While ORIYA LETTER WA is graphically constructed from the
    form of ORIYA LETTER O and the bottom half of PA (not BA),
    it doesn't fit the pattern one would expect for consonant
    conjuncts (C+C, not V+C). Nor is it given a formal
    decomposition in UnicodeData.txt: even though it is
    graphically complex, it otherwise fits the pattern of the
    regular consonant letters of Indic scripts (as an alternate
    for VA). Note that the new ORIYA LETTER VA is also
    graphically complex (a dotted BA), but it likewise is
    not given a decomposition.
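    The same check can be run over the Oriya letters just mentioned. As a
    contrast case not discussed above, U+0B5C ORIYA LETTER RRA does carry a
    canonical mapping to U+0B21 ORIYA LETTER DDA plus U+0B3C ORIYA SIGN NUKTA,
    so whether a dotted letter is atomic or composite is decided only by what
    UnicodeData.txt says. This sketch assumes the interpreter's Unicode data
    is version 4.0 or later, so that U+0B35 and U+0B71 are assigned.

```python
import unicodedata

# Two letters new in Unicode 4.0, plus one contrast case with a mapping.
oriya = {
    0x0B71: "WA, graphically O + lower part of PA",
    0x0B35: "VA, graphically a dotted BA",
    0x0B5C: "RRA, contrast case",
}

for cp, note in oriya.items():
    ch = chr(cp)
    # An empty decomposition field means the character is atomic.
    mapping = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{cp:04X} {unicodedata.name(ch)} [{note}]: {mapping}")
```

    WA and VA print "(none)"; only RRA shows a mapping, "0B21 0B3C".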

    For that matter, you could look to existing Oriya characters
    such as U+0B06 ORIYA LETTER AA and claim it is just a graphic
    combination of U+0B05 ORIYA LETTER A and U+0B3E ORIYA VOWEL
    SIGN AA. But such decompositions are *also* not used in
    the standard. So ORIYA LETTER AA is an *atomic* character
    in Unicode, despite the fact that it is graphically
    complex (and analyzable into parts).
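    One practical consequence: because U+0B06 has no mapping, Unicode
    normalization never equates it with the sequence U+0B05 + U+0B3E, even
    if the two render alike. A short sketch using only Python's standard
    unicodedata.normalize() shows this in both directions.

```python
import unicodedata

atomic_aa = "\u0B06"       # ORIYA LETTER AA, a single code point
sequence = "\u0B05\u0B3E"  # ORIYA LETTER A + ORIYA VOWEL SIGN AA

# NFD leaves the atomic letter unchanged: there is no mapping to apply.
print(unicodedata.normalize("NFD", atomic_aa) == atomic_aa)  # True
# NFC does not fuse the two-code-point sequence into U+0B06.
print(unicodedata.normalize("NFC", sequence) == atomic_aa)   # False
```

    So software comparing normalized strings treats the two spellings as
    different text; any equivalence between them is a rendering or
    spelling convention, not a Unicode one.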

    If anyone wants a pointless exercise in simplification for
    the benefit of complexity sometime, try working on the
    Yi syllabary charts (U+A000..U+A48C) and pulling these
    graphically complex forms apart into all of their
    duplicated constituent parts. The mere fact that such
    forms are graphically complex and have identifiable parts
    is not, however, what establishes their status as atomic
    versus composite characters in the Unicode Standard.
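    The Yi case can be checked the same way. This sketch scans the range
    cited above with Python's standard unicodedata module and counts how many
    Yi syllables carry any decomposition mapping. It assumes the
    interpreter's Unicode data includes the Yi Syllables block, which has
    been assigned since Unicode 3.0.

```python
import unicodedata

# Yi syllables as cited in the message: U+A000..U+A48C inclusive.
yi = [chr(cp) for cp in range(0xA000, 0xA48C + 1)]

# Keep only code points the database actually names (i.e. assigned ones).
assigned = [ch for ch in yi if unicodedata.name(ch, "")]
# Collect any syllable whose decomposition field is non-empty.
with_mapping = [ch for ch in assigned if unicodedata.decomposition(ch)]

print(len(assigned), "assigned,", len(with_mapping), "with a decomposition")
```

    However intricate the glyphs, the second count is zero: every Yi
    syllable is atomic.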

    --Ken


