Re: Decomposition vs Full decomposition?

From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Mar 15 2005 - 17:20:19 CST

    On 15/03/2005 21:28, Michael Everson wrote:

    > At 13:16 -0800 2005-03-15, Peter Constable wrote:
    >
    >> U+048A CYRILLIC CAPITAL LETTER SHORT I WITH TAIL
    >
    >
    > The Cyrillic short thingy isn't a combining character. And it's not a
    > breve. (You knew this, but others mightn't.)

    Well, we know that Unicode doesn't define it as such, but we don't know
    that this is correct. U+0419 CYRILLIC CAPITAL LETTER SHORT I has a canonical
    decomposition to <U+0418, U+0306>, i.e. <CYRILLIC CAPITAL LETTER I,
    COMBINING BREVE>. So in that context "The Cyrillic short thingy" is a
    combining character and a breve; and the breve has its regular
    significance of indicating shortening, from [i] to [j] in IPA (i.e. like
    English y). U+048A CYRILLIC CAPITAL LETTER SHORT I WITH TAIL is U+0419
    with a tail, the same sort of tail that can be added to most other
    Cyrillic basic letters. This tail is not a combining character. And
    presumably the sound of U+048A is a modification of [j]. So for
    consistency U+048A, which includes the very same "Cyrillic short thingy"
    indicating that this is a modified [j] rather than a modified [i], should
    have a canonical decomposition to <CYRILLIC CAPITAL LETTER I WITH TAIL,
    COMBINING BREVE>.
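
    For anyone who wants to check the decomposition data for themselves,
    here is a minimal sketch using Python's standard unicodedata module.
    The helper names canonical_decomposition and describe are my own, and
    the results reflect whichever Unicode version the interpreter ships
    with, so treat it as an illustration rather than a normative reference.

```python
import unicodedata


def canonical_decomposition(ch):
    """Return the canonical decomposition of ch as a list of code points.

    An empty list means the character database records no canonical
    decomposition. Compatibility mappings (tagged like "<compat>") are
    deliberately ignored here.
    """
    fields = unicodedata.decomposition(ch).split()
    if not fields or fields[0].startswith("<"):
        return []
    return [int(field, 16) for field in fields]


def describe(ch):
    """Build a one-line summary of a character and its decomposition."""
    parts = canonical_decomposition(ch)
    names = [unicodedata.name(chr(cp)) for cp in parts]
    target = " + ".join(names) if names else "(no canonical decomposition)"
    return "U+%04X %s -> %s" % (ord(ch), unicodedata.name(ch), target)


if __name__ == "__main__":
    print(describe("\u0419"))  # CYRILLIC CAPITAL LETTER SHORT I
    print(describe("\u048A"))  # CYRILLIC CAPITAL LETTER SHORT I WITH TAIL
```

    The database gives U+0419 the pair U+0418, U+0306 (the capital letter
    plus the breve) and gives U+048A no decomposition at all, so NFD leaves
    U+048A as a single code point.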

    The only problem with that is that there is no CYRILLIC CAPITAL LETTER I
    WITH TAIL, because CYRILLIC CAPITAL LETTER I is one of the few Cyrillic
    letters which is not modified with a tail - except when combined with a
    breve. So, what we have is a precomposed character which consists of an
    existing combining mark combined with a base character which is used
    only with the combining mark. I wonder if this is a unique situation? I
    think not, because there is an Arabic chair character which is only used
    with a hamza. And that situation is also problematic. Also I think
    U+0640 ARABIC TATWEEL is supposed to be used only with combining marks.
    Anything else? In pointed Hebrew this is true of the letter shin, but
    this is used without combining marks in unpointed Hebrew.
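
    The claim that there is no CYRILLIC CAPITAL LETTER I WITH TAIL can be
    checked by name lookup. This is only a sketch: has_character_named is a
    name I made up for illustration, and the answer depends on the Unicode
    version behind the interpreter's unicodedata module.

```python
import unicodedata


def has_character_named(name):
    """Return True if the character database knows a character by this name."""
    try:
        unicodedata.lookup(name)
    except KeyError:
        # unicodedata.lookup raises KeyError for unknown character names.
        return False
    return True


if __name__ == "__main__":
    for name in (
        "CYRILLIC CAPITAL LETTER SHORT I WITH TAIL",
        "CYRILLIC CAPITAL LETTER I WITH TAIL",
    ):
        print(name, "->", has_character_named(name))
```

    The first name resolves to U+048A; the second is the hypothetical base
    letter that a decomposition would need.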

    For stability reasons it is too late to change this situation with
    U+048A in Unicode. Nevertheless, I insist that "The Cyrillic short
    thingy" in U+048A is a breve, and consider the failure of Unicode to
    encode it as such to be an error (but an uncorrectable one) in the Unicode
    standard.

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

