From: Peter Kirk (peterkirk@qaya.org)
Date: Tue Mar 15 2005 - 17:20:19 CST
On 15/03/2005 21:28, Michael Everson wrote:
> At 13:16 -0800 2005-03-15, Peter Constable wrote:
>
>> U+048A CYRILLIC CAPITAL LETTER SHORT I WITH TAIL
>
>
> The Cyrillic short thingy isn't a combining character. And it's not a
> breve. (You knew this, but others mightn't.)
Well, we know that it isn't defined as such by Unicode, but we don't
know that this definition is correct. U+0419 CYRILLIC CAPITAL LETTER
SHORT I has a canonical decomposition to <U+0418, U+0306>, i.e.
<CYRILLIC CAPITAL LETTER I, COMBINING BREVE>. So in that context "The
Cyrillic short thingy" is a
combining character and a breve; and the breve has its regular
significance of indicating shortening, from [i] to [j] in IPA (i.e. like
English y). U+048A CYRILLIC CAPITAL LETTER SHORT I WITH TAIL is U+0419
with a tail, the same sort of tail that can be added to most other
Cyrillic basic letters. This tail is not a combining character. And
presumably the sound of U+048A is a modification of [j]. So for
consistency U+048A, which includes the very same "Cyrillic short thingy"
indicating that this is a modified [j] rather than a modified [i], should
have a canonical decomposition to <CYRILLIC CAPITAL LETTER I WITH TAIL,
COMBINING BREVE>.
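
For anyone who wants to check the decomposition data for themselves, here
is a minimal sketch, assuming Python 3 and its standard unicodedata
module (which reflects whatever Unicode version that Python build ships
with). The helper name canonical_parts is my own, purely for
illustration. It lists the canonical decomposition of U+0419 and shows
that U+048A has none, so normalization to NFD leaves U+048A as it is.

```python
import unicodedata

def canonical_parts(ch):
    """Return the names of the code points in ch's canonical decomposition.

    Returns an empty list if ch has no decomposition, or only a
    compatibility one (those are tagged, e.g. "<compat> ...").
    """
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return []
    return [unicodedata.name(chr(int(cp, 16))) for cp in decomp.split()]

for ch in ("\u0419", "\u048A"):
    parts = canonical_parts(ch)
    # NFD applies canonical decomposition; a character without one is unchanged.
    nfd = " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch))
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(f"  canonical decomposition: {parts or 'none'}")
    print(f"  NFD form: {nfd}")
```

The same check can be run on any other precomposed letter by adding it
to the tuple in the loop.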
The only problem with that is that there is no CYRILLIC CAPITAL LETTER I
WITH TAIL, because CYRILLIC CAPITAL LETTER I is one of the few Cyrillic
letters which is not modified with a tail - except when combined with a
breve. So, what we have is a precomposed character which consists of an
existing combining mark combined with a base character which is used
only with the combining mark. I wonder if this is a unique situation? I
think not, because there is an Arabic chair character which is only used
with a hamza. And that situation is also problematic. Also I think
U+0640 ARABIC TATWEEL is supposed to be used only with combining marks.
Anything else? In pointed Hebrew this is true of the letter shin, but
this is used without combining marks in unpointed Hebrew.
For stability reasons it is too late to change this situation with
U+048A in Unicode. Nevertheless, I insist that "The Cyrillic short
thingy" in U+048A is a breve, and consider the failure of Unicode to
encode it as such to be an error (but an uncorrectable one) in the Unicode
standard.
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/
This archive was generated by hypermail 2.1.5 : Tue Mar 15 2005 - 17:24:49 CST