Re: Proposal to add four characters for Kashmiri to the BMP of the UCS

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Jul 07 2008 - 14:05:27 CDT

Next message: John Hudson: "Normalisation and directionality (was: how to add all latin (and greek) subscripts)"

Previous message: J: "Getting A Newb Started"
Maybe in reply to: Pravin S: "Proposal to add four characters for Kashmiri to the BMP of the UCS"
Next in thread: Philippe Verdy: "RE: Proposal to add four characters for Kashmiri to the BMP of the UCS"
Reply: Philippe Verdy: "RE: Proposal to add four characters for Kashmiri to the BMP of the UCS"

First of all, I will state up front that I have no objection
to the proposal as written -- it seems justified given the information
about the recent Kashmiri orthography reform.

> I suspect that most of the pre-composed isolate vowels were included for
> backwards compatibility with a pre-existing standard(s) like ISCII -

Yes, but not just for that reason.

> IMO there is no good reason to add additional pre-composed characters
> when a base character + combining mark will work fine particularly when
> these characters are for what seems to be pretty well a brand-new
> orthography.

I disagree in this case. Devanagari works differently (for its
Unicode encoding) than Tibetan does.

U+0972 DEVANAGARI LETTER CANDRA A was added as recently as Unicode 5.1
(and not decomposed). We went through the same set of arguments then,
and I don't see the value of hashing through it every time another
example comes up.

> Preserving consistency could be used the next time someone wants to add
> more pre-composed Latin chars.

No, because precomposed Latin characters have canonical decompositions.
Devanagari (and most other Indic) independent vowels do not.
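
The difference can be checked against the Unicode Character Database. What follows is only a minimal sketch using Python's standard unicodedata module. The helper name has_canonical_decomposition is hypothetical, and the results depend on the Unicode version bundled with the Python build in use.

```python
import unicodedata

def has_canonical_decomposition(ch: str) -> bool:
    """Return True if ch has a canonical (not compatibility) decomposition."""
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings carry a tag such as "<compat>"; canonical ones do not.
    return bool(mapping) and not mapping.startswith("<")

samples = [
    "\u00e9",  # LATIN SMALL LETTER E WITH ACUTE
    "\u0906",  # DEVANAGARI LETTER AA
    "\u0972",  # DEVANAGARI LETTER CANDRA A
]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {has_canonical_decomposition(ch)}")
```

Because the Devanagari letters have no canonical decomposition, normalization does not make a sequence such as U+0905 U+093E equivalent to U+0906.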

> I thought there was a policy not to add more pre-composed characters. Is
> this not the case?

It generally *is* the case. But what that means is that characters
will not be encoded if by precedent characters of that type have
*canonical* decompositions to already encoded pieces.

It doesn't mean that there is an absolute proscription against
encoding complex graphic entities as characters.

--Ken

This archive was generated by hypermail 2.1.5 : Mon Jul 07 2008 - 14:08:37 CDT