Re: [indic] Re: Proposal to add four characters for Kashmiri to the BMP of the UCS

From: Christopher Fynn
Date: Mon Jul 07 2008 - 04:17:01 CDT

    Hi Pravin

    I understand your points - but does one *have* to do things that way?

    For instance in Tibetan (an Indic script) for isolated vowels we use:

    ཨ  U+0F68
    ཨཱ  U+0F68 U+0F71
    ཨི  U+0F68 U+0F72
    ཨཱི  U+0F68 U+0F71 U+0F72
    ཨུ  U+0F68 U+0F74
    ཨཱུ  U+0F68 U+0F71 U+0F74
    རྀ  U+0F62 U+0F80
    རཱྀ  U+0F62 U+0F71 U+0F80
    ལྀ  U+0F63 U+0F80
    ལཱྀ  U+0F63 U+0F71 U+0F80
    ཨེ  U+0F68 U+0F7A
    ཨཻ  U+0F68 U+0F7B
    ཨོ  U+0F68 U+0F7C
    ཨཽ  U+0F68 U+0F7D
    ཨཾ  U+0F68 U+0F7E
    ཨཿ  U+0F68 U+0F7F

    We do this because no pre-composed isolated vowels were encoded for
    that script. (They were originally proposed but dropped since they were
    not necessary.) This works well.
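
To make the Tibetan pattern concrete, here is a minimal Python sketch that rebuilds the sequences listed above and checks that each one is a base letter followed only by combining marks. The names ISOLATED_VOWELS and describe are my own, chosen for illustration; the code points are the ones from the list.

```python
import unicodedata

# Code point sequences copied from the list above:
# a base letter (U+0F68, U+0F62 or U+0F63) followed by zero or more vowel signs.
ISOLATED_VOWELS = [
    (0x0F68,),
    (0x0F68, 0x0F71),
    (0x0F68, 0x0F72),
    (0x0F68, 0x0F71, 0x0F72),
    (0x0F68, 0x0F74),
    (0x0F68, 0x0F71, 0x0F74),
    (0x0F62, 0x0F80),
    (0x0F62, 0x0F71, 0x0F80),
    (0x0F63, 0x0F80),
    (0x0F63, 0x0F71, 0x0F80),
    (0x0F68, 0x0F7A),
    (0x0F68, 0x0F7B),
    (0x0F68, 0x0F7C),
    (0x0F68, 0x0F7D),
    (0x0F68, 0x0F7E),
    (0x0F68, 0x0F7F),
]


def describe(seq):
    """Return the composed string and the general category of each code point."""
    text = "".join(chr(cp) for cp in seq)
    cats = [unicodedata.category(chr(cp)) for cp in seq]
    return text, cats


for seq in ISOLATED_VOWELS:
    _, cats = describe(seq)
    codes = " ".join(f"U+{cp:04X}" for cp in seq)
    # Print only code points and categories so the output stays ASCII.
    print(codes, cats)
```

Every sequence starts with a letter and continues with marks only, which is the base char + combining mark model I am arguing for.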

    I understand the point you are trying to make about the "script grammar"
    used for Devanagari, but it would actually not be at all difficult from an
    implementation point of view (e.g. in ICU, Pango or Uniscribe and in
    font lookups) to define an additional class of combining marks that
    *are* allowed to combine with an isolate vowel character like अ as a base.
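
As a rough illustration of what "an additional class" could look like, here is a heavily simplified Python sketch. It is not how ICU, Pango or Uniscribe actually implement their syllable rules. The consonant range, the names STRICT, RELAXED and is_valid, and the choice of U+0945 as the only member of the extra class are assumptions made for the example; conjuncts, halant and nukta are ignored.

```python
import re

# Ranges as quoted in this thread (Devanagari block).
INDEPENDENT_VOWEL = "\u0904-\u0914"
MATRA = "\u093E-\u094C"
MODIFIER = "\u0901-\u0903"
# Assumption for illustration: consonants KA..HA.
CONSONANT = "\u0915-\u0939"

# Hypothetical extra class: marks that may follow an independent vowel.
# U+0945 is used only because it is the mark discussed in this thread.
VOWEL_MARK = "\u0945"

# Strict rule: matras attach to consonants only.
STRICT = re.compile(
    f"(?:[{CONSONANT}][{MATRA}]?|[{INDEPENDENT_VOWEL}])[{MODIFIER}]?"
)

# Relaxed rule: same as above, plus the extra class after an independent vowel.
RELAXED = re.compile(
    f"(?:[{CONSONANT}][{MATRA}]?"
    f"|[{INDEPENDENT_VOWEL}][{VOWEL_MARK}]?)[{MODIFIER}]?"
)


def is_valid(syllable, grammar):
    """True if the whole string is one syllable under the given grammar."""
    return grammar.fullmatch(syllable) is not None


a_candra = "\u0905\u0945"  # independent vowel A + vowel sign candra E
print(is_valid(a_candra, STRICT))   # rejected by the strict rule
print(is_valid(a_candra, RELAXED))  # accepted once the extra class exists
```

The only difference between the two rules is one optional character class, which is why I don't think the change is hard from an implementation point of view.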

    Isn't "not allowed" only an (arbitrary) human rule? I can't see any
    good *technical* reason for encoding four characters instead of two.

    - Chris

    Pravin S wrote:
    > Hi Chris,
    > Philippe has already explained all the points present in proposal
    > (thanks for that)
    > see my comments below
    > 2008/7/7 Christopher Fynn:
    >> Philippe Verdy wrote:
    >> ...
    >> I suspect that most of the pre-composed isolate vowels were included for
    >> backwards compatibility with pre-existing standards like ISCII - IMO
    >> there is no good reason to add additional pre-composed characters when a
    >> base character + combining mark will work fine, particularly when these
    >> characters are for what seems to be pretty well a brand-new orthography.
    > As per the Devanagari syllable rule, matras can't combine with vowels;
    > that's why the Devanagari code chart consists of vowels at U+0904 to U+0914
    > and corresponding matras at U+093E to U+094C.
    > In fact the character U+0972 'ॲ', recently added in Unicode 5.1, also came
    > about for this reason only,
    > since combining अ [0905] + ॅ [0945] was not possible due to script grammar.
    > The same is not allowed in any rendering engine (ICU, Pango and Uniscribe too).
    >> Generally I think it is a good idea to try to conserve as much space as
    >> possible in the Devanagari block on the BMP as, given the number of
    >> languages written in Devanagari, it seems likely that there will eventually
    >> be more characters that it would be best to have there. IMHO adding
    >> unnecessary pre-composed characters when a combination (base char +
    >> combining mark) will do is not the best use of valuable space.
    > I agree we need reserved space, but IMO we should not violate script
    > grammar for preserving space.
    >>> That's the way I understand it. The proposal is preserving the
    >>> consistency.
    >> Preserving consistency could be used the next time someone wants to add more
    >> pre-composed Latin chars. Actually I don't see that encoding only the
    >> combining chars breaks the encoding model used for Devanagari which
    >> already has many combining chars.
    > Nope, presently there is no such instance in Devanagari, i.e. a (Vowel)
    > + (Matra) combination,
    > and it is not allowed.
    > *Vowels can be combined only with vowel modifiers (U+0901 to U+0903) as
    > per the syllable rule.
    >> I thought there was a policy not to add more pre-composed characters. Is
    >> this not the case?
    > It will add exceptions to the script grammar.
    > Thanks & Regards,
    > -------------------------
    > Pravin Satpute
