Re: Representative glyphs for combining kannada signs

From: Antoine Leca (
Date: Fri Mar 24 2006 - 06:08:22 CST


    [ Sorry if this is a double post. The first one was sent from an incorrect
    address, so I am resending it. ]

    Kent Karlsson wrote:
    > Antoine Leca wrote:
    >> Example 1, Hindi: should the I matra precede the whole
    >> cluster, or only the last freestanding consonant, in the
    >> case of a cluster made up of two
    >> or more visually distinct components?
    > A spelling difference that should be recorded in the sequence
    > of characters (in some, not yet standardised, way), quite apart
    > from font issues.

    Do you mean that one SHOULD (i.e. IS REQUIRED TO) record, at the code point
    level, the use of a two-dot umlaut differently from a two-stroke umlaut,
    calling it a "spelling" difference?
    Do you mean that if I write "Mme" (Mrs in French), I should distinguish, in
    some not yet standardised way, whether I write it with superscript
    characters or not, calling that a "spelling" difference?
    I guess you do not.

    So, if the original encoder does NOT make a distinction in meaning between
    the two forms, why would Unicode require him to encode this difference at
    the code point level?

    I agree that Unicode could define a way to REQUEST one of the two forms,
    when they are viewed as different, similar to the way ZWJ/ZWNJ are used to
    request, or prevent, the formation of single-glyph ligatures.
    But it should be optional (and supplementary), not mandatory.

    >> Example 2, Malayalam: dead RA can come either before the
    >> (last part of the) consonant, or below it.
    > A spelling difference that should be recorded in the sequence
    > of characters (in some, not yet standardised, way), quite apart
    > from font issues.

    This case is worse, much worse.

    The difference is between two rendering styles, BOTH of which are known to
    be in current use (notwithstanding the assertions to the contrary voiced by
    both camps).
    And it was a conscious (and reaffirmed) decision of ISO/Unicode to encode
    them jointly.

    What you are asking here is to BAN one of the two forms of writing Malayalam
    from using the straightforward encoding.
    However, it has not yet been standardised which form will be banned.
    So each camp is left to voice its position as loudly as it can.
    In the meantime, chaos reigns, and ordinary Malayalee users are unable to
    use Unicode.

    I find such a state of affairs to be bad, really bad.

    Again, this is NOT to say that one could not find a way to specify the use
    of one style or the other; but it probably has to be done outside the code
    point stream, at least if one wants to keep the joint encoding from becoming
    a fiction...

    >> Example 3, Malayalam again: the matra for AU U+0D4C can be
    >> shown either as two parts (as depicted in the tables), or
    >> only as the right part.
    > No it cannot. AU spelled with U+0D4C unambiguously has two
    > (visible) parts. AU with only the right part is unambiguously
    > spelled with U+0D57 (quite regardless of the character name).

    I am confused here (and this is hardly new).

    I agree that U+0D57 (like its siblings xx55, xx56 or xx57 in the other
    scripts) has the same properties etc. as the vowel signs, so this use would
    be possible without surgery on the UCD. But the current (5.0 draft)
    database says:
         * only a representation of the right half of 0D4C
    And I am not sure this should be interpreted the way you did.
    In fact, I read the word "only" as implying... the exact opposite.
    The French translation is not clearer:
         * simplement la représentation de la moitié droite de 0D4C
           ("simply the representation of the right half of 0D4C")

    Designating it as the valid way to encode single-part Malayalam AU should
    IMHO be spelled out clearly by the UTC, and it is relatively easy to do so
    (by amending the note, for example).
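    To make the two candidate spellings concrete, here is a small sketch using
    Python's standard unicodedata module. It reflects whatever Unicode version
    the running Python ships with, not the 5.0 draft discussed here, and the
    use of KA as the base consonant is only an illustrative assumption. It
    shows how the characters are defined; it says nothing about which spelling
    the UTC intends.

    ```python
    import unicodedata

    AU_SIGN = "\u0d4c"         # U+0D4C MALAYALAM VOWEL SIGN AU (two-part form)
    AU_LENGTH_MARK = "\u0d57"  # U+0D57, the right half on its own
    KA = "\u0d15"              # MALAYALAM LETTER KA, sample base consonant only


    def describe(ch: str) -> str:
        """Return code point, character name and general category."""
        return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


    for ch in (AU_SIGN, AU_LENGTH_MARK):
        print(describe(ch))

    # U+0D4C canonically decomposes into U+0D46 (left part) + U+0D57 (right part).
    print(unicodedata.decomposition(AU_SIGN))

    two_part = KA + AU_SIGN         # KA followed by the two-part AU
    one_part = KA + AU_LENGTH_MARK  # KA followed by the right part only

    # The two spellings are not canonically equivalent, so normalization
    # keeps them apart.
    print(unicodedata.normalize("NFD", two_part) == unicodedata.normalize("NFD", one_part))
    ```

    Note that writing the left and right parts separately (U+0D46 U+0D57)
    recomposes to U+0D4C under NFC, so only the sequence without U+0D46 yields
    a distinct one-part spelling.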

    When I last looked, no decision had been made to do so; in fact, it had not
    even been decided to look at this issue (it is absent from the list of
    pending Indic issues), despite its being brought into the debate every now
    and then.

    I do not know if there is a mechanism for clarification requests (either at
    UTC or WG2 level), but it might be useful here, since the informal route
    does not seem to work.

    > This is already very clear, but apparently needs to be pointed out.

    It may be clear to you, but (and I beg your pardon if this gives offence) I
    would very much prefer clear statements from the relevant people.

    In the Indic forum, which is supposed to sort out this kind of thing, none
    of the unofficial spokespersons of the UTC clarified any of these issues;
    quite the contrary.
    Nor did the relevant minutes of the UTC discussions make things clearer;
    again, quite the contrary (with the stop-and-go over the cillu issue in the
    middle). At present, all the work on Malayalam in Unicode has been
    deferred to an ad hoc working group (in which no one I know of is
    represented). If all the issues were really so clear, this working group
    would already have published its conclusions, or at the very least a draft
    presenting the state of affairs; I have not seen any such thing.
    I am not saying I know better; as I said, I am not involved in this working
    group, nor, I presume, am I qualified to be.
    Perhaps you are in this group. If so, may I kindly ask you to urge the
    group to present definitive conclusions on the points that are "clearly"
    acknowledged (along the lines of the document issued by the Kerala IT
    Mission, which DOES assert some of the above points, but which I cannot
    take as representing the view of the UTC; quite the contrary).


    This archive was generated by hypermail 2.1.5 : Fri Mar 24 2006 - 06:10:46 CST