RE: Representative glyphs for combining kannada signs

From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Fri Mar 24 2006 - 17:15:22 CST


    Antoine Leca wrote:

    > >> Example 1, Hindi: should the I matra precede the whole
    > >> cluster, or only the last freestanding consonant, in the
    > >> case of a cluster constituted from two or more visually
    > >> distinct components?
    > >
    > > A spelling difference that should be recorded in the sequence
    > > of characters (in some, not yet standardised, way), quite apart
    > > from font issues.
    >
    > Are you intending to say that one SHOULD (IS REQUIRED TO)
    > register in the codepoints the use of 2-dot-like Umlaut in a
    > different way from 2-stroke-like Umlaut?

    Yes, and they already are. U+0308 COMBINING DIAERESIS vs. U+030B
    COMBINING DOUBLE ACUTE ACCENT. There is no "umlaut" character...
    And yes, those really ARE different, even "semantically" not just
    spellingwise: ó is to o as ő is to ö.
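
    The distinction can be checked against the Unicode Character Database.
    The following is a minimal sketch, assuming Python's standard
    unicodedata module; the helper name decompose is only illustrative.
    It lists the code points in the canonical decomposition of ö and ő.

```python
import unicodedata

def decompose(ch):
    """Return the Unicode names of the code points in the NFD form of ch."""
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", ch)]

o_diaeresis = decompose("\u00F6")      # ö, LATIN SMALL LETTER O WITH DIAERESIS
o_double_acute = decompose("\u0151")   # ő, LATIN SMALL LETTER O WITH DOUBLE ACUTE

print(o_diaeresis)      # base letter followed by U+0308
print(o_double_acute)   # base letter followed by U+030B
```

    The two letters share the base letter o but carry different combining
    marks, so the difference is recorded in the character sequence and does
    not depend on the font.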

    > Saying it is a "spelling" difference?

    Definitely.

    > Are you intending to say that if I wrote "Mme" (Mrs in French),
    > I should differentiate, in a not yet standardised way, the fact
    > that I write it with superscript characters or not? Saying it is
    > a "spelling" difference?

    Definitely. In this particular case one may debate whether to use
    markup or to (ab)use U+1D50 MODIFIER LETTER SMALL M and
    U+1D49 MODIFIER LETTER SMALL E. But it is a spelling difference
    either way. I would actually prefer using the MODIFIER LETTERs
    in this case (assuming they display ok).

    And m² is not at all the same as m2.
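
    As a rough illustration (a sketch using Python's standard unicodedata
    module; the variable names are only illustrative), the superscript and
    plain spellings stay distinct under canonical normalization (NFC) and
    are folded together only by compatibility normalization (NFKC):

```python
import unicodedata

# M + MODIFIER LETTER SMALL M (U+1D50) + MODIFIER LETTER SMALL E (U+1D49)
superscript_mme = "M\u1D50\u1D49"
plain_mme = "Mme"
square_metre = "m\u00B2"   # m + SUPERSCRIPT TWO

# NFC keeps the superscript spelling distinct from the plain one.
nfc_keeps_difference = unicodedata.normalize("NFC", superscript_mme) != plain_mme

# NFKC applies the <super> compatibility mappings and loses the distinction.
nfkc_folds_mme = unicodedata.normalize("NFKC", superscript_mme) == plain_mme
nfkc_folds_m2 = unicodedata.normalize("NFKC", square_metre) == "m2"

print(nfc_keeps_difference, nfkc_folds_mme, nfkc_folds_m2)
```

    Whether such compatibility folding is appropriate depends on the
    application; for m² vs. m2 it clearly changes the meaning.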

    > I guess you did not.

    Of course I did.

    > So, if the original encoder does NOT make a distinction in
    > meaning between the two forms, why would Unicode require him
    > to encode this difference at codepoint level?

    How do you know if the "original encoder" makes the difference or
    not? You may not be given the chance to ask, and even if you are,
    do you really want to bother asking each and every "original encoder"?
    (I call that "author" rather than "original encoder", but that is beside
    the point.)

    > I agree it could be defined a way in Unicode to REQUEST for
    > one of the two forms,

    The author should choose one spelling or the other. The apparent
    spelling should NOT be up to frail font selections.

    > when they are viewed as different.

    Again, how do you know they are not? In the approach too often taken
    for Indic scripts, systems (or intervening people) may have changed
    fonts (if there was any font information at all in the original data), and
    then changed the apparent spelling, maybe in such a way that the
    original author is not comfortable with the resulting apparent spelling.

    > Similarly to the case of
    > requesting formation, or not, of single-glyph ligatures, with
    > the ZWJ/ZWNJ joiners.

    I really dislike the ZWJ/ZWNJ hack for Indic scripts. But it's much too
    late to back out from that approach, so I have to play along with that hack.
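
    For readers unfamiliar with the mechanism: the joiners are ordinary
    format characters placed in the text stream. The sketch below uses
    Python's standard unicodedata module and a Devanagari KA + VIRAMA + SSA
    sequence chosen only as an example. It shows that the three sequences
    are distinct strings and that NFC and NFKC normalization leave the
    joiners in place. How a renderer displays each sequence is outside what
    this code can show.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER
KA, VIRAMA, SSA = "\u0915", "\u094D", "\u0937"  # Devanagari letters and virama

sequences = {
    "plain": KA + VIRAMA + SSA,
    "with_zwnj": KA + VIRAMA + ZWNJ + SSA,
    "with_zwj": KA + VIRAMA + ZWJ + SSA,
}

# Both joiners are format characters (general category Cf).
joiner_categories = [unicodedata.category(c) for c in (ZWNJ, ZWJ)]

# After NFC and after NFKC, the three spellings are still three strings.
distinct_after = {
    form: len({unicodedata.normalize(form, s) for s in sequences.values()})
    for form in ("NFC", "NFKC")
}

print(joiner_categories, distinct_after)
```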

    > But it should be optional (and supplementary), not mandatory.

    I have a really hard time understanding why apparent spelling changes
    should be mediated by font changes for Indic scripts. It is not
    done that way for any other script (though there are some similar
    mistakes along that line for Arabic). And that for good reasons: font
    selection should NOT be responsible for apparent spelling, as a font
    change may then alter the apparent spelling in some way (that may or may
    not be acceptable; better to err on the safe side and take them
    as different, and thus encoded at the character level).

    > >> Example 2, Malayalam: dead RA can come either before the
    > >> (last part of the) consonant, or below it.
    > >
    > > A spelling difference that should be recorded in the sequence
    > > of characters (in some, not yet standardised, way), quite apart
    > > from font issues.
    >
    > Worse here, much worse.
    >
    > The difference is between two rendering styles, which are

    "rendering style" should not be confused with "spelling".
    If they are different in any other way than **purely aesthetic**
    (line thicknesses, embellishments (like swashes and serifs),
    roundness, width, boldness, inclination, and such), it's a spelling
    difference. (Even differences that are purely aesthetic for plain
    text may be significant in non-plaintext texts, e.g. for emphasis.)

    Thus, different placement of a reordrant vowel is a spelling difference.
    Before vs. below (and similar) is a spelling difference. Major glyph
    differences are spelling differences.

    > known to be BOTH
    > in current use (disregarding the voiced assertions of the
    > contrary, coming from both camps.)

    I have no idea what the "camps" are here. But those who prefer
    one spelling should be able to reliably use that spelling regardless
    of (later) font selection or font change. Likewise for those who prefer
    the other spelling. Changing the spelling is a "deeper" operation
    than changing the font.

    > And it was a conscious (and reaffirmed) decision of
    > ISO/Unicode to encode them jointly.

    IMO a bad idea. Especially if there are "camps" preferring different
    spellings.

    > What you are asking here is to BAN one of the two forms of

    No, quite the contrary. Both forms should be supported regardless
    of font selection.

    > writing Malayalam to use the straightforward way.
    > However, it is not yet standardised to decide which form will
    > be banned.

    I'm not advocating banning either form. As I mentioned just above
    and should be clear also from my original mail, quite the opposite.

    > So, each camp is required to voice its points in the loudest
    > way it can.
    > In the meantime, chaos is reigning; and basement-level
    > Malayalee are unable to use Unicode.

    With that I can agree.

    > I find such a state of affairs to be bad, really bad.
    >
    >
    > Again, this is NOT to say that one could not find a way to
    > specify the use of one or other style; but it probably has to
    > be done outside of the codepoints

    That would be too frail, and not reliable.

    > stream, at least if one wants to prevent the fiction of
    > encoding jointly...
    >
    >
    > >> Example 3, Malayalam again: the matra for AU U+0D4C can be
    > >> shown either as
    > >> two parts (as depicted in the tables), or only as the right part.
    > >
    > > No it cannot. AU spelled with U+0D4C unambiguously has two
    > > (visible) parts. AU with only the right part is unambiguously
    > > spelled with U+0D57 (quite regardless of the character name).
    >
    > I am confused here (and this is hardly new).
    >
    > I agree U+0D57 (as are its siblings xx55, xx56 or xx57 in the
    > other scripts) do have the same properties etc. as the vowel
    > signs, so this use could be possible without surgical operations
    > on the UCD. But the current (5.0 draft) database says... :
    > 0D57 MALAYALAM AU LENGTH MARK
    > * only a representation of the right half of 0D4C

    I think that remark should be changed to something like:
            "* the modern spelling of the AU matra"
            "* right half of 0D4C"
    with a similar remark:
            "* the old spelling of the AU matra"
    for U+0D4C. (And 0D4C and 0D57 should collate
    with only a secondary level difference.)
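
    To make "only a secondary level difference" concrete, here is a toy
    two-level sort key in Python. The weight tables PRIMARY and SECONDARY
    are entirely hypothetical (they are not taken from any real collation
    table), the relative order of the two spellings is arbitrary, and the
    sketch ignores normalization. The two AU spellings share a primary
    weight and differ only at the secondary level, which acts as a
    tie-breaker.

```python
AU_OLD = "\u0D4C"   # MALAYALAM VOWEL SIGN AU
AU_NEW = "\u0D57"   # MALAYALAM AU LENGTH MARK
KA, KHA = "\u0D15", "\u0D16"   # MALAYALAM LETTER KA, KHA

# Hypothetical weights, for illustration only.
PRIMARY = {KA: 1, KHA: 2, AU_OLD: 10, AU_NEW: 10}   # same primary weight for AU
SECONDARY = {AU_OLD: 1, AU_NEW: 2}                   # differ only here

def sort_key(word):
    """Compare all primary weights first, then secondary weights."""
    primary = tuple(PRIMARY[c] for c in word)
    secondary = tuple(SECONDARY.get(c, 0) for c in word)
    return (primary, secondary)

words = [KHA + AU_OLD, KA + AU_NEW, KA + AU_OLD]
ordered = sorted(words, key=sort_key)
print([[f"U+{ord(c):04X}" for c in w] for w in ordered])
```

    With these weights both spellings of "kau" sort together ahead of
    "khau", and the spelling difference only decides the order between
    otherwise identical words.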

    ...
    > > This is already very clear, but apparently needs to be pointed out.
    >
    > It may be clear to you,

    Haven't you noticed the canonical decomposition? (I will disregard
    Philippe's rant on that; canonically equivalent really means canonically
    equivalent, despite what Philippe may say.)

    Unfortunately, there are many should-have-been-but-are-missing
    canonical decompositions for characters in Indic scripts.
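
    The canonical decomposition referred to above can be inspected directly.
    A minimal sketch, assuming Python's standard unicodedata module (the
    variable names are only illustrative):

```python
import unicodedata

AU_SIGN = "\u0D4C"          # MALAYALAM VOWEL SIGN AU
E_SIGN = "\u0D46"           # MALAYALAM VOWEL SIGN E
AU_LENGTH_MARK = "\u0D57"   # MALAYALAM AU LENGTH MARK

# Decomposition mapping recorded in the Unicode Character Database.
mapping = unicodedata.decomposition(AU_SIGN)

# Canonical equivalence: both spellings have the same NFD form.
two_part = E_SIGN + AU_LENGTH_MARK
canonically_equivalent = (
    unicodedata.normalize("NFD", AU_SIGN) == unicodedata.normalize("NFD", two_part)
)

# U+0D57 on its own is a different sequence from U+0D4C.
right_part_differs = (
    unicodedata.normalize("NFD", AU_LENGTH_MARK) != unicodedata.normalize("NFD", AU_SIGN)
)

print(mapping, canonically_equivalent, right_part_differs)
```

    So U+0D4C is canonically equivalent to the two-part sequence
    <U+0D46, U+0D57>, while U+0D57 alone remains a distinct spelling.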

    ...
    > middle). Presently, all the work about Malayalam in Unicode has been
    > deferred to an ad-hoc working group (with no-one I know of represented
    > there.) If all the issues were very clear, then this working group
    > would have already brought its conclusions, at the very least a draft
    > presenting the state of affairs; I did not see such a thing.
    > I am not to say I know better, as I said I am not engaged in this
    > working group, nor am I qualified to be I presume.
    > Perhaps you are in this group.

    No, I'm not.

                    /kent k


