Re: Representative glyphs for combining kannada signs

From: Antoine Leca (
Date: Mon Mar 27 2006 - 09:03:48 CST

  • Next message: Doug Ewell: "Re: Representative glyphs for combining kannada signs"

    Kent Karlsson wrote:
    > Antoine Leca wrote:
    >> Are you intending to say that one SHOULD (IS REQUIRED TO)
    >> register in the codepoints the use of 2-dot-like Umlaut in a
    >> different way from 2-stroke-like Umlaut?
    > Yes, and they already are. U+0308 COMBINING DIAERESIS vs. U+030B
    > COMBINING DOUBLE ACUTE. There is no "umlaut" character...

    I did use Umlaut to clearly (at least I thought) denote the characteristic
    German *feature*, NOT the codepoints. I beg your pardon to not have been
    more explicit.

    >> Are you intending to say that if I wrote "Mme" (Mrs in
    >> French), I should differentiate, in a not yet standardised
    >> way, the fact that I write it with superscript characters
    >> or not? Saying it is a "spelling" difference?
    > Definitely. In this particular case one may debate whether to use
    > markup or to (ab)use U+1D50 MODIFIER LETTER SMALL M and

    Put it in clear: to write the French equivalent of Mrs, I can:
    - either write the slightly incorrect Mme
    - or write the more "correct" M[][] (where [] represent the empty box that
    everybody except four cats will effectively see).

    Somewhere I am thinking this is *not* a working solution.

    > And m² is not at all the same as m2.

    I guess no, although I am not completely sure (particularly since I expect
    the second to read "m<SUP>2</SUP>" instead, of course, which is the same
    underlying codepoints, just presented in a different way.)

    But assuming it, the latter is often used when the author intend the former.
    And it could be conceived to modify the text to show the latter. In fact,
    several user IU do exactly that.

    >> So, if the original encoder does NOT make a distinction in
    >> meaning between the two forms, why would Unicode require
    >> him to encode this difference at codepoint level?
    > How do you know if the "original encoder" makes the difference or
    > not?

    Because *I* am the original encoder, in this stanza. :-)

    Because, in the Mme (or m2) case, the normal way to put it is to type the
    keys [shift] [m] [m] [e] [space], stop typing, select the second m and the
    e, execute a command which makes them superscripted (and which, according to
    your point of view above, should modify the underlying codepoints to be 1D50
    and 1D49), then return typing the full name...

    Because I expect to find "Mme", even if "me" is superscripted;
    _particularly_ if the Find dialog box does not offers me the possibility to
    specify that some letters are superscripted....

    >> when they are viewed as different.
    > Again, how do you know they are not.

    Because my feeling (in fact, my interpretation of the Unicode and ISCII
    description) is that the Indic codepoints are abstract characters, not those
    elements which combine in defined ways to produce some glyphic intermediate
    elements, which only remains to be actually drawn by the font, as it seems
    you are thinking.

    I base that view, first on the fact that the virama concept forces a need
    for some abstraction layer (reordering, combination, so-called backstore,
    etc.) which is absent even from Thai, and even more from Western scripts;
    and secondly because of the underlying nature of the Brahmi-derived scripts,
    with the sounds associated, the sandhi phenomena, etc.
    I consider that Indic character encodings are still in the infancy, we do
    not have the accumulated experience we have with Latin encodings, with
    several tries and errors; ISCII was a radical departure from the traditional
    model, and Unicode did quickly follow in that way, before the dust settled
    over the learnings of the ISCII experiences (in fact, even before the ink
    were put on the copies of the final ISCII standard... but that's another

    Furthermore, Unicode (following ISCII, but amplifying it, and we do not see
    an end to this move) did define several modifiers whose purpose is to
    fine-specify the rendering when it is not clear enough, or when the author
    is supposed to add some precision; this is much like the character styles
    used in Western typography (rendered as HTML spanning styles, for example).
    However, when the ultimate purpose is to convey an idea (not a fixed
    rendering form), and to assure the broadest interoperability, Unicode does
    NOT propose additional modifier; which I take as an indication (in addition
    to the preambles of the Standard) that _none_ are needed.

    >> But it should be optional (and supplementary), not mandatory.
    > I have a really hard time understanding why apparent spell changes
    > should be mediated by fonts changes for Indic scripts. It is not the
    > done that way for any other scripts

    Huh? If I want a rounded 'a' in Latin, I am required to select a font with
    such a design. Similar for a z or a J with descender, or a low-striked q. I
    do not expect to be forced to use the "alternative" codepoints, that have
    been added for special purposes, like U+0251 or U+0292, for an illustrative
    use where I do NOT want to add specific meaning.

    The difference here is that you are saying changing a z-shaped 'a' to a
    rounded one (etc.) is *not* a spelling change, while writing the i matra in
    one or other place *is*. My wild guess is that some Indians may see it
    exactly reversed...

    And I know the line is NOT easy to draw.

    >>>> Example 2, Malayalam: dead RA can come either before the
    >>>> (last part of the) consonant, or below it.
    >>> A spelling difference that should be recorded in the sequence
    >>> of characters (in some, not yet standardised, way), quite apart
    >>> from font issues.
    >> The difference is between two rendering styles, which are
    > "rendering style" should not be confused with "spelling".
    > If they are different in any other way than **purely aesthetic**
    > (line thicknesses, embellishments (like swashes and serifs),
    > roundness, with, boldness, inclination, and such),

    It is certainly such a difference (not purely aesthetic, I mean). See
    attached image. It is the same word (പ്രശ്ലേഷം, the name for U+0D3D if
    not-English characters were accepted ;-)). And of course, I used the same
    encoding to produce both images.

    > it's a spelling difference.

    > (Even differences that are purely aesthetic for plain
    > text may be significant in non-plaintext texts, e.g. for emphasis.)

    Should emphasis be recorded as different Unicode codepoints?
    My reading was it should not...

    > Major glyph differences are spelling differences.

    Sorry, but you did not define major, so it is a bit difficult to follow. I
    understand you discarded line thickness or obliqueness or serifs, that is,
    everything which is commonly associated in Europe with typographic styles.
    Is the wide variety of glyphs available for U+0026 a 'major' difference? Or
    the differences which used to be noted in the Unicode tables, e.g. for
    U+0901, U+090B, U+091D, U+0923, U+0936, to name a few?

    > I'm not advocating banning either form. As I mentioned just above
    > and should be clear also from my original mail, quite the opposite.

    You may be missing something here. Nowadays, and since 1990, there is ONE
    encoding for Malayalam. Not two, neither zero, just one. There is no
    acknowledgement I can find (from Unicode or for that instance from anyone in
    the West) that the difference of rendering are spelling differences
    (justifying specific codepoints).
    The best I can find is the acknowledgement (in the Indic OpenType
    specifications) that there is a need to distinguish two genuinely different
    "styles" in Uniscribe and related, one named "old style" encoded MAL as
    "language system", the other "reformed" encoded MLR.

    As a result, Malayalees began to encode their script with the available
    codepoints. Both "camps" picked the codepoints according to their views; and
    they discovered problems in implementing it (particularly the cillus). The
    problem were reported (for example here around 1997-98, by Jeroen
    Hellingman.) It tooks some times for the Westerners to understand that when
    they (the Malayalees) were presenting their arguments, what some called a
    'cat' was not viewed as a cat by everybody. Even (or particularly) if they
    used a Malayalam word for cat! (പൂച്ച?)

    Now, if you consider those two "spellings" should be encoded independantly,
    and differently, it will actually be a great news for Malayalees, since it
    means that what they see as (their) Malayalam is not 'correct';
    and if furthermore, you are saying "yes, but how it should be encoded is
    still indeterminate" (= to be delayed), it effectively prevents definitive
    documents to be encoded, from either camp.

    Particularly since nothing can prevent the "solution" to be some sort of
    Salomon's judgment, taking some bits of 'correctness' from each camp, thus
    requiring each camp to modify (recode) the current, already existing, texts
    in other areas, to fit the two-spelling model.



    This archive was generated by hypermail 2.1.5 : Mon Mar 27 2006 - 09:06:11 CST