RE: Suggestions in Unicode Indic FAQ

From: Keyur Shroff (keyur_shroff@yahoo.com)
Date: Mon Feb 03 2003 - 00:24:57 EST

  • Next message: Keyur Shroff: "RE: Suggestions in Unicode Indic FAQ"

    --- Kent Karlsson <kentk@md.chalmers.se> wrote:

    > >
    > > Without that dotted circle appearing, the e-matra would appear to
    > > have been properly encoded,
    >
    > No, with proper reordering (and "normal" display mode), the e-matra at
    > the beginning of the second word would appear to be last glyph of the
    > first "word". Similarly, for the second case, the e-matra glyph would
    > have come to the left of the pa. The fluent reader (ok, not me...)
    > would then see those errors anyway, just like I can find spelling
    > errors in Swedish, most often without any kind of special marking. (I'm
    > assuming through-out that reordrant combining characters are reordered.)

    Illegal sequences are not reordered as you indicated. Also, as far as I
    know there is no mention of reordering of illegal input sequence (or
    invalid combining mark) in Unicode standard.

    Consider the last set of glyphs (left-to-right, top-to-bottom) in the
    attached image. It is the rendering effect of illegal input sequence
    "Devanagari Vowel Sign I" [U+093F] + "Devanagari Letter Ka" [U+0915] and
    without any dotted circle. As you might be knowing the correct input
    sequence should be U+0915 followed by U+093F. In that case the result would
    have been similar to what appears right now. (Though some more
    sophisticated font/application may want to replace the appearing glyph for
    U+093F to be substituted by some other glyph with proper attachment point).
    Now there is no way that user can identify this illegal input sequence
    without dotted circle. In the worst case even this rendered glyph is
    attached to the character from a class (for example, consonant cluster of
    "Ka" "Virama" "Ma") for which the glyph has been designed to render with.
    In such case even a fluent reader can not identify the error.

    >
    > There are spelling errors, yes. But there are other ways of indicating
    > spelling errors, that are (by now) fairly conventional for any language
    > (as long as there is an appropriate dictionary installed), and that also
    > are more general (in catching more spelling errors) and less obtrusive
    > (the author really wants to write it that way, for some reason).
    >
    > > Apparently, Michka used a non-OpenType Bengali Unicode font when
    > > he embedded the fonts into the page. As long as you are looking
    > > at the page on-line, with the embedded fonts, these errors are
    > > invisible.
    > >
    > > It may be typographically horrible. It *should* be typographically
    > > horrible in order to illustrate bad sequences clearly.
    >
    > I'd prefer little red wiggly lines under the word, or yellow background
    > or some such (just for screen display, not for printing; screen grabs
    > not counted). And that for any spelling "error".

    Spelling mistakes can be categorized into two different classes. One
    arising from illegal input sequence (e.g., Vowel Sign E as the first
    character in a word) and the other one is legal input sequence with no
    contextual meaning in the dictionary. While indication of the second type
    of mistake is generally used only in sophisticated applications like word
    processor, everyone wants to know the first kind of mistake. With your
    explanation it seems that even plain text editor is not useful at all to
    identify such common typing mistakes!

    - Keyur

    __________________________________________________
    Do you Yahoo!?
    Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
    http://mailplus.yahoo.com


    img1.jpg

    This archive was generated by hypermail 2.1.5 : Mon Feb 03 2003 - 01:06:57 EST