From: Kent Karlsson (email@example.com)
Date: Mon Feb 03 2003 - 12:29:47 EST
> > No, with proper reordering (and "normal" display mode), the e-matra at
> > the beginning of the second word would appear to be the last glyph of the
> > first "word". Similarly, for the second case, the e-matra glyph would
> > have come to the left of the pa. The fluent reader (ok, not me...)
> > would then see those errors anyway, just like I can find spelling
> > errors in Swedish, most often without any kind of special marking. (I'm
> > assuming throughout that reordrant combining characters
> > are reordered.)
> Illegal sequences
There are no illegal sequences.
> are not reordered as you indicated.
Then that is a problem with the display software you are using.
> Also, as far as I
> know there is no mention of reordering of illegal input sequence (or
> invalid combining mark) in Unicode standard.
Again, there are no "illegal input sequences".
> Consider the last set of glyphs (left-to-right, top-to-bottom) in the
> attached image. It is the rendering effect of illegal input sequence
> "Devanagari Vowel Sign I" [U+093F] + "Devanagari Letter Ka"
> [U+0915] and without any dotted circle.
Let's see if I understand you. <093F, 0915> is the input. Since
093F is a combining character, one should (not must, but should)
treat this *as if* the input was <0020, 093F, 0915>. Since 093F
is also reordrant, one must reorder it before the preceding base
character (at least; further left for consonant clusters), so the output
glyphs would be <<glyph for 093F, space, glyph for 0915>>.
(But your image does not show that.)
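
To make the above concrete, here is a minimal Python sketch of the two
steps: supplying an implied SPACE as base for a leading combining mark, and
then moving the reordrant matra to the left of its base. The names
PREFIX_MATRAS, add_implied_base and visual_order are made up for this
illustration, only U+093F is handled, and the matra is moved past a single
preceding character rather than a whole consonant cluster. It is not how
any particular renderer does it.

```python
import unicodedata

# Assumption: only U+093F (Devanagari vowel sign I) is treated as a
# left-side (reordrant) matra; a real renderer covers many more.
PREFIX_MATRAS = {"\u093F"}


def is_combining(ch):
    # General categories Mn, Mc and Me are the combining marks.
    return unicodedata.category(ch).startswith("M")


def add_implied_base(text):
    """Treat a leading combining mark *as if* a SPACE preceded it."""
    if text and is_combining(text[0]):
        return " " + text
    return text


def visual_order(text):
    """Return characters in display order (simplified, one-character base)."""
    glyphs = []
    for ch in add_implied_base(text):
        if ch in PREFIX_MATRAS and glyphs:
            # Place the matra to the left of the character just before it.
            glyphs.insert(len(glyphs) - 1, ch)
        else:
            glyphs.append(ch)
    return glyphs


print(visual_order("\u093F\u0915"))  # matra, SPACE, ka
print(visual_order("\u0915\u093F"))  # matra, ka (no space)
```

The two results differ only by the space, which is the visible difference
discussed below.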
> As you might be knowing the correct input
> sequence should be U+0915 followed by U+093F.
That would be a different input (whether that is correct or
not depends on the author's intent).
> In that case the result would
> have been similar to what appears right now.
Similar ONLY if you disregard the space "glyph" that should
have been there.
> (Though some more sophisticated font/application may want to replace
> the glyph shown for U+093F with some other glyph that has a proper
> attachment point).
That may be.
> Now there is no way that user can identify this illegal input sequence
> without dotted circle.
Yes, there is. Don't disregard the space "glyph".
> In the worst case this rendered glyph is even attached to a character
> from a class (for example, the consonant cluster "Ka" "Virama" "Ma")
> that the glyph has been designed to render with.
> In such a case even a fluent reader cannot identify the error.
> > There are spelling errors, yes. But there are other ways of indicating
> > spelling errors, that are (by now) fairly conventional for any language
> > (as long as there is an appropriate dictionary installed), and that also
> > are more general (in catching more spelling errors) and less obtrusive
> > (the author really wants to write it that way, for some reason).
> > > Apparently, Michka used a non-OpenType Bengali Unicode font when
> > > he embedded the fonts into the page. As long as you are looking
> > > at the page on-line, with the embedded fonts, these errors are
> > > invisible.
> > >
> > > It may be typographically horrible. It *should* be
> > > horrible in order to illustrate bad sequences clearly.
> > I'd prefer little red wiggly lines under the word, or yellow background
> > or some such (just for screen display, not for printing; screen grabs
> > not counted). And that for any spelling "error".
> Spelling mistakes can be categorized into two different classes. One
> arises from an illegal input sequence (e.g., Vowel Sign E as the first
> character in a word)
There are no illegal input sequences.
> and the other one is legal input sequence with no
> contextual meaning in the dictionary.
A simple spell checker just checks if the word is in the
dictionary or not (without worrying about the context).
That would catch what you call "illegal input sequences" too.
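
As a rough sketch of such a context-free check (Python; the word list
DICTIONARY and the function misspelled are hypothetical, and a real checker
would load a full dictionary and tokenize more carefully):

```python
# Hypothetical word list for illustration only.
DICTIONARY = {"\u0915\u093F", "the", "cat"}  # U+0915 U+093F is <ka, i-matra>


def misspelled(text, dictionary=DICTIONARY):
    """Return the whitespace-separated words that are not in the dictionary."""
    return [word for word in text.split() if word not in dictionary]


# <ka, i-matra> is listed; <i-matra, ka> and "nnnn" are not, so both are
# reported in the same way, with no special rule for either of them.
print(misspelled("\u0915\u093F \u093F\u0915 nnnn"))
```

A word that starts with a matra is flagged simply because it is not in the
list, exactly like any other unknown word.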
> While indication of the second type of mistake is generally used only
> in sophisticated applications like word processors,
Why? There is nothing in principle preventing a spell checker
from being used in a "plain text" editor.
> everyone wants to know the first kind of mistake.
Without a spell checker, but with proper rendering, spelling
errors can be detected by a fluent reader, since they look
different also without any dotted circles. For some ambiguous
Indic cases, like a prefix matra, consonant, postfix matra, all
possible character sequences for them are misspellings (as far
as I know).
> With your explanation it seems that even a plain text editor is not
> useful at all to identify such common typing mistakes!
Consider English. If I write "nnnn", that may well be a spelling error.
Do I deserve to have the rendering of that string littered with
dotted circles just because a sequence of four n's "has to" be
a spelling error?
> - Keyur