RE: Generic base characters

From: Kent Karlsson (kent.karlsson14@comhem.se)
Date: Tue Jul 17 2007 - 04:33:40 CDT

Next message: Kent Karlsson: "RE: Generic base characters"

Previous message: Kent Karlsson: "RE: Generic base characters"
In reply to: Michael Maxwell: "RE: Generic base characters"
Next in thread: Otto Stolz: "Triple vowels (was: Generic base characters)"

Michael Maxwell wrote:

I wrote:
> Because when we are entering Indic script text
> (for example), I have found it very helpful to
> have something obvious appear on the screen that
> indicates I made a mistake at the level of the script.

To which Kent Karlsson replied:
> There is no error at "the level of the script".

I thought my meaning would be obvious, but apparently I was wrong.

By "error at the level of the script", I meant having a dependent character without any character for it to be dependent on.

In that case there is an implicit NBSP base.

A diacritic not preceded by a base character, or a Bengali (etc.) dependent vowel sign not preceded by a Bengali consonant.

(Assuming this case is meant to be disjoint from the previous case:) But there is still a base character for it. The base needn't be
in the Bengali script. Or there may be other combining characters (Bengali or not) between the base and the considered instance of a
combining character.
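To make the "there is still a base" point concrete, here is a minimal Python sketch. It flags only those combining marks that have nothing at all to attach to. The helper name marks_without_base is hypothetical, and treating "start of text, or directly after a control character or line/paragraph separator" as "no base" is a simplifying assumption, not the full Unicode definition of a defective combining character sequence.

```python
import unicodedata

def is_mark(ch):
    # General categories Mn, Mc and Me are the combining marks.
    return unicodedata.category(ch).startswith("M")

def marks_without_base(text):
    """Return indices of combining marks that have no base before them.

    Simplifying assumption: a mark lacks a base only when it is the first
    character of the text or directly follows a control character or a
    line/paragraph separator.
    """
    hits = []
    for i, ch in enumerate(text):
        if not is_mark(ch):
            continue
        prev = text[i - 1] if i > 0 else None
        if prev is None or unicodedata.category(prev) in ("Cc", "Zl", "Zp"):
            hits.append(i)
    return hits
```

Under these assumptions a lone BENGALI VOWEL SIGN O (U+09CB) is flagged, while the same sign after Latin "a" is not, since the base need not be a Bengali letter. A mark that follows another mark is not flagged either, because it belongs to the sequence started earlier.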

One can imagine living in a parallel universe in which Unicode (and ISCII) represented Bengali (etc.) vowel signs and vowel letters
as alternative glyphs of a single character/code point.

That would be just plain wrong. The combining ("dependent") vowels and the independent vowels look different, and behave
differently. It is NOT a matter of glyph variation.
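As a small illustration of that distinction, the following Python sketch looks up the character name and general category for BENGALI LETTER MA, the independent vowel BENGALI LETTER O, and the dependent BENGALI VOWEL SIGN O. The helper name describe is hypothetical, and the results depend on the Unicode Character Database version bundled with the Python build in use.

```python
import unicodedata

def describe(code_point):
    """Return (character name, general category) for a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.category(ch)

for cp in (0x09AE, 0x0993, 0x09CB):
    name, category = describe(cp)
    print(f"U+{cp:04X}  {category}  {name}")
```

The two letters come back with a letter category and the vowel sign with a combining-mark category, so the difference is encoded in the character properties and is not only a matter of how the glyphs look.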

In that case, I suppose a sequence of Bengali characters MA + O + O + O would be rendered as Bengali 'M' with the 'O' vowel sign to
its left and right, followed by two 'O' vowel letter glyphs.

That is a completely different kind of character string than the ones we are talking about.

(That's just a guess on my part of what the appropriate behavior would be, based on other vowel sequences I've seen in
Bengali--which are typed as vowel sign followed by vowel letter.)

But I don't (and I suspect you don't) live in that universe. I live in a universe in which vowel signs and vowel letters are
distinguished in Unicode as distinct code points. And so in my universe a sequence of vowel signs is just as bad as a diacritic
without a base character,

In that case there is an implicit NBSP base. Note that you can have multiple diacritics applied to a base character.

and it doesn't require a spell checker to know that. Hence an error at the level of the script (OK, to be technical, the script as
implemented in Unicode/ISCII).
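The check described here (spotting a run of vowel signs without any dictionary) could be sketched in Python roughly as follows. The function name vowel_sign_runs is hypothetical. Identifying vowel signs by the "BENGALI VOWEL SIGN" prefix of the character name is an assumption made for brevity. The input is assumed to be NFC-normalized, because U+09CB decomposes canonically into two vowel signs and would otherwise be reported as a run by itself. The sketch only reports such runs; whether they count as an error is exactly what this thread disputes.

```python
import unicodedata

def vowel_sign_runs(text):
    """Return (start, length) for each run of 2+ Bengali vowel signs.

    Assumptions: vowel signs are recognised by their character name, and
    the text is NFC-normalized.
    """
    runs = []
    start = None
    # The trailing NUL is a sentinel that closes a run at end of text.
    for i, ch in enumerate(text + "\0"):
        is_sign = unicodedata.name(ch, "").startswith("BENGALI VOWEL SIGN")
        if is_sign and start is None:
            start = i
        elif not is_sign and start is not None:
            if i - start >= 2:
                runs.append((start, i - start))
            start = None
    return runs
```

For MA followed by three O vowel signs this reports one run of length three starting at index 1, whereas MA + O vowel sign + two O vowel letters reports nothing. The test is independent of the language being written, which is the property claimed for it above.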

Deviating from the most common (or official) application of a script does not constitute an "error at the level of the script". If
I write moooose, I deviate from the common (official) application of the Latin script (and you can detect that without using a spell
checker). That does not make it an error "at the level of the script". That argument does not change just because the vowel
characters are combining characters.

/kent k.

Putting it differently, a sequence of vowel signs would be just as bad in any other language using the Bengali (etc.)
script--Assamese, say. Whereas a spell checker would be particular to a certain language (and probably to a single writing system
for that language).

Mike Maxwell
CASL/ U MD

