RE: Generic base characters

From: Kent Karlsson (
Date: Tue Jul 17 2007 - 04:33:40 CDT


    Michael Maxwell wrote:

    I wrote:
    > Because when we are entering Indic script text
    > (for example), I have found it very helpful to
    > have something obvious appear on the screen that
    > indicates I made a mistake at the level of the script.

    To which Kent Karlsson replied:
    > There is no error at "the level of the script".

    I thought my meaning would be obvious, but apparently I was wrong.

    By "error at the level of the script", I meant having a dependent character without any character for it to be dependent on.

    In that case there is an implicit NBSP base.

      A diacritic not preceded by a base character, or a Bengali (etc.) dependent vowel sign not preceded by a Bengali consonant.

    (Assuming this case is meant to be disjoint from the previous case:) But there is still a base character for it. The base needn't be
    in the Bengali script. Or there may be other combining characters (Bengali or not) between the base and the considered instance of a
    combining character.
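
The two positions above can be made concrete with a small sketch. It assumes Python's standard `unicodedata` module and treats any character whose general category starts with `M` as a combining mark. The helper name `marks_without_base` is hypothetical, and the rule for what cannot serve as a base (controls, format characters, line and paragraph separators) is a simplification. Marks found this way are the ones the Unicode Standard recommends displaying as if applied to NO-BREAK SPACE (U+00A0).

```python
import unicodedata

# General categories assumed here to be unable to act as a base (simplified).
NON_BASE_CATEGORIES = ("Cc", "Cf", "Zl", "Zp")


def marks_without_base(text):
    """Return indices of combining marks that have no preceding base."""
    indices = []
    have_base = False  # nothing precedes the first character
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if category.startswith("M"):
            # A mark attaches to whatever base came before it, if any.
            if not have_base:
                indices.append(i)
        else:
            # Any other character starts a new sequence; it counts as a
            # base unless it is a control, format, or separator character.
            have_base = category not in NON_BASE_CATEGORIES
    return indices


# BENGALI VOWEL SIGN O alone: no base at all.
print(marks_without_base("\u09cb"))
# MA + O + O + O: every vowel sign still has MA as its base.
print(marks_without_base("\u09ae\u09cb\u09cb\u09cb"))
# A Latin base followed by a Bengali vowel sign: the base is not Bengali.
print(marks_without_base("a\u09cb"))
```

Under these assumptions only the first string yields a mark without a base; the other two have a base even though the result may be unusual Bengali.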

     One can imagine living in a parallel universe in which Unicode (and ISCII) represented Bengali (etc.) vowel signs and vowel letters
    as alternative glyphs of a single character/code point.

    That would be just plain wrong. The combining ("dependent") vowels and the independent vowels look different, and behave
    differently. It is NOT a matter of glyph variation.

     In that case, I suppose a sequence of Bengali characters MA + O + O + O would be rendered as Bengali 'M' with the 'O' vowel sign to
    its left and right, followed by two 'O' vowel letter glyphs.

    That is a completely different kind of character string than the ones we are talking about.

     (That's just a guess on my part of what the appropriate behavior would be, based on other vowel sequences I've seen in
    Bengali--which are typed as vowel sign followed by vowel letter.)

    But I don't (and I suspect you don't) live in that universe. I live in a universe in which vowel signs and vowel letters are
    distinguished in Unicode as distinct code points. And so in my universe a sequence of vowel signs is just as bad as a diacritic
    without a base character,

    In that case there is an implicit NBSP base. Note that you can have multiple diacritics applied to a base character.

     and it doesn't require a spell checker to know that. Hence an error at the level of the script (OK, to be technical, the script as
    implemented in Unicode/ISCII).
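
The distinction between vowel signs and vowel letters is visible directly in the Unicode character database. A minimal sketch, assuming Python's `unicodedata` module; `describe` and the constant names are hypothetical and chosen only for illustration:

```python
import unicodedata

MA = "\u09AE"            # consonant letter
VOWEL_SIGN_O = "\u09CB"  # dependent (combining) vowel
LETTER_O = "\u0993"      # independent vowel


def describe(ch):
    """Return (code point, general category, character name) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


for ch in (MA, VOWEL_SIGN_O, LETTER_O):
    print(*describe(ch))
```

The vowel sign has general category `Mc` (spacing combining mark), while the consonant and the independent vowel are both `Lo` (other letter).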

     Deviating from the most common (or official) application of a script does not constitute an "error at the level of the script". If
    I write moooose, I deviate from the common (official) application of the Latin script (and you can detect that without using a spell
    checker). That does not make it an error "at the level of the script". That argument does not change just because the vowel
    characters are combining characters.

            /kent k.

    Putting it differently, a sequence of vowel signs would be just as bad in any other language using the Bengali (etc.)
    script--Assamese, say. Whereas a spell checker would be particular to a certain language (and probably to a single writing system
    for that language).
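
A check of this kind needs no dictionary, which is the sense in which it is independent of the language. The sketch below only locates runs of consecutive vowel signs; whether such a run counts as an "error" is the point under dispute in this thread. It assumes Python's `unicodedata` module and uses a name-based heuristic (a character name containing "VOWEL SIGN"), which is an assumption made for illustration and not an official Unicode property. The function names are hypothetical.

```python
import unicodedata


def is_vowel_sign(ch):
    """Heuristic: the character's Unicode name contains 'VOWEL SIGN'."""
    return "VOWEL SIGN" in unicodedata.name(ch, "")


def vowel_sign_runs(text, min_len=2):
    """Return (start, end) index pairs for runs of at least min_len vowel signs."""
    runs = []
    start = None
    for i, ch in enumerate(text):
        if is_vowel_sign(ch):
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the text.
    if start is not None and len(text) - start >= min_len:
        runs.append((start, len(text)))
    return runs


print(vowel_sign_runs("\u09ae\u09cb\u09cb\u09cb"))  # MA + O + O + O
print(vowel_sign_runs("\u09ae\u09cb"))              # MA + O
print(vowel_sign_runs("moooose"))                   # Latin letters, no vowel signs
```

The same function applies unchanged to any language written in the Bengali script, since it looks only at code points and not at words.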

      Mike Maxwell
      CASL/ U MD
