Re: Yerushala(y)im - or Biblical Hebrew

From: Ted Hopp (ted@newslate.com)
Date: Mon Jul 28 2003 - 20:47:49 EDT

    Okay, Ken. I'm beginning to get it after reading your thoughtful
    explanations and after reading through the following two documents (highly
    recommended to all following this thread):

    http://www.w3.org/TR/WD-charreq
    http://www.w3.org/TR/charmod/

    After reading through some of the archives (some pointers to the relevant
    parts would be helpful, please--something beyond "consult the archives"), it
    strikes me that normalization, with its severe requirements, will
    eventually distort Unicode so badly that it becomes nearly unusable.
    Consider the thread that starts at
    http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML020/0651.html
    (from 1999, for goodness' sake!): if umlaut had been a later addition to
    Unicode, no vowel-umlaut code could have been given a decomposition to
    vowel + umlaut once the umlaut was introduced (else normalization
    idempotence breaks). Conversely, if umlaut, but none of the composed
    vowel-umlaut characters, had been encoded from the start, then the latter,
    when added, would all have had to go into the composition exclusions list
    (else normalization idempotence breaks). Obviously, neither occurred with
    umlaut,
    but the point is, I hope, clear. Normalization will ossify Unicode: it will
    become harder and harder to accept new, clean encodings. This is truly going
    to become the tail that wags the dog.
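
    To make the idempotence concern concrete, here is a minimal sketch using
    Python's standard unicodedata module (an assumption of this illustration;
    the results depend on the Unicode data bundled with the interpreter). It
    round-trips a-umlaut through NFD and NFC, and shows a character from the
    composition exclusions list, U+FB2A HEBREW LETTER SHIN WITH SHIN DOT,
    which NFC leaves decomposed. The helper name is_idempotent is
    hypothetical.

```python
import unicodedata


def is_idempotent(text: str, form: str) -> bool:
    """Return True if normalizing twice gives the same result as once."""
    once = unicodedata.normalize(form, text)
    twice = unicodedata.normalize(form, once)
    return once == twice


# U+00E4 (a with diaeresis) has a canonical decomposition to a + U+0308.
a_umlaut = "\u00E4"
decomposed = unicodedata.normalize("NFD", a_umlaut)
recomposed = unicodedata.normalize("NFC", decomposed)

# U+FB2A decomposes to shin + shin dot and is composition-excluded,
# so NFC does not recompose it.
shin_with_dot = "\uFB2A"
shin_nfc = unicodedata.normalize("NFC", shin_with_dot)

if __name__ == "__main__":
    for label, value in [("NFD", decomposed), ("NFC", recomposed),
                         ("NFC of U+FB2A", shin_nfc)]:
        print(label, [f"U+{ord(ch):04X}" for ch in value])
```

    The exclusion is what keeps NFC stable here: the precomposed form never
    reappears, so text normalized before and after its addition compares
    equal.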

    My prediction: normalization will eventually force some sort of version
    indicator to be included in all (normalized) Unicode text. (Weak analogy:
    much as DTD references are, either explicitly or implicitly, part of all XML
    documents.)

    Normalization and its applications (such as early normalization for string
    identity matching) may indeed be the show-stopper (today), so this question
    may be moot, but I'll ask it anyway: Are there any other uses of combining
    classes that would break (in ways apart from normalization breaking) if the
    assignments for the Hebrew vowels were changed? We might as well be sure
    that we know the entire scope of the issues involved.
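
    For anyone wanting to inspect the assignments in question, the following
    sketch (again assuming Python's standard unicodedata module) reads the
    canonical combining classes of a few Hebrew points and shows how
    canonical ordering rearranges lamed + patah + hiriq, the sequence at
    issue in this thread. It only illustrates the normalization effect; it
    does not answer whether other processes depend on these classes. The
    names HEBREW_POINTS and combining_classes are hypothetical.

```python
import unicodedata

# A few Hebrew points, keyed by an informal name for readability.
HEBREW_POINTS = {
    "sheva": "\u05B0",
    "hiriq": "\u05B4",
    "patah": "\u05B7",
    "qamats": "\u05B8",
    "dagesh": "\u05BC",
}


def combining_classes(points: dict) -> dict:
    """Map each informal name to its canonical combining class."""
    return {name: unicodedata.combining(ch) for name, ch in points.items()}


classes = combining_classes(HEBREW_POINTS)

# Lamed followed by patah then hiriq, in the order a writer might type it.
typed = "\u05DC\u05B7\u05B4"
# Canonical ordering sorts adjacent marks by ascending combining class,
# so hiriq (lower class) is moved ahead of patah.
normalized = unicodedata.normalize("NFD", typed)

if __name__ == "__main__":
    print(classes)
    print([f"U+{ord(ch):04X}" for ch in normalized])
```

    Because the two vowels carry different nonzero classes, the typed order
    is not preserved under any normalization form.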

    Ted

    Ted Hopp, Ph.D.
    ZigZag, Inc.
    ted@newSLATE.com
    +1-301-990-7453

    newSLATE is your personal learning workspace
       ...on the web at http://www.newSLATE.com/


