Re: Yerushala(y)im - or Biblical Hebrew

From: Joan_Wardell@sil.org
Date: Mon Jul 28 2003 - 17:25:01 EDT

    > Why can't we just fix the database? :)

    Because changing the canonical ordering classes (in ways not
    allowed by the stability policies) breaks the normalization
    *algorithm* and the expected test results it is tested against.

    If the "expected test results" are bad data, then it shouldn't matter
    whether they are consistent. Are you saying that somewhere there are lots
    of people who have worked very hard to implement Hebrew as it is currently
    described in Unicode 3, and that they would have to "start over" if we
    changed the canonical order? And is the biggest fear that the data today
    won't be consistent with the data in the new order? My point is that there
    *is* no data today, because anyone who has attempted to produce biblical
    Hebrew data in the current canonical order would have stopped and said,
    "Wait a minute! This won't work".
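
    To make the "this won't work" concrete, here is a minimal sketch using
    Python's standard unicodedata module. It assumes the lamed of
    Yerushala(y)im is encoded as lamed + patah + hiriq (the variable names are
    illustrative only), and shows normalization swapping the two vowels
    because hiriq has a lower canonical combining class than patah.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, canonical combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, canonical combining class 14

def names(text):
    """Return the Unicode character names of a string, for readable output."""
    return [unicodedata.name(ch) for ch in text]

# Assumed encoding of the pointed lamed: patah first, then hiriq.
typed = LAMED + PATAH + HIRIQ
normalized = unicodedata.normalize("NFC", typed)

print("typed:     ", names(typed))
print("normalized:", names(normalized))
print("order preserved:", typed == normalized)
```

    The two strings are canonically equivalent, so nothing that respects
    normalization can tell them apart, which is why the intended vowel order
    cannot be relied on.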

    That's what I'm saying. And I have no particular problem with the CGJ
    suggestion, but it doesn't go far enough. I don't think we can use it to
    fix meteg, a mark which occurs in three different positions around a low
    vowel, yet is canonically ordered before the shin/sin dots! Will we put
    one CGJ on the right to indicate a right meteg and one on the left to
    indicate a left meteg? There are many other examples of problems with the
    current canonical order.
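
    The meteg problem can be sketched the same way. This example assumes,
    purely for illustration, that a right or left meteg would be encoded by
    placing U+05BD before or after the vowel in the character sequence; the
    choice of bet and qamats is arbitrary. It uses only Python's standard
    unicodedata module.

```python
import unicodedata

BET = "\u05D1"       # HEBREW LETTER BET
SHIN = "\u05E9"      # HEBREW LETTER SHIN
QAMATS = "\u05B8"    # HEBREW POINT QAMATS, combining class 18
METEG = "\u05BD"     # HEBREW POINT METEG, combining class 22
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT, combining class 24

def nfd(text):
    """Apply canonical decomposition, which includes canonical reordering."""
    return unicodedata.normalize("NFD", text)

# Assumption: encoding order is meant to carry the meteg's position.
right_meteg = BET + METEG + QAMATS
left_meteg = BET + QAMATS + METEG

# Both normalize to the same sequence, so the distinction is lost.
positions_collapse = nfd(right_meteg) == nfd(left_meteg)
print("right and left meteg collapse:", positions_collapse)

# Meteg (22) also sorts ahead of the shin dot (24).
shin_seq = nfd(SHIN + SHIN_DOT + METEG)
print([unicodedata.name(ch) for ch in shin_seq])
```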

    The apparently simplest solution to all these problems is to correct the
    canonical order.

    > Unless you are talking about conversion algorithms for batch
    > conversion of existing Biblical Hebrew repositories into Unicode -- but
    > those are specialized code to begin with, and it is much less impact to
    > ask people to update the tables in those to insert a CGJ into the point
    > sequences than it is to ask all implementers to deal with the
    > consequences of broken normalization.

    Yes, I am talking about the person writing a batch conversion from
    existing data into Unicode. That would be me. If you were only suggesting
    we insert one CGJ, I wouldn't complain. But we are looking at re-writing
    the font, the keyboards, and the conversion so that we can work around the
    numerous problems with canonical order. I am selfishly preferring that you
    "normalizers" re-write your code. :)
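
    For a sense of what the CGJ workaround means inside a converter, here is a
    rough sketch in Python. The helper name protect_order is hypothetical and
    is not taken from any existing converter; it simply inserts U+034F
    COMBINING GRAPHEME JOINER (combining class 0) wherever canonical
    reordering would otherwise swap two adjacent marks.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

def protect_order(base, marks):
    """Hypothetical helper: emit base + marks, inserting CGJ before any mark
    whose combining class is lower than that of the mark preceding it."""
    out = [base]
    prev_ccc = 0
    for mark in marks:
        ccc = unicodedata.combining(mark)
        # Canonical reordering swaps an adjacent pair when the first class
        # is greater than the second and the second is non-zero.
        if ccc != 0 and prev_ccc > ccc:
            out.append(CGJ)
        out.append(mark)
        prev_ccc = ccc
    return "".join(out)

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"

protected = protect_order(LAMED, [PATAH, HIRIQ])
stable = unicodedata.normalize("NFC", protected) == protected
print([f"U+{ord(ch):04X}" for ch in protected], "stable:", stable)
```

    Every font, keyboard, and search routine then has to cope with the extra
    U+034F in the middle of the point sequence, which is the re-writing
    described above.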

    Joan W.



    This archive was generated by hypermail 2.1.5 : Mon Jul 28 2003 - 17:58:28 EDT