Re: Yerushala(y)im - or Biblical Hebrew

From: John Hudson (
Date: Mon Jul 07 2003 - 22:23:42 EDT

    At 08:51 07/07/2003, Ted Hopp wrote:

    > > > ... Given the small number of attested sequences that would be
    > > > adversely affected by normalisation re-ordering, I'm beginning to
    > > > favour the idea of encoding these sequences as individual characters.
    > > > We'd probably only need three or four, plus a right meteg, to solve
    > > > the problem, and rendering would work fine with existing font and
    > > > layout engine technologies.
    > >
    > > This sounds like a sensible alternative.
    >This would make data entry difficult for users. Nobody thinks of these
    >character sequences as single characters.

    If, as Ken suggested, it is feasible to use CGJ or another control
    character without the user needing to know about it, i.e. as something
    inserted in the backing string from input in which only the mark characters
    are entered by the user, then it should be feasible, and probably easier,
    to hide the use of these precomposed mark combinations.
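    To make the reordering problem concrete, here is a minimal sketch using
    Python's standard unicodedata module. It assumes the lamed-patah-hiriq
    sequence discussed in this thread as the test case; the variable names
    are illustrative only. Patah has canonical combining class 17 and hiriq
    has class 14, so normalisation moves hiriq in front of patah unless a
    class-0 character such as CGJ sits between them.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def codepoints(s):
    """Return the code points of s as space-separated hex values."""
    return " ".join(f"{ord(c):04X}" for c in s)


# Sequence as the user enters it: patah first, then hiriq.
plain = LAMED + PATAH + HIRIQ
# Same sequence with CGJ inserted in the backing string.
guarded = LAMED + PATAH + CGJ + HIRIQ

plain_nfc = unicodedata.normalize("NFC", plain)
guarded_nfc = unicodedata.normalize("NFC", guarded)

print(codepoints(plain_nfc))    # 05DC 05B4 05B7 (hiriq moved before patah)
print(codepoints(guarded_nfc))  # 05DC 05B7 034F 05B4 (order preserved)
```

    Whether an input method or editor can insert and remove the CGJ without
    the user noticing is a separate question that this sketch does not
    address.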

    > Editing would also be an
    >"interesting" experience. Could one search for lamed-patah and find it as
    >part of lamed-<patah+hiriq>? Or would the proposal be to use these new codes
    >only as part of bookend processing around normalization (i.e., automatically
    >recognize the sequences and substitute, normalize, and then automatically
    >substitute back)?

    I suppose the latter is feasible. I am very keen that *any* solution should
    be invisible to the user.
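    The "bookend" approach Ted describes could look roughly like the sketch
    below. Everything here is an assumption for illustration: no precomposed
    patah+hiriq character exists, so a private-use code point (U+E000)
    stands in for one, and the function name normalize_preserving and the
    PROTECTED table are hypothetical.

```python
import unicodedata

LAMED, PATAH, HIRIQ, FINAL_MEM = "\u05DC", "\u05B7", "\u05B4", "\u05DD"

# Hypothetical stand-in for a precomposed patah+hiriq character.
# U+E000 is a private-use code point: it has combining class 0 and is
# left untouched by normalisation.
PATAH_HIRIQ = "\uE000"

# Attested sequences that must keep their order, mapped to placeholders.
PROTECTED = {PATAH + HIRIQ: PATAH_HIRIQ}


def normalize_preserving(text, form="NFC"):
    """Normalise text while keeping the protected mark sequences intact."""
    # Bookend 1: swap each protected sequence for its placeholder.
    for seq, placeholder in PROTECTED.items():
        text = text.replace(seq, placeholder)
    text = unicodedata.normalize(form, text)
    # Bookend 2: restore the original sequences.
    for seq, placeholder in PROTECTED.items():
        text = text.replace(placeholder, seq)
    return text


word = LAMED + PATAH + HIRIQ + FINAL_MEM
result = normalize_preserving(word)

# The stored text keeps the plain mark sequence, so a search for
# lamed-patah still matches inside lamed-<patah+hiriq>.
found = (LAMED + PATAH) in result
```

    Two caveats apply to this sketch. The output is, strictly speaking, no
    longer in normalised form, so any later normalisation pass without the
    bookends would reorder the marks again. It would also corrupt input that
    already contains U+E000 for some other purpose.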

    >I think we need to keep Peter Constable's point in mind that current usage
    >should not define the limits of Unicode functionality. Since the principle
    >is that all sequences of character codes are permitted (2.10), it seems
    >wrong to supply a fix for only "the small number of attested sequences".

    This is a concern, but not an overriding one. Yes, all sequences are
    permitted, and some will be reordered during normalisation. We are
    currently aware of a small number of attested sequences that definitely
    should not be reordered. At this stage, I really don't care whether other,
    unattested Hebrew mark sequences are reordered or not, just as I know there
    are some sequences that Uniscribe cannot render and some that my fonts
    cannot render. That said, it is always a possibility that some new sequence
    will be attested in an as yet undiscovered or unpublished manuscript, which
    is a legitimate if minor concern.

    John Hudson

    Tiro Typeworks
    Vancouver, BC

    The sight of James Cox from the BBC's World at One,
    interviewing Robin Oakley, CNN's man in Europe,
    surrounded by a scrum of furiously scribbling print
    journalists will stand for some time as the apogee of
    media cannibalism.
                             - Emma Brockes, at the EU summit
