Re: Biblical Hebrew (U+034F Combining Grapheme Joiner works)

From: Peter_Constable@sil.org
Date: Wed Jul 02 2003 - 13:09:02 EDT

    [Inadvertently sent just to me; forwarded with Philippe's permission]

    On Wednesday, July 02, 2003 7:03 AM, Peter_Constable@sil.org
    <Peter_Constable@sil.org> wrote:
    > Philippe Verdy wrote on 06/28/2003 02:48:01 AM:
    >
    > > If the user strikes the two keys <patah> and <hiriq>, the input
    > > method for Traditional Hebrew will generate <patah,CGJ,hiriq>
    >
    > That requires an input method that is aware of the input context (or
    > of what has already been input -- but awareness of context is far more
    > reliable).

    Not necessarily: the keyboard driver may return host-specific PUA
    codepoints for the vowels, and the display interface maps them visually
    so that they are rendered as with CGJ. When the edited file is saved, they
    are remapped to the standard Unicode sequences. An editor aware of this
    use of CGJ can also go the other way, remapping each <CGJ+hebrew vowel>
    to a single PUA codepoint during editing, as this simplifies the internal
    implementation of character selection and string search/replace
    operations.
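
    A minimal Python sketch of that round trip follows. The PUA range
    starting at U+E000, the choice of the Hebrew points U+05B0..U+05BB, and
    the function names are assumptions made only for illustration; no real
    keyboard driver or editor is known to define them.

```python
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER

# Hebrew points U+05B0..U+05BB; this range is an illustrative assumption.
VOWELS = [chr(cp) for cp in range(0x05B0, 0x05BC)]
PUA_BASE = 0xE000  # hypothetical private-use assignment

# <CGJ, vowel> pair -> single PUA code point, and the reverse mapping.
TO_PUA = {CGJ + v: chr(PUA_BASE + i) for i, v in enumerate(VOWELS)}
FROM_PUA = {pua: seq for seq, pua in TO_PUA.items()}


def to_editing_form(text: str) -> str:
    """Collapse each <CGJ, vowel> pair into one PUA code point."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in TO_PUA:
            out.append(TO_PUA[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def to_standard_form(text: str) -> str:
    """Expand each PUA code point back to the standard <CGJ, vowel> pair."""
    return "".join(FROM_PUA.get(ch, ch) for ch in text)


# lamed, patah, CGJ, hiriq, final mem
stored = "\u05DC\u05B7\u034F\u05B4\u05DD"
edited = to_editing_form(stored)
restored = to_standard_form(edited)
```

    In the editing form each such vowel is one code point, so selection and
    search/replace never have to treat CGJ and the vowel as a unit.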

    Yes, it requires some knowledge of this particular encoding in the editor,
    but it's not impossible. So in Traditional Hebrew mode, the vowel
    keystrokes could be returned either all as <CGJ+vowel> sequences (not
    <vowel+CGJ>, which would be incorrect) or as PUA codepoints if this
    facilitates the implementation (notably for mouse selection). Unnecessary
    extra CGJ codepoints can easily be removed when saving the file.
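
    One way to do that cleanup at save time is sketched below in Python. It
    assumes "unnecessary" means that dropping the CGJ would not let canonical
    reordering swap the two marks around it; the function name is
    hypothetical, and a CGJ used for other purposes (collation, for instance)
    would need separate handling.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0


def strip_redundant_cgj(text: str) -> str:
    """Drop every CGJ whose removal cannot change canonical ordering.

    A CGJ is kept only when the mark before it has a higher combining
    class than the (non-starter) mark after it, i.e. when normalization
    would otherwise swap the two.
    """
    out = []
    for i, ch in enumerate(text):
        if ch == CGJ:
            prev_ccc = unicodedata.combining(out[-1]) if out else 0
            has_next = i + 1 < len(text)
            next_ccc = unicodedata.combining(text[i + 1]) if has_next else 0
            if not (prev_ccc > next_ccc > 0):
                continue  # redundant: nothing would be reordered here
        out.append(ch)
    return "".join(out)


# Input method emitted <CGJ+vowel> for every vowel:
# lamed, CGJ, patah (ccc 17), CGJ, hiriq (ccc 14), final mem
typed = "\u05DC\u034F\u05B7\u034F\u05B4\u05DD"
saved = strip_redundant_cgj(typed)
```

    Only the CGJ between patah and hiriq survives, since that is the one
    place where normalization would otherwise reorder the vowels.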

    An alternative method may be to use a single PUA codepoint instead of CGJ
    in the edited text, if one wants to preserve the CGJ codepoints present in
    the input stream. The editor would interpret this PUA codepoint as
    meaning: "don't reorder the following combining character when
    serializing the text", so that the following combining character keeps
    its relative order after normalization. It could then be completely
    language-neutral.
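
    A small Python sketch of this variant follows. The marker U+E0FF and the
    function name are hypothetical, and writing the marker out as CGJ on
    serialization is an assumption about how such an editor would behave.

```python
import unicodedata

CGJ = "\u034F"         # COMBINING GRAPHEME JOINER
NO_REORDER = "\uE0FF"  # hypothetical PUA marker used only while editing

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"


def serialize(edited: str) -> str:
    """Write the editor marker out as CGJ, then normalize to NFC."""
    return unicodedata.normalize("NFC", edited.replace(NO_REORDER, CGJ))


# Without any barrier, normalization sorts hiriq (class 14) before
# patah (class 17), so the typed order is lost.
unprotected = serialize(LAMED + PATAH + HIRIQ)

# With the marker, the typed order survives normalization.
protected = serialize(LAMED + PATAH + NO_REORDER + HIRIQ)

# A CGJ that was already in the input stays distinguishable from the
# editor's own marker for as long as the text is in the editing buffer.
buffer = LAMED + CGJ + PATAH + NO_REORDER + HIRIQ
editor_markers = buffer.count(NO_REORDER)
original_cgjs = buffer.count(CGJ)
```

    Because the marker only looks at combining order, nothing in it is
    specific to Hebrew.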


