Re: Grapheme clusters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Oct 06 2004 - 06:32:23 CST

    From: "Chris Harvey" <chris@languagegeek.com>
    > The users seem determined to put the entire alphabet into the PUA, thus
    > making a single character for <ng>, <kw>, <ii> etc. I would like to be
    > able to present them with something that works and avoid this kind of
    > catastrophe.

    A better alternative to PUAs, which would require specific fonts and offer
    no interoperable solution, would be to use controls that make grapheme
    clusters explicit, notably ZWJ, and to make sure that the editor effectively
    handles each such sequence as a single cluster, including for backspace.
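
    To illustrate what "handling it as a single cluster" could mean in an
    editor, here is a minimal Python sketch. It is only an assumption about how
    such an editor might behave, not a standard algorithm: default Unicode text
    segmentation does not by itself keep <n, ZWJ, g> together, so the grouping
    rule below is a custom one, and the function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def editing_clusters(text):
    """Split text into editing units (custom rule, not UAX #29).

    A unit is a base character plus any following combining marks,
    plus anything glued to it with ZWJ.
    """
    clusters = []
    join_next = False  # True right after a ZWJ
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and (ch == ZWJ or is_mark or join_next):
            clusters[-1] += ch      # extend the current unit
        else:
            clusters.append(ch)     # start a new unit
        join_next = (ch == ZWJ)
    return clusters


def backspace(text):
    """Delete the last editing unit as a whole."""
    return "".join(editing_clusters(text)[:-1])
```

    With this rule, backspace("an\u200dg") removes the whole <n, ZWJ, g> unit
    and leaves "a", so the user sees <ng> behave as one letter.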

    Or maybe use existing combining modifier letters, even if they look like
    superscripts in existing fonts (if you are ready to go to PUAs, you would
    need to develop a font for them anyway). But as we don't know the whole
    extent of the "alphabet", it's hard to determine which solution is best.

    I am assuming (I'm possibly wrong) that you'll need it to support some
    African languages, and if so, there are existing proposals to increase their
    support in Unicode with pending new Latin letters. Using PUAs could be an
    interim solution, before new characters are introduced, notably if you need
    combining modifier letters to act with the base letter as a single cluster.

    If you need that to support the Latin transliteration of Native North
    American languages that you support on your web site, as a convenient tool
    allowing a reverse transliteration to the native script (which has
    constraints on its syllabic structure), and a convenient way to fix the
    Latin orthography in order to create richer contents transliterated
    appropriately and automatically into the native script, maybe you really
    need a specific editor that can check and enforce the Latin orthography.

    For example, you cite the case of Pacific coast schwas, raised consonants
    and ejectives (like ə kw q̉), or Hawaiian long vowels (with macrons, rarely
    supported in fonts), which are difficult to enter with existing keyboards
    and fonts. Using a more basic ASCII-based orthography seems like an input
    method for such languages, and an intermediate step before the production of
    actual existing Unicode characters using the proper combining or modifier
    letters. In that case, Unicode itself is not the issue, and you may wonder
    how to create an input method editor which can show a "simplified"
    ASCII-only transliteration which can reliably be converted to the more exact
    orthography.
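
    As a rough sketch of such a conversion step, the Python below applies a
    longest-match lookup from ASCII sequences to Unicode strings. The table
    entries and the names ASCII_TO_UNICODE and to_orthography are purely
    hypothetical and chosen for illustration; a real table would have to come
    from the orthography of the language concerned.

```python
# Hypothetical ASCII spellings mapped to existing Unicode characters.
ASCII_TO_UNICODE = {
    "@": "\u0259",     # LATIN SMALL LETTER SCHWA
    "kw": "k\u02b7",   # k + MODIFIER LETTER SMALL W
    "q'": "q\u0309",   # q + COMBINING HOOK ABOVE
    "aa": "\u0101",    # LATIN SMALL LETTER A WITH MACRON
}


def to_orthography(ascii_text, table=ASCII_TO_UNICODE):
    """Convert ASCII input to the target orthography, longest match first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(ascii_text):
        for key in keys:
            if ascii_text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry starts here: copy the character unchanged.
            out.append(ascii_text[i])
            i += 1
    return "".join(out)
```

    Trying longer keys first keeps a digraph such as "kw" from being consumed
    as a plain "k" followed by "w". Reliable conversion back to ASCII would
    additionally require that no two table entries produce ambiguous output.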



    This archive was generated by hypermail 2.1.5 : Wed Oct 06 2004 - 06:47:40 CST