RE: [hebrew] Re: Karaite manuscript

From: Philippe Verdy (
Date: Sun Jul 22 2007 - 13:08:23 CDT


    Are you proposing here to encode the Arabic
    archaeographemes/archeographemes (or “archigraphemes” as you call them, but
    I’m not sure this is the correct term in English, as “archi-” is another
    prefix with another meaning, marking emphasis stronger than “super-” and
    quite similar to “hyper-”), i.e. the skeletons (without the normally
    required markers), and possibly also the markers themselves, separately?


    If these were encoded in some extended Arabic block, I’m not sure it would
    cause severe havoc. Even for searches over the Internet or in plain-text
    documents, the morphological similarities between otherwise unrelated modern
    letters can be analyzed by some custom “decomposition” using PUAs (for now,
    because these units are not encoded separately), or using a tailored


    As this will be needed only for palaeographic studies, most existing texts
    will not have to be re-encoded or changed, even if their letters appear to
    be composite. In any case, the Unicode stability policies prohibit
    “decomposing” them through the normalization forms; this can still be done
    privately or through local collation algorithms built specifically for
    palaeographers. There should be no change to existing Arabic texts, and the
    letters should not be decomposed in standard texts.
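
    To illustrate what such a private “decomposition” might look like, here
    is a minimal sketch in Python. The table, the marker labels and the
    function names are all hypothetical; only two letter families are covered,
    and U+066E (ARABIC LETTER DOTLESS BEH) and U+062D (HAH) are used as
    stand-ins for the skeletons. The stored text is never modified: the
    skeleton key is computed only for comparison.

```python
# Hypothetical private table: letter -> (skeleton stand-in, marker label).
# Illustrative only; a real table would need to cover the whole repertoire.
SKELETON_TABLE = {
    "\u0628": ("\u066E", "1 dot below"),   # BEH
    "\u062A": ("\u066E", "2 dots above"),  # TEH
    "\u062B": ("\u066E", "3 dots above"),  # THEH
    "\u062C": ("\u062D", "1 dot below"),   # JEEM
    "\u062D": ("\u062D", None),            # HAH (already unmarked)
    "\u062E": ("\u062D", "1 dot above"),   # KHAH
}


def skeleton_key(text):
    """Return a search key in which every known letter is replaced by its
    skeleton; characters missing from the table pass through unchanged."""
    return "".join(SKELETON_TABLE.get(ch, (ch, None))[0] for ch in text)


def same_skeleton(a, b):
    """True if two strings are identical once the markers are ignored."""
    return skeleton_key(a) == skeleton_key(b)
```

    With this table, same_skeleton("\u0628\u062A", "\u062B\u0628") is true
    (BEH+TEH against THEH+BEH), while BEH and JEEM remain distinct.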


    In any case, the issue is quite similar for other letters in alphabetic
    scripts: the ae and oe ligatures in Latin can be decomposed in some
    languages, and they should still be decomposed when doing morphological
    analysis, even in today’s modern texts (at least in French), even if they
    should not be decomposed this way in standard texts (but it’s true that
    Unicode provided compatibility decompositions for them, something that was
    not done for Arabic letters with markers, and that can’t be done now)…
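
    The same analysis-only approach can be sketched for the Latin case. This
    is a minimal Python example under the assumption that an explicit private
    folding table is acceptable for morphological analysis; the names FOLD and
    analysis_form are hypothetical, and the stored text keeps its ligatures.

```python
# Hypothetical analysis-only folding of the Latin ae/oe ligature letters.
FOLD = str.maketrans({
    "\u00E6": "ae",  # æ
    "\u00C6": "AE",  # Æ
    "\u0153": "oe",  # œ
    "\u0152": "OE",  # Œ
})


def analysis_form(text):
    """Return a copy of text with the ligature letters split into two
    letters; intended for analysis or search keys, not for storage."""
    return text.translate(FOLD)
```

    For example, analysis_form("cœur") gives "coeur", while the original
    string is left untouched.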



    From: Thomas Milo []
    Sent: Thursday, 19 July 2007 22:09
    To: Simon Montagu;
    Cc: 'John Hudson';; 'Hebrew List'


    All these observations about asynchronic text notation (text recorded in
    phases) using independent character subsets (archigraphemic skeleton,
    disambiguation dots, vowel marks), even across nominally different writing
    systems, also pertain to Arabic. This is particularly relevant to the
    textual transmission of the Holy Qur'an.


    Codices of the Holy Qur'an from the first few centuries were written
    without consonant markers (originally not points but small nib imprints)
    and without vowel disambiguation marks (which were points in the earliest
    Arabic script). Editors (contemporary or later) added the consonant
    disambiguation markers and vowel signs (personal communication from Yasin
    Dutton during the Corpus Coranicum Workshop organized by the European
    Science Foundation in Berlin, 2005).



    To this day, this horizontal segmentation remains the deep structure of
    Arabic. Understanding it helps to deal with its generative power to combine
    any marker with any basic letter (i.e., archigrapheme). Hebrew, Aramaic and
    Arabic do occur in various mixes along this horizontal segmentation, which
    provides an additional argument for dealing with the horizontal segmentation
    of Arabic and related scripts.


    Unicode's present fixation on vertical segmentation (leading to the
    irrelevant concept of ligatures) in Arabic and national subsets leads to:


    1. uneconomical proliferation of Arabic code points consisting of generic
    archigraphemes and generic markers

    2. serious problems in digitizing historical and even contemporary texts.


    See my Unicode Tutorial: page 7 for examples of Unicode-induced ambiguity
    in encoding exactly identical Arabic character groups, and page 12 for
    examples of the resulting everyday chaos:

