Re: Looking for transcription or transliteration standards latin->arabic

From: Asmus Freytag (
Date: Fri Jul 09 2004 - 21:34:29 CDT


    At 08:33 PM 7/9/2004, John Cowan wrote:
    > > I have just reviewed this list and found it odd that Hebrew presentation
    > > forms are included but Arabic ones are not.
    >The specification actually called only for Latin, Greek, and Cyrillic;
    >I added Hebrew pour la lagniappe. If someone wants to add Arabic, I
    >encourage them to do so.
    > > the Hebrew presentation forms but also most of the precomposed
    > > characters are redundant in this list.
    >True; however, the current list indicates the scope of what actually
    >happens, even if it is overlong.

    I have taken the file from the server today and massaged it to be in a form
    suitable for inclusion in the next draft of TR#30, which will be issued in
    time for the UTC to review it in August.

    Once the review issue opens for this draft, please comment on the review
    form, so that the UTC has formal input to evaluate.

    My understanding is that this folding would be more aggressive in folding
    diacritics than some languages are, so that it is useful in cross-language
    searching. For example, it should allow English users to search for words
    containing accented characters by supplying the equivalent word spelled
    with base letters only.
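    A minimal sketch of generic, language-independent diacritic folding in
    Python, assuming the common approach of canonical decomposition (NFD)
    followed by dropping nonspacing marks (general category Mn). This is an
    illustration only, not the folding defined in TR#30; the names
    fold_diacritics and folded_search are hypothetical.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Decompose canonically, drop nonspacing combining marks, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    base_only = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", base_only)

def folded_search(query: str, text: str) -> bool:
    """True if the folded, case-insensitive query occurs in the folded text."""
    return fold_diacritics(query).casefold() in fold_diacritics(text).casefold()

print(fold_diacritics("résumé"))                   # resume
print(folded_search("naive", "a naïve approach"))  # True
```

    Both the query and the searched text are folded, so a user typing only
    base letters still matches the accented spelling.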

    'i' has a dot, but it has no base letter more 'basic' than itself: dotless
    i, while theoretically available, is more specialized and not universally
    accessible from input devices.
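    Under NFD-based folding this falls out naturally: the dot of 'i' is part of
    the letter, not a separate combining mark, so 'i' is already a base letter
    and nothing is stripped from it. A small check, assuming the same
    decomposition approach (has_removable_mark is a hypothetical helper):

```python
import unicodedata

def has_removable_mark(ch: str) -> bool:
    """True if the canonical decomposition of ch contains a nonspacing mark,
    i.e. NFD-based folding would strip something from it."""
    return any(unicodedata.category(c) == "Mn"
               for c in unicodedata.normalize("NFD", ch))

for ch in ["i", "\u0131", "\u0130", "é"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {has_removable_mark(ch)}")
# U+0069 LATIN SMALL LETTER I: False
# U+0131 LATIN SMALL LETTER DOTLESS I: False
# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE: True
# U+00E9 LATIN SMALL LETTER E WITH ACUTE: True
```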

    o-slash can be analyzed as o plus a slash, even though Unicode does not
    decompose it canonically. Allowing users outside Scandinavia to perform
    fuzzy searches for words containing this character is useful.
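    Because o-slash has no canonical decomposition, decomposition plus mark
    stripping leaves it untouched; folding it requires an explicit extra
    mapping. A sketch, assuming a small hypothetical supplemental table
    (EXTRA_FOLDINGS) applied after the generic step:

```python
import unicodedata

# Hypothetical supplemental table for letters whose "diacritic" is not a
# separate combining mark in Unicode, so NFD alone cannot remove it.
EXTRA_FOLDINGS = str.maketrans({"ø": "o", "Ø": "O"})

def fold(text: str) -> str:
    """NFD, drop nonspacing marks, then apply the supplemental table."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", stripped.translate(EXTRA_FOLDINGS))

print(repr(unicodedata.decomposition("ø")))  # '' (no canonical decomposition)
print(fold("Søren"))     # Soren
print(fold("Ålesund"))   # Alesund
```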

    In this view of folding, language-specific fuzzy searches would be tailored
    (usually based on collation information rather than on generic diacritic
    folding).
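    One way to picture such tailoring is a per-language set of letters that the
    language treats as distinct letters and that must therefore survive
    folding. In this sketch a hand-written exclusion set stands in for what
    would really be derived from collation data; TAILORING_EXCLUSIONS,
    GENERIC_EXTRA and tailored_fold are hypothetical names.

```python
import unicodedata

# Hypothetical exclusion sets standing in for collation-derived tailoring:
# letters these languages sort as separate letters, not accented variants.
TAILORING_EXCLUSIONS = {
    "sv": frozenset("åäöÅÄÖ"),
    "da": frozenset("æøåÆØÅ"),
}
# Generic extra mapping for letters without a canonical decomposition.
GENERIC_EXTRA = {"ø": "o", "Ø": "O"}

def fold_char(ch: str) -> str:
    """Generic folding of one code point: strip nonspacing marks, then map."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return GENERIC_EXTRA.get(base, base)

def tailored_fold(text: str, lang: str = "") -> str:
    """Fold generically, except for letters the language keeps distinct."""
    keep = TAILORING_EXCLUSIONS.get(lang, frozenset())
    # Compose first so decomposed input like "a\u030a" is seen as "å".
    composed = unicodedata.normalize("NFC", text)
    out = [ch if ch in keep else fold_char(ch) for ch in composed]
    return unicodedata.normalize("NFC", "".join(out))

print(tailored_fold("Öl"))           # Ol  (generic)
print(tailored_fold("Öl", "sv"))     # Öl  (Swedish keeps Ö)
print(tailored_fold("Søren", "da"))  # Søren
```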


    This archive was generated by hypermail 2.1.5 : Fri Jul 09 2004 - 21:36:06 CDT